Biopython – Entrez Database Connection

NCBI provides an online search system named Entrez. It gives access to a wide range of molecular biology databases and offers an integrated global query system that supports boolean operators and field-restricted search. A global query returns results from all databases, including information such as the number of hits and links to the originating database for each one.
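
To make the boolean operators and field search concrete, the following is a minimal sketch of how such a query string could be assembled in plain Python. The helper names `field_term` and `combine` are hypothetical and are not part of Biopython; the field tags `Gene Name` and `Organism` are used only as illustrative Entrez search fields.

```python
def field_term(value, field):
    """Restrict a search value to one Entrez field, e.g. 'BRCA1[Gene Name]'."""
    return f"{value}[{field}]"


def combine(operator, *terms):
    """Join terms with a boolean operator (AND, OR, NOT) inside parentheses."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported boolean operator: {operator}")
    return "(" + f" {operator} ".join(terms) + ")"


# Hypothetical example query: a gene name limited to one organism
query = combine(
    "AND",
    field_term("BRCA1", "Gene Name"),
    field_term("Homo sapiens", "Organism"),
)
print(query)  # (BRCA1[Gene Name] AND Homo sapiens[Organism])
```

A string built this way could then be passed as the `term` argument of a search call such as `Entrez.esearch(db="nucleotide", term=query)`, assuming the chosen database supports those fields.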

Biopython has an Entrez-specific module named Bio.Entrez for this purpose. The module parses the XML returned by the Entrez search system and exposes it as Python dictionaries and lists. The steps to connect to the database are listed below:

Approach

  • Import the required modules.
  • Set the email to identify who is connecting.
  • Set the Entrez tool parameter; it is "biopython" by default.
  • Call the einfo() method to get information about the available databases.
  • Read the information provided by the einfo() method.
  • The data obtained is in XML format, so the Entrez.read() method is used to convert it into a Python object.
  • The record is now in a dictionary format with only one key.
  • Accessing the DbList key returns the list of databases.
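
The XML-to-Python step above can be sketched with the standard library alone. This is not how Bio.Entrez is implemented; it is only an illustration of the idea, and the function name `parse_db_list` is hypothetical. The XML below is a hand-written sample shaped like an einfo reply, not a real server response.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like an einfo reply; NOT real server output.
SAMPLE_XML = """<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nucleotide</DbName>
  </DbList>
</eInfoResult>"""


def parse_db_list(xml_text):
    """Convert einfo-style XML into a dictionary with a single 'DbList' key."""
    root = ET.fromstring(xml_text)
    # Collect the text of every <DbName> element under <DbList>
    names = [node.text for node in root.findall("./DbList/DbName")]
    return {"DbList": names}


record = parse_db_list(SAMPLE_XML)
print(list(record.keys()))  # ['DbList']
print(record["DbList"])     # ['pubmed', 'protein', 'nucleotide']
```

With Biopython, Entrez.read() performs this conversion, so the manual parsing shown here is not needed in practice.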

The resulting program should look like the code given below :

Python3

# Import libraries
from Bio import Entrez

# Set the email so NCBI can identify who is connecting
Entrez.email = 'jeetesh1@yopmail.com'

# Set the Entrez tool parameter
Entrez.tool = 'Demoscript'

# Request information about the available databases
info = Entrez.einfo()

# To inspect the raw XML instead, read the handle directly:
# data = info.read()

# Parse the XML into a Python object, then close the handle
record = Entrez.read(info)
info.close()

# Show the record keys
print(record.keys())

# Show the list of databases
print(record['DbList'])


Output:


Last Updated : 03 Jan, 2021