Biopython – Entrez Database Search Operation
NCBI provides an online search system named Entrez. It gives access to a wide range of molecular biology databases and offers an integrated global query system that supports boolean operators and field search. A global query returns results from all databases, including information such as the number of hits and links to the originating database for each one.
Biopython's Entrez module provides two methods for searching these databases:
- Biopython has an Entrez-specific method named esearch() to search any one of the Entrez databases. It accepts two positional parameters: the database to search and the search term. Passing an invalid database name raises an error.
- To run a query across all the databases, the egquery() method is used. It is similar to Entrez.esearch(), except that it takes only the term parameter and no database parameter.
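Both methods take the query as a plain string, so field restrictions and boolean operators are written directly into the term. The sketch below shows one way to assemble such a string. Note that `field` and `combine` are hypothetical helpers written for this illustration, not part of Biopython, and `[Organism]` and `[Title]` are used only as examples of Entrez field tags.

```python
def field(value, tag):
    """Restrict a search value to one Entrez field, e.g. 'insulin[Title]'."""
    return f"{value}[{tag}]"


def combine(operator, *parts):
    """Join sub-queries with an upper-case boolean operator (AND, OR, NOT)."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(parts) + ")"


# Hypothetical query: records for human sequences with "insulin" in the title.
term = combine("AND", field("Homo sapiens", "Organism"), field("insulin", "Title"))
print(term)
```

The resulting string can be passed as the term argument of either search method.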
The general approach is as follows:
- Import the required modules.
- Set your email so that NCBI can identify who is connecting to the database.
- Set the Entrez tool parameter. It is "biopython" by default.
- Call one of the methods described above with the appropriate parameters.
- The data is returned in XML format, so use the Entrez.read() method to convert it into a Python object.
- Read the information provided.
Implementation using both methods is given below:
Example 1: Using esearch()
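A minimal sketch of an esearch() query, assuming Biopython is installed and network access is available. The function names, the e-mail address, the tool name and the search term are placeholders chosen for this illustration. The keys read in `summarize_esearch` ("Count", "IdList", "QueryTranslation") are those normally present in a parsed esearch result.

```python
def esearch_record(term, db="pubmed", email="your.name@example.com", retmax=10):
    """Search one Entrez database and return the parsed result."""
    # Imported here so the helper below can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email        # tell NCBI who is connecting (placeholder address)
    Entrez.tool = "DemoScript"  # arbitrary tool name; the default is "biopython"

    # esearch() returns a handle to the XML response.
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    try:
        # Entrez.read() converts the XML into dictionary-like Python objects.
        return Entrez.read(handle)
    finally:
        handle.close()


def summarize_esearch(record):
    """Pick the most commonly used fields out of a parsed esearch result."""
    return {
        "count": int(record["Count"]),   # total number of hits
        "ids": list(record["IdList"]),   # at most retmax identifiers
        "query": record.get("QueryTranslation", ""),
    }


# Example call (needs network access and a real e-mail address):
# print(summarize_esearch(esearch_record("genome")))
```

The count is the total number of matching records, while the ID list is capped by retmax, so the two can differ.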
Example 2: Using egquery()
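A minimal sketch of a global query, assuming your Biopython version still ships egquery() and NCBI still serves the EGQuery endpoint. As before, the function names, e-mail address and tool name are placeholders for illustration. The parsed result is expected to hold one row per database under the "eGQueryResult" key, each with a "DbName" and a "Count".

```python
def egquery_record(term, email="your.name@example.com"):
    """Run a term against all Entrez databases and return the parsed result."""
    # Imported here so the helper below can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email        # placeholder address; use your own
    Entrez.tool = "DemoScript"  # arbitrary tool name; the default is "biopython"

    # Unlike esearch(), no database is given: only the term.
    handle = Entrez.egquery(term=term)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()


def hits_per_database(record):
    """Map each database name to its hit count."""
    hits = {}
    for row in record["eGQueryResult"]:
        # Defensive check: skip rows whose count is not a plain number.
        if str(row["Count"]).isdigit():
            hits[row["DbName"]] = int(row["Count"])
    return hits


# Example call (needs network access and a real e-mail address):
# for name, count in hits_per_database(egquery_record("biopython")).items():
#     print(name, count)
```

A database that reports a large count here is a natural candidate for a follow-up esearch() call with the same term.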