Biopython – Sequence Operations

  • Last Updated : 06 Aug, 2021

Biopython provides built-in methods for both basic and advanced operations on sequences. The basic operations closely mirror Python string methods such as slicing, concatenation, find, count, strip, and split. Some of the advanced operations are described below.

Complement and Reverse Complement: Biopython provides the complement() and reverse_complement() methods, each of which returns a new sequence. complement() replaces every nucleotide with its complementary base, and reverse_complement() additionally reverses the result. Applying reverse_complement() twice returns the original sequence. Below is a simple example of both methods:

Syntax: complement(self)

Return Type: <class 'Bio.Seq.Seq'>


# Import libraries
from Bio.Seq import Seq

# Creating sequence (Bio.Alphabet was removed in Biopython 1.78,
# so no alphabet argument is passed)
seq = Seq('CTGACTGAAGCT')

# Creating complement of the sequence and print
comp = seq.complement()
print(comp)

# Creating reverse complement and print
rev_comp = seq.reverse_complement()
print(rev_comp)



In the above example, the complement() method creates the complement of the DNA or RNA sequence, while the reverse_complement() method creates the complement of the sequence and then reverses the result from left to right.

The Bio.Data.IUPACData module of Biopython provides the ambiguous_dna_complement dictionary, which maps each IUPAC nucleotide code (including the ambiguity codes) to its complement and is used to perform the complement operations.


# Import libraries
from Bio.Data import IUPACData
import pprint

# Printing the dataset
pprint.pprint(IUPACData.ambiguous_dna_complement)

Output:

{'A': 'T',
 'B': 'V',
 'C': 'G',
 'D': 'H',
 'G': 'C',
 'H': 'D',
 'K': 'M',
 'M': 'K',
 'N': 'N',
 'R': 'Y',
 'S': 'S',
 'T': 'A',
 'V': 'B',
 'W': 'W',
 'X': 'X',
 'Y': 'R'}
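
To make the role of this mapping concrete, here is a minimal pure-Python sketch of how a complement and a reverse complement could be computed from it. The function names complement and reverse_complement are hypothetical helpers written for illustration; this is not Biopython's actual implementation.

```python
# Complement mapping copied from the table above (IUPAC ambiguity codes included).
AMBIGUOUS_DNA_COMPLEMENT = {
    "A": "T", "B": "V", "C": "G", "D": "H",
    "G": "C", "H": "D", "K": "M", "M": "K",
    "N": "N", "R": "Y", "S": "S", "T": "A",
    "V": "B", "W": "W", "X": "X", "Y": "R",
}


def complement(dna: str) -> str:
    """Replace every base with its complement, keeping the order."""
    return "".join(AMBIGUOUS_DNA_COMPLEMENT[base] for base in dna.upper())


def reverse_complement(dna: str) -> str:
    """Complement the sequence, then reverse it."""
    return complement(dna)[::-1]


seq = "CTGACTGAAGCT"
print(complement(seq))          # GACTGACTTCGA
print(reverse_complement(seq))  # AGCTTCAGTCAG
```

Applying reverse_complement() twice gives back the input, which is a quick sanity check for any implementation of this operation.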

GC Content (guanine-cytosine content): GC content is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine or cytosine. It is calculated by dividing the number of G and C nucleotides by the total number of nucleotides. Below is a basic example of calculating GC content:

Syntax: Bio.SeqUtils.GC(seq) 

Return Type: <class 'float'>


# Import libraries
from Bio.Seq import Seq
# Note: GC() returns a percentage. Recent Biopython releases replace it
# with gc_fraction(), which returns a fraction between 0 and 1.
from Bio.SeqUtils import GC

# Creating sequence
seq = Seq("CTGACTGAAGCT")

# Getting GC content
print(GC(seq))
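
The calculation itself is simple enough to sketch without Biopython. The helper below, gc_content, is a hypothetical name used for illustration. It counts only the unambiguous letters G and C, so it is a simplified sketch rather than a drop-in replacement for the library function.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA or RNA string."""
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


# The example sequence has 3 G and 3 C among 12 bases.
print(gc_content("CTGACTGAAGCT"))  # 50.0
```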



Transcription: It is the process of converting a DNA sequence into an RNA sequence. Biological transcription reads the DNA template strand and produces its reverse complement (GACT -> AGUC) as the mRNA. Biopython instead treats the given DNA as the coding strand and converts it directly to mRNA by replacing every T with U. A simple example is given below:

Syntax: transcribe(self)

Return Type: <class 'Bio.Seq.Seq'>


# Import libraries
from Bio.Seq import Seq

# Creating sequence
dna_seq = Seq("CTGACTGAAGCT")

# Transcription to RNA
rna_seq = dna_seq.transcribe()
print(rna_seq)

# Reverse transcription to DNA
print(rna_seq.back_transcribe())
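
The difference between the two views of transcription can be sketched in plain Python. Both function names below are hypothetical and exist only for illustration: transcribe_coding mirrors the T-to-U replacement described above, while transcribe_template models the biological process that starts from the template strand.

```python
def transcribe_coding(coding_dna: str) -> str:
    """Coding-strand view: the mRNA equals the DNA with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def transcribe_template(template_dna: str) -> str:
    """Template-strand view: the mRNA is the reverse complement, written in RNA letters."""
    to_rna_complement = str.maketrans("ACGT", "UGCA")
    return template_dna.upper().translate(to_rna_complement)[::-1]


print(transcribe_coding("CTGACTGAAGCT"))  # CUGACUGAAGCU
print(transcribe_template("GACT"))        # AGUC
```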



Translation: It is the process of translating an RNA sequence into a protein sequence. The Seq class has a built-in translate() method for this purpose. To stop translation at the first stop codon, pass to_stop=True to the translate() method.

Biopython uses the translation tables provided by The Genetic Codes page of NCBI. The standard translation table is shown below:

Syntax: translate(self, table='Standard', stop_symbol='*', to_stop=False, cds=False, gap='-')

Return Type: <class 'Bio.Seq.Seq'>


# Import libraries
from Bio.Data import CodonTable

# Creating table
table = CodonTable.unambiguous_dna_by_name["Standard"]

# Print table
print(table)

Output:

Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--

A simple example of translation is given below :


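# Import libraries
from Bio.Seq import Seq

# Creating sequence (an illustrative RNA, assumed here, that encodes
# the protein shown in the output)
rna = Seq("UACCGGAUUGUUUUCCCGGGCUGAAGUUGCGCGCGC")

# Translating RNA
print(rna.translate())

# Stop translation at the first stop codon (asterisk '*' marks a stop codon)
print(rna.translate(to_stop=True))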


Output:

YRIVFPG*SCAR
YRIVFPG
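
To show how the standard table drives translation, here is a minimal pure-Python sketch. It rebuilds the standard codon table from a compact string in the same T/C/A/G order as the table above (with U in place of T for RNA) and walks the sequence three bases at a time. The names CODON_TABLE and translate, and the example RNA, are assumptions made for illustration. The sketch ignores alternative tables, ambiguity codes, and the cds and gap options of the real method.

```python
from itertools import product

BASES = "UCAG"  # same row/column order as the standard table, with U for T
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
# Map each of the 64 codons to its one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str, to_stop: bool = False) -> str:
    """Translate an RNA (or coding DNA) string codon by codon."""
    rna = rna.upper().replace("T", "U")
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if to_stop and amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = "UACCGGAUUGUUUUCCCGGGCUGAAGUUGCGCGCGC"  # hypothetical example sequence
print(translate(rna))                # YRIVFPG*SCAR
print(translate(rna, to_stop=True))  # YRIVFPG
```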

