Sequence in BioPython module

  • Last Updated : 11 Oct, 2020

Prerequisite: BioPython module

A sequence is a series of letters used to represent an organism's protein, DNA or RNA. Sequences in Biopython are usually handled by the Seq object defined in the Bio.Seq module. The Seq object has built-in methods such as complement, reverse_complement, transcribe, back_transcribe and translate. Seq objects also support many string methods such as count(), find(), split() and strip().
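To make concrete what some of these methods compute, here is a minimal plain-Python sketch for unambiguous DNA. The helper functions borrow the Seq method names for readability only; they are illustrative stand-ins written for this article, not Biopython's implementation, and they do not handle ambiguity codes or lowercase letters.

```python
# Plain-Python sketch of what some Seq methods compute for unambiguous DNA.
# These helpers are illustrative stand-ins, not Biopython's implementation.

# Translation table pairing each DNA base with its complement.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Swap every base for its partner: A<->T and C<->G."""
    return dna.translate(_DNA_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complement the sequence, then read it backwards."""
    return complement(dna)[::-1]


def transcribe(dna: str) -> str:
    """Turn a coding-strand DNA string into RNA by replacing T with U."""
    return dna.replace("T", "U")


def back_transcribe(rna: str) -> str:
    """Turn an RNA string back into DNA by replacing U with T."""
    return rna.replace("U", "T")


dna = "GACT"
print(complement(dna))                   # CTGA
print(reverse_complement(dna))           # AGTC
print(transcribe(dna))                   # GACU
print(back_transcribe(transcribe(dna)))  # GACT

# The string-style methods mentioned above behave like their str counterparts.
print(dna.count("A"), dna.find("CT"))    # 1 2
```

Because a Seq object exposes the same string-style methods, calls such as count() and find() read the same way on a Seq as on the plain string used here.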

Below are some examples of sequence in Biopython:

Example 1:


# Import libraries
from Bio.Seq import Seq
# Creating a sequence
seq = Seq("GACT")
# Printing Sequence
print(seq)


In the above example, the sequence GACT is created without any type information. Read as a protein, its letters stand for Glycine, Alanine, Cysteine and Threonine; read as DNA, they stand for Guanine, Adenine, Cytosine and Thymine. In Biopython releases that still include Bio.Alphabet, each Seq object has two important attributes:

  1. Data, which is the actual sequence string (GACT in this case).
  2. Alphabet, which represents the type of the sequence, e.g. DNA sequence, RNA sequence, etc. By default it is generic and does not specify any sequence type.
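The role of the type information can be illustrated with a small sketch: the same letters from Example 1 spell out different molecules depending on how they are read. The two lookup tables and the spell_out helper are hypothetical names introduced here for illustration, and only the four letters of GACT are covered.

```python
# Hypothetical lookup tables covering only the four letters used in Example 1.
NUCLEOTIDE_NAMES = {"G": "Guanine", "A": "Adenine", "C": "Cytosine", "T": "Thymine"}
AMINO_ACID_NAMES = {"G": "Glycine", "A": "Alanine", "C": "Cysteine", "T": "Threonine"}


def spell_out(sequence: str, names: dict) -> list:
    """Return the full name of every letter under the chosen reading."""
    return [names[letter] for letter in sequence]


data = "GACT"  # the raw sequence string, with no type attached

# Read as DNA, the letters are nucleotide bases.
print(spell_out(data, NUCLEOTIDE_NAMES))
# Read as a protein, the very same letters are amino acids.
print(spell_out(data, AMINO_ACID_NAMES))
```

The string alone cannot say which reading is intended, which is the gap the alphabet attribute was meant to fill.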

Example 2:


# Import libraries
from Bio.Seq import Seq
# Creating a sequence that uses "=" as its gap character
seq = Seq("ACGT=TT")
# Removing the gap character from the sequence
updatedSeq = seq.ungap("=")
# Printing Sequence
print(updatedSeq)


Here, the letters of ACGT stand for Adenine, Cytosine, Guanine and Thymine. The = character acts as a gap symbol in this sequence; ungap("=") removes it, so the updated sequence is ACGTTT.
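The effect of removing the gap character in Example 2 can be reproduced on a plain string. The sketch below uses a hypothetical helper, remove_gaps, written for this article; it is not part of Biopython, and the single-character check is an assumption about sensible input rather than a documented rule.

```python
def remove_gaps(sequence: str, gap: str) -> str:
    """Return the sequence with every occurrence of the gap character removed."""
    if len(gap) != 1:
        # Assumption for this sketch: a gap symbol is exactly one character.
        raise ValueError("gap must be a single character")
    return sequence.replace(gap, "")


print(remove_gaps("ACGT=TT", "="))  # ACGTTT
```

The remaining letters keep their original order, so only the gap positions disappear.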

Alphabet Class:

In addition to its string properties, a Seq object also possesses an alphabet property. This property is an instance of an Alphabet class from the Bio.Alphabet module, for example IUPAC DNA or generic DNA. It describes the type of molecule, i.e. DNA, RNA or protein, and may also indicate the expected symbols.

The Alphabet module provides the following classes to represent various sequences:

  • SingleLetterAlphabet: Generic alphabet with letters of size one. It derives from Alphabet, and all other alphabet types derive from it.
  • ProteinAlphabet: Generic single-letter protein alphabet.
  • NucleotideAlphabet: Generic single-letter nucleotide alphabet.
  • DNAAlphabet: Generic single-letter DNA alphabet.
  • RNAAlphabet: Generic single-letter RNA alphabet.
  • SecondaryStructure: Alphabet used to describe secondary structure.
  • ThreeLetterProtein: Three-letter protein alphabet.
  • AlphabetEncoder: Class used to construct a new, extended alphabet from an existing one.
  • Gapped: Alphabets which contain a gap character.
  • HasStopCodon: Alphabets which contain a stop symbol.
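The inheritance described above can be pictured with a simplified sketch. The class names are taken from the list, but the bodies are assumptions made for illustration: this is not the code of the removed Bio.Alphabet module, and placing DNAAlphabet and RNAAlphabet under NucleotideAlphabet is an assumption about the layout. The lineage helper is hypothetical.

```python
# Simplified illustration of the hierarchy; not the removed module's real code.
class Alphabet:
    size = None  # number of characters per symbol, unknown for the base class


class SingleLetterAlphabet(Alphabet):
    size = 1  # every symbol is exactly one character


class ProteinAlphabet(SingleLetterAlphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):  # assumed to specialise NucleotideAlphabet
    pass


class RNAAlphabet(NucleotideAlphabet):  # assumed to specialise NucleotideAlphabet
    pass


def lineage(cls) -> list:
    """List a class and its ancestors, most specific first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(lineage(DNAAlphabet))
print(issubclass(ProteinAlphabet, SingleLetterAlphabet))  # True
```

In this sketch, every concrete alphabet inherits size = 1 from SingleLetterAlphabet, which is what "letters of size one" refers to.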

Bio.Alphabet also provides an IUPAC module, which defines sequence types as specified by the IUPAC community. Some classes in the IUPAC module are listed below:

Class (predefined instance): Description

  • IUPACProtein (protein): IUPAC protein alphabet of 20 standard amino acids.
  • ExtendedIUPACProtein (extended_protein): Extended uppercase IUPAC protein single-letter alphabet.
  • IUPACAmbiguousDNA (ambiguous_dna): Uppercase IUPAC ambiguous DNA.
  • IUPACUnambiguousDNA (unambiguous_dna): Uppercase IUPAC unambiguous DNA (GATC).
  • ExtendedIUPACDNA (extended_dna): Extended IUPAC DNA alphabet.
  • IUPACAmbiguousRNA (ambiguous_rna): Uppercase IUPAC ambiguous RNA.
  • IUPACUnambiguousRNA (unambiguous_rna): Uppercase IUPAC unambiguous RNA (GAUC).
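As a rough sketch of what these letter sets imply, the following plain-Python check reports which of three alphabets a string is compatible with. The GATC and GAUC sets come from the list above and the protein set is the 20 standard one-letter amino acid codes. LETTER_SETS and compatible_types are hypothetical names introduced here, not Biopython API.

```python
# Letter sets for three of the alphabets listed above.
LETTER_SETS = {
    "protein": "ACDEFGHIKLMNPQRSTVWY",  # 20 standard amino acids
    "unambiguous_dna": "GATC",
    "unambiguous_rna": "GAUC",
}


def compatible_types(sequence: str) -> list:
    """Return the names of all letter sets that contain every letter used."""
    letters = set(sequence.upper())
    return [name for name, allowed in LETTER_SETS.items() if letters <= set(allowed)]


print(compatible_types("GACT"))  # ['protein', 'unambiguous_dna']
print(compatible_types("GACU"))  # ['unambiguous_rna']
print(compatible_types("MKV"))   # ['protein']
```

A string such as GACT fits more than one letter set, so a check like this can rule sequence types out but cannot always identify a single one.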

Bio.Alphabet has since been removed from Biopython (as of release 1.78). The intended function of the alphabet objects was never well established, and the 20-year-old design had several drawbacks. In particular, the AlphabetEncoder class was excessively complex, which made it difficult to determine the type of molecule. Working out the consensus of several alphabet objects (e.g. during string addition) was also often difficult.

Without a concrete plan for how to improve or replace the existing system, it was decided to remove the Bio.Alphabet module entirely.
