
Sequence in BioPython module


Prerequisite: BioPython module

A sequence is a series of letters used to represent an organism's protein, DNA or RNA. In Biopython, sequences are usually handled by the Seq object defined in the Bio.Seq module. The Seq object has built-in methods such as complement(), reverse_complement(), transcribe(), back_transcribe() and translate(), and it also supports many string methods such as count(), find(), split() and strip().
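A minimal sketch of the biological and string methods named above, assuming Biopython 1.78 or later is installed. The 24-letter coding sequence is a made-up example chosen so that it translates cleanly:

```python
from Bio.Seq import Seq

# Made-up coding sequence: 8 codons, the last one being the stop codon TGA
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")

# Biological methods: each returns a new Seq object
print(dna.complement())          # TACCGGTAACATTACCCGGCGACT
print(dna.reverse_complement())  # TCAGCGGCCCATTACAATGGCCAT
mrna = dna.transcribe()          # DNA -> RNA: every T becomes U
print(mrna)                      # AUGGCCAUUGUAAUGGGCCGCUGA
print(mrna.back_transcribe())    # RNA -> DNA: ATGGCCATTGTAATGGGCCGCTGA
print(dna.translate())           # MAIVMGR*  ("*" marks the stop codon)

# String-style methods behave like their str counterparts
print(dna.count("G"))            # 8
print(dna.find("ATT"))           # 6  (zero-based index of the first match)
```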

Below are some examples of working with sequences in Biopython:

Example 1:





# Import libraries
from Bio.Seq import Seq
  
# Creating a sequence
seq = Seq("GACT")
  
# Printing Sequence
print(seq)


Output:

GACT

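In the above example, the sequence is GACT. Read as a protein, each letter stands for an amino acid: glycine, alanine, cysteine and threonine. Read as DNA, the same letters would stand for guanine, adenine, cytosine and thymine; the letters alone do not say which. In older Biopython releases (before 1.78), each Seq object had two important attributes:

  1. Data, which is the actual sequence string (GACT in this case).
  2. Alphabet, which describes the type of the sequence, i.e. DNA, RNA, protein, etc. By default it was a generic alphabet that did not specify any particular sequence type.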
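In current Biopython (1.78 and later) the sequence data is still there, but there is no alphabet attribute. A short sketch, assuming such a version, of how the data of the Example 1 sequence can be accessed with ordinary string-like operations:

```python
from Bio.Seq import Seq

seq = Seq("GACT")

print(str(seq))     # GACT  (the sequence data as a plain Python string)
print(len(seq))     # 4
print(seq[0])       # G     (indexing returns a one-character str)
print(seq[1:3])     # AC    (slicing returns a new Seq object)
print(seq.lower())  # gact
```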

Example 2:





# Import libraries
from Bio.Seq import Seq
  
# Creating a sequence
seq = Seq("ACGT=TT")
  
# Removing the gap character "=" from the sequence
updatedSeq = seq.ungap("=")
  
# Printing Sequence
print(updatedSeq)


Output:

ACGTT

Here, the letters A, C, G and T stand for the DNA bases adenine, cytosine, guanine and thymine. The "=" character is treated as a gap symbol: ungap("=") returns a new Seq with every "=" removed, so ACGT=TT becomes ACGTT.
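ungap() removes a single gap character per call. When an aligned sequence uses more than one gap symbol, a plain-string round trip is a simple alternative. In this sketch remove_gaps is a hypothetical helper, not part of Biopython, and the aligned sequence is made up:

```python
from Bio.Seq import Seq


def remove_gaps(seq, gap_chars="-."):
    """Hypothetical helper: return a new Seq with every character in gap_chars removed."""
    text = str(seq)
    for gap in gap_chars:
        text = text.replace(gap, "")
    return Seq(text)


aligned = Seq("AC-GT..TT-A")
print(remove_gaps(aligned))       # ACGTTTA
print(remove_gaps(aligned, "-"))  # ACGT..TTA  (only "-" removed)
```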

Alphabet Class:

In addition to its string-like behaviour, a Seq object in older Biopython releases also carried an alphabet. Alphabets were instances of classes from the Bio.Alphabet module; for example, IUPAC DNA or generic DNA described the type of molecule (DNA, RNA or protein) and could also indicate the expected symbols.

The Alphabet module provided the following classes to represent various sequences:

Class | Property
--- | ---
SingleLetterAlphabet | Generic alphabet with letters of size one; derives from Alphabet, and all other alphabet types are derived from it.
ProteinAlphabet | Generic single-letter protein alphabet.
NucleotideAlphabet | Generic single-letter nucleotide alphabet.
DNAAlphabet | Generic single-letter DNA alphabet.
RNAAlphabet | Generic single-letter RNA alphabet.
SecondaryStructure | Alphabet used to describe secondary structure.
ThreeLetterProtein | Three-letter protein alphabet.
AlphabetEncoder | Class used to construct a new, extended alphabet from an existing one.
Gapped | Alphabets which contain a gap character.
HasStopCodon | Alphabets which contain a stop symbol.
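Without the HasStopCodon alphabet, the stop symbol in current Biopython is simply a character in the translated sequence, and Seq.translate() has parameters for handling it. A sketch assuming Biopython 1.78 or later, reusing a made-up coding sequence:

```python
from Bio.Seq import Seq

# Made-up coding sequence: starts with ATG and ends with the stop codon TGA
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")

print(coding_dna.translate())                 # MAIVMGR*  (default stop symbol "*")
print(coding_dna.translate(stop_symbol="@"))  # MAIVMGR@  (custom stop symbol)
print(coding_dna.translate(to_stop=True))     # MAIVMGR   (stops at the first stop codon)
print(coding_dna.translate(cds=True))         # MAIVMGR   (also checks start/stop codons and length)
```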

Bio.Alphabet also provided an IUPAC submodule (Bio.Alphabet.IUPAC), which gave sequence types as defined by IUPAC. Some of its classes, together with the ready-made instances it exported, are listed below:

Class | Instance | Property
--- | --- | ---
IUPACProtein | protein | IUPAC protein alphabet of the 20 standard amino acids.
ExtendedIUPACProtein | extended_protein | Extended uppercase IUPAC protein single-letter alphabet.
IUPACAmbiguousDNA | ambiguous_dna | Uppercase IUPAC ambiguous DNA.
IUPACUnambiguousDNA | unambiguous_dna | Uppercase IUPAC unambiguous DNA (GATC).
ExtendedIUPACDNA | extended_dna | Extended IUPAC DNA alphabet.
IUPACAmbiguousRNA | ambiguous_rna | Uppercase IUPAC ambiguous RNA.
IUPACUnambiguousRNA | unambiguous_rna | Uppercase IUPAC unambiguous RNA (GAUC).
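The letter sets behind these IUPAC alphabets are still available in current Biopython as plain string constants in Bio.Data.IUPACData. A sketch, assuming Biopython 1.78 or later, that checks which of those sets contain every letter of a sequence; matching_letter_sets and LETTER_SETS are hypothetical names, not Biopython API:

```python
from Bio.Data import IUPACData
from Bio.Seq import Seq

# Hypothetical mapping from a label to the letters allowed by that set
LETTER_SETS = {
    "unambiguous_dna": IUPACData.unambiguous_dna_letters,  # GATC
    "ambiguous_dna": IUPACData.ambiguous_dna_letters,
    "unambiguous_rna": IUPACData.unambiguous_rna_letters,  # GAUC
    "ambiguous_rna": IUPACData.ambiguous_rna_letters,
    "protein": IUPACData.protein_letters,  # 20 standard amino acids
}


def matching_letter_sets(seq):
    """Hypothetical helper: names of the letter sets that contain every letter of seq."""
    letters = set(str(seq).upper())
    return [name for name, allowed in LETTER_SETS.items() if letters <= set(allowed)]


print(matching_letter_sets(Seq("GACT")))  # ['unambiguous_dna', 'ambiguous_dna', 'protein']
print(matching_letter_sets(Seq("GAUC")))  # ['unambiguous_rna', 'ambiguous_rna']
```

GACT fits both the DNA sets and the protein set, which is why the letters of Example 1 alone cannot say whether it is DNA or a peptide.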

Bio.Alphabet has since been removed from Biopython (as of release 1.78). The intended role of the alphabet objects was never well defined, and the roughly 20-year-old design had several drawbacks. In particular, the AlphabetEncoder class was overly complex, which made it hard to determine the type of molecule, and reconciling several alphabet objects (for example, when adding two sequences together) was often difficult.

With no concrete plan for improving or replacing this design, the developers decided to remove the Bio.Alphabet module entirely.
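Because Seq objects no longer carry an alphabet, current Biopython does not try to reconcile molecule types when sequences are combined. Where the molecule type matters, for example when writing GenBank files, it is stored in a SeqRecord's annotations["molecule_type"]. A sketch assuming Biopython 1.78 or later; the record ID, description and sequence are made up:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Without alphabets, adding two Seq objects just concatenates their letters,
# even when one looks like DNA and the other like protein.
combined = Seq("ACGT") + Seq("MKV")
print(combined)  # ACGTMKV

# The molecule type is recorded as an annotation on a SeqRecord instead.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGCTGA"), id="example_1", description="toy sequence"
)
record.annotations["molecule_type"] = "DNA"

# The GenBank writer expects molecule_type; it appears on the LOCUS line.
handle = StringIO()
SeqIO.write(record, handle, "genbank")
gb_text = handle.getvalue()
print(gb_text.splitlines()[0])
```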



Last Updated : 11 Oct, 2020