Sequence in BioPython module
Prerequisite: BioPython module
A sequence is a series of letters used to represent an organism's protein, DNA or RNA. Sequences in Biopython are usually handled by the Seq object defined in the Bio.Seq module. The Seq object has built-in methods such as complement, reverse_complement, transcribe, back_transcribe and translate. Seq objects also support many string methods, such as count(), find(), split() and strip().
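The methods named above can be tried on a short DNA sequence. This is a minimal sketch that assumes Biopython is installed; the sequence itself is an arbitrary illustration and the variable names are made up for this example.

```python
from Bio.Seq import Seq

# An arbitrary coding sequence, chosen only for illustration.
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")

# Biological methods return new Seq objects; the original is unchanged.
print(dna.complement())
print(dna.reverse_complement())

rna = dna.transcribe()          # replaces T with U
print(rna)
print(rna.back_transcribe())    # replaces U with T again

print(dna.translate())          # MAIVMGR*  ('*' marks the stop codon)

# String-like methods behave as they do on a Python str.
print(dna.count("ATG"))         # 2
print(dna.find("GCC"))          # 3
```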
Below are some examples of sequence in Biopython:
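The article's original listing is not available here, so the following is only a minimal sketch of creating a Seq object from the string GACT. It assumes Biopython is installed, and the variable name is an arbitrary choice.

```python
from Bio.Seq import Seq

# Create a sequence from a plain string.
seq = Seq("GACT")

print(seq)        # GACT
print(len(seq))   # 4
print(seq[0])     # G   (indexing returns a single letter)
print(seq[1:3])   # AC  (slicing returns a new Seq)
```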
In the sequence GACT, read as a protein, the letters represent Glycine, Alanine, Cysteine and Threonine. Each Seq object has two important attributes:
- Data, which is the actual sequence string (GACT in this case).
- Alphabet, which is used to represent the type of the sequence, i.e. DNA sequence, RNA sequence, etc. By default it is generic and does not indicate any particular sequence type.
Here, in the sequence ACGT, the letters represent Adenine, Cytosine, Guanine, and Thymine. The =TT refers to various protein naming conventions and functionalities.
In addition to its string properties, a Seq object also possesses alphabet properties. These are instances of the Alphabet class from the Bio.Alphabet module, for example IUPAC DNA or generic DNA. An alphabet describes the type of molecule (DNA, RNA or protein) and may also indicate the expected symbols.
The Alphabet module provides the following classes to represent various sequences:
| Class | Property |
|---|---|
| SingleLetterAlphabet | Generic alphabet with letters of size one. Derives from Alphabet, and all other alphabet types are derived from it. |
| ProteinAlphabet | Generic single letter protein alphabet. |
| NucleotideAlphabet | Generic single letter nucleotide alphabet. |
| DNAAlphabet | Generic single letter DNA alphabet. |
| RNAAlphabet | Generic single letter RNA alphabet. |
| SecondaryStructure | Alphabet used to describe secondary structure. |
| ThreeLetterProtein | Three letter protein alphabet. |
| AlphabetEncoder | Class used to construct a new and extended alphabet from an existing one. |
| Gapped | Alphabets which contain a gap character. |
| HasStopCodon | Alphabets which contain a stop symbol. |
Bio.Alphabet also provides an IUPAC module which gives sequence types as defined by the IUPAC community. Some classes in IUPAC module are listed below:
| Class | Instance name | Property |
|---|---|---|
| IUPACProtein | protein | IUPAC protein alphabet of 20 standard amino acids. |
| ExtendedIUPACProtein | extended_protein | Extended uppercase IUPAC protein single letter alphabet. |
| IUPACAmbiguousDNA | ambiguous_dna | Uppercase IUPAC ambiguous DNA. |
| IUPACUnambiguousDNA | unambiguous_dna | Uppercase IUPAC unambiguous DNA (GATC). |
| ExtendedIUPACDNA | extended_dna | Extended IUPAC DNA alphabet. |
| IUPACAmbiguousRNA | ambiguous_rna | Uppercase IUPAC ambiguous RNA. |
| IUPACUnambiguousRNA | unambiguous_rna | Uppercase IUPAC unambiguous RNA (GAUC). |
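To make the idea of "expected symbols" concrete, the sketch below checks a string against a few of the letter sets listed above. It is plain Python and does not use Bio.Alphabet; the constant names and the fits_alphabet helper are hypothetical, introduced only for this illustration.

```python
# Letter sets taken from the alphabets described above.
UNAMBIGUOUS_DNA = set("GATC")
UNAMBIGUOUS_RNA = set("GAUC")
STANDARD_PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def fits_alphabet(sequence, letters):
    """Return True if every letter of `sequence` belongs to `letters`."""
    return set(sequence) <= letters


print(fits_alphabet("GACT", UNAMBIGUOUS_DNA))   # True
print(fits_alphabet("GACU", UNAMBIGUOUS_DNA))   # False (U is an RNA letter)
print(fits_alphabet("GACU", UNAMBIGUOUS_RNA))   # True
print(fits_alphabet("GACT", STANDARD_PROTEIN))  # True
```

Note that GACT fits both the DNA and the protein letter sets, so the letters alone cannot tell which type of molecule a sequence represents.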
Note that Bio.Alphabet was removed from Biopython in release 1.78. The intended function of the alphabet objects was never well established, and the 20-year-old design had several disadvantages. In particular, the AlphabetEncoder class was excessively complex, which made it difficult to determine the type of molecule. Working out the consensus of several alphabet objects (e.g. during string addition) was also often difficult.
Without a concrete plan for how to strengthen or replace the existing structure, it was decided to remove the Bio.Alphabet module completely.
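In Biopython releases without Bio.Alphabet, the molecule type is usually recorded in the annotations of a SeqRecord under the "molecule_type" key rather than on the Seq itself. The following is a minimal sketch assuming such a release is installed; the record id is a made-up placeholder.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# The Seq holds only the letters; the molecule type is stored on the record.
record = SeqRecord(
    Seq("ACGT"),
    id="example_1",  # hypothetical identifier, for illustration only
    annotations={"molecule_type": "DNA"},
)

print(record.seq)                           # ACGT
print(record.annotations["molecule_type"])  # DNA
```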