Biopython – Sequence Alignment

Sequence alignment is the process of arranging two or more DNA, RNA, or protein sequences so that regions of similarity among them can be identified. Identifying similar regions provides a lot of information about which traits are conserved among species, how close different species are genetically, how species evolve, and so on. Biopython provides a wide range of functionality for sequence alignment.
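
To make the idea of a "region of similarity" concrete, here is a minimal pure-Python sketch that counts identical columns between two sequences that are already aligned. The sequences and the helper name percent_identity are hypothetical and chosen only for illustration. Identity is computed over all alignment columns, which is just one of several conventions in use.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (matches, percent) for two aligned sequences of equal length."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be non-empty and equally long")
    # A column is a match when both residues are equal and neither is a gap
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return matches, 100.0 * matches / len(seq_a)


# Hypothetical toy alignment (not taken from a real dataset)
aligned_a = "ACGT-A"
aligned_b = "ACCTGA"

matches, identity = percent_identity(aligned_a, aligned_b)
print(f"{matches} identical columns, {identity:.1f}% identity")
```

Columns where either sequence has a gap are never counted as matches here; a real analysis may prefer to exclude such columns from the denominator as well.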

Reading Sequence Alignment: Bio.AlignIO, provided by Biopython, is used to read and write sequence alignments. Bioinformatics uses many file formats for sequence alignment data, much as it does for sequence data. Bio.AlignIO has an API similar to Bio.SeqIO; the difference is that Bio.SeqIO works on sequence data while Bio.AlignIO works on sequence alignment data. The steps below download a sample sequence alignment file:

  • On the Pfam site, choose a family with a small number of seed sequences, since it contains little data and is easy to work with. This example uses PF18225 (http://pfam.xfam.org/family/PF18225).

  • Click on the alignment section and download the seed alignment file in Stockholm format. The example below assumes it is saved as PF18225_seed.txt.

Example:

Python3




# Import libraries
from Bio import AlignIO

# Read the single alignment stored in the Stockholm file
# (passing the filename lets Biopython open and close the file)
alignment = AlignIO.read("PF18225_seed.txt", "stockholm")

# Print the alignment object (a summary of its rows)
print(alignment)

# Show the full sequence of each record in the alignment
print("Showing Alignment Sequence Record")
for record in alignment:
    print(record.seq)


Output:

SingleLetterAlphabet() alignment with 5 rows and 65 columns
AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIM…NRK B3PFT7_CELJU/62-126
AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVA…NRT K4KEM7_SIMAS/61-125
MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVA…EGP B7RZ31_9GAMM/59-123
ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMA…KKP A0A143HL37_9GAMM/57-121
TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMA…NKP A0A0X3UC67_9GAMM/57-121

Showing Alignment Sequence Record
AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIMVLAPRLTAKHPYDKVQDRNRK
AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVADLMRKLDLDRPFKKLERKNRT
MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVATVANQLRGRKRRAFARHREGP
ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMAPMLIALNYRNRESHAQVDKKP
TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMAPLFKVLSFRNREDQGLVNNKP
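
Because Bio.AlignIO writes alignments as well as reading them, a parsed alignment can also be saved in another format. The sketch below is self-contained: it reads a tiny, made-up Stockholm alignment from an in-memory string (so it does not need the downloaded file) and writes it out as FASTA. The sequence names and residues are hypothetical.

```python
from io import StringIO

from Bio import AlignIO

# Hypothetical two-row alignment in Stockholm format (illustrative data only)
stockholm_text = """# STOCKHOLM 1.0
seqA ACGT-A
seqB ACCTGA
//
"""

# Read the alignment from the in-memory text
alignment = AlignIO.read(StringIO(stockholm_text), "stockholm")
print(len(alignment), "rows,", alignment.get_alignment_length(), "columns")

# Write the same alignment in FASTA format;
# AlignIO.write returns the number of alignments written
fasta_handle = StringIO()
written = AlignIO.write(alignment, fasta_handle, "fasta")
fasta_text = fasta_handle.getvalue()
print(fasta_text)
```

To write to disk instead, a filename can be passed to AlignIO.write in place of the StringIO handle.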

Reading Multiple Alignments: Most sequence alignment files contain a single alignment, and the read() method is enough to parse them. A multiple sequence alignment compares more than two sequences to find the best match among them, and a single file may hold several such alignments. When a file contains more than one alignment, use the parse() method instead of read(); it returns an iterable object that can be iterated over to get the individual alignments. A basic example is given below:

Python3




# Import libraries
from Bio import AlignIO

# parse() returns a generator that yields one alignment at a time
alignments = AlignIO.parse("PF18225_seed.txt", "stockholm")

# Show alignment generator
print(alignments)

# Print each alignment in the file
for alignment in alignments:
    print(alignment)


Output:

<generator object parse at 0x00000214C9FDB990>

SingleLetterAlphabet() alignment with 5 rows and 65 columns
AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIM…NRK B3PFT7_CELJU/62-126
AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVA…NRT K4KEM7_SIMAS/61-125
MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVA…EGP B7RZ31_9GAMM/59-123
ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMA…KKP A0A143HL37_9GAMM/57-121
TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMA…NKP A0A0X3UC67_9GAMM/57-121
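
The PF18225 seed file holds only one alignment, so the difference between read() and parse() is not visible above. The following sketch illustrates it with two small, made-up alignments stored back to back in one in-memory Stockholm text. All sequence names and residues are hypothetical.

```python
from io import StringIO

from Bio import AlignIO

# Two hypothetical alignments stored back to back in one Stockholm text
multi_text = """# STOCKHOLM 1.0
seqA ACGT-A
seqB ACCTGA
//
# STOCKHOLM 1.0
seqC TTGACA
seqD TTGCCA
seqE TT-ACA
//
"""

# parse() yields every alignment in turn
alignments = list(AlignIO.parse(StringIO(multi_text), "stockholm"))
for index, alignment in enumerate(alignments, start=1):
    print(f"Alignment {index}: {len(alignment)} rows, "
          f"{alignment.get_alignment_length()} columns")

# read() expects exactly one alignment and raises ValueError otherwise
try:
    AlignIO.read(StringIO(multi_text), "stockholm")
    read_failed = False
except ValueError as error:
    read_failed = True
    print("read() failed:", error)
```

Wrapping parse() in list() is convenient for small inputs; for large files, iterating over the generator directly avoids holding every alignment in memory.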



Last Updated : 11 Oct, 2020