
Compare sequences in Python using difflib module

Last Updated : 24 Feb, 2021

The difflib module, part of Python's standard library, provides classes and functions for comparing sequences. It can be used to compare files, and it can report the differences between them in several formats, including HTML, context diffs and unified diffs.

Its most commonly used classes and functions are described below.

Class SequenceMatcher

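It is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. Its constructor is SequenceMatcher(isjunk=None, a='', b='', autojunk=True); every example below passes None as isjunk, so no element is treated as junk. Two of its methods are discussed below, followed by the related get_close_matches() function: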

  • The ratio() method returns a float in the range [0, 1] measuring how similar the two sequences passed to the constructor are. It is computed with the formula below.

2*X/Y

where X is the number of matched elements (the total size of all matching blocks) and Y is the total number of elements in both sequences.
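The formula can be checked directly. The sketch below is an illustration, not the library's own code: it sums the sizes of the blocks returned by get_matching_blocks() (covered later in this article) to obtain X, computes Y from the two lengths, and compares 2*X/Y with ratio(). The input strings are the ones used in Example 2.

```python
import difflib

a = 'Geeks for geeks!'
b = 'geeks'
matcher = difflib.SequenceMatcher(None, a, b)

# X: number of matched elements, summed over every matching block
x = sum(block.size for block in matcher.get_matching_blocks())
# Y: total number of elements in both sequences
y = len(a) + len(b)

print(x, y)                          # 5 21
print(2 * x / y)                     # 0.47619047619047616
print(2 * x / y == matcher.ratio())  # True
```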

Example 1:

Python3




# import required module
import difflib
  
# assign parameters
par1 = ['g', 'f', 'g']
par2 = 'gfg'
  
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())


Output:

1.0

Example 2:

Python3




# import required module
import difflib
  
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
  
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())


Output:

0.47619047619047616

Example 3:

Python3




# import required module
import difflib
  
# assign parameters
par1 = 'gfg'
par2 = 'GFG'
  
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())


Output:

0.0
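Example 3 returns 0.0 because the comparison is case-sensitive: 'g' and 'G' are different elements. When case should not matter, one option is to normalise both strings before comparing. The helper similarity() below is hypothetical, written only for this illustration; it uses str.casefold(), a more aggressive form of lower() intended for caseless matching.

```python
import difflib

def similarity(s1, s2, ignore_case=False):
    """Hypothetical helper: SequenceMatcher.ratio(), optionally ignoring case."""
    if ignore_case:
        # casefold() normalises case before the element-by-element comparison
        s1, s2 = s1.casefold(), s2.casefold()
    return difflib.SequenceMatcher(None, s1, s2).ratio()

print(similarity('gfg', 'GFG'))                    # 0.0
print(similarity('gfg', 'GFG', ignore_case=True))  # 1.0
```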
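  • The get_matching_blocks() method returns a list of triples describing the matching subsequences. Each triple has the form (i, j, n) and means that a[i:i+n] == b[j:j+n]. The triples are Match named tuples with the fields a, b and size, which is why the examples below can use ele.a and ele.size. The last triple is always a dummy, (len(a), len(b), 0).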

Example 1:

Python3




# import required module
import difflib
  
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
  
# compare
matches = difflib.SequenceMatcher(
    None, par1, par2).get_matching_blocks()
  
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])


Output:

geeks

Example 2:

Python3




# import required module
import difflib
  
# assign parameters
par1 = 'GFG'
par2 = 'gfg'
  
# compare
matches = difflib.SequenceMatcher(
    None, par1, par2).get_matching_blocks()
  
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])


Output:

 

'GFG' and 'gfg' have no matching subsequence, because the comparison is case-sensitive. The only block returned is the dummy (3, 3, 0), so the loop prints a single empty line.
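Printing the Match tuples themselves makes the structure easier to see. The sketch below reuses the strings from Example 1, checks the a[i:i+n] == b[j:j+n] property for every triple, and then filters out the zero-size dummy block, which is what produces the extra empty line when the examples above print every block.

```python
import difflib

a = 'Geeks for geeks!'
b = 'geeks'
blocks = difflib.SequenceMatcher(None, a, b).get_matching_blocks()

for block in blocks:
    print(block)
# Match(a=10, b=0, size=5)
# Match(a=16, b=5, size=0)

# Every triple (i, j, n) satisfies a[i:i+n] == b[j:j+n]
for i, j, n in blocks:
    assert a[i:i + n] == b[j:j + n]

# Drop the zero-size dummy block before using the matches
real_blocks = [blk for blk in blocks if blk.size]
print([a[blk.a:blk.a + blk.size] for blk in real_blocks])  # ['geeks']
```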

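  • get_close_matches() function: strictly speaking, this is a module-level function, difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6), rather than a method of SequenceMatcher, although it uses SequenceMatcher internally. It returns a list of the best "good enough" matches for word, the sequence for which close matches are wanted (usually a string), taken from possibilities, a list of sequences to match against (usually a list of strings). At most n matches are returned, best first, and candidates whose similarity score is below cutoff are ignored.

Example: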

Python3




# import required module
import difflib
  
# assign parameters
string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]
  
# find common strings
print(difflib.get_close_matches(string, listOfStrings))


Output:

['geeks']
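To see why only 'geeks' is returned, the sketch below prints a similarity score for each candidate and then repeats the call with a lower cutoff. Scoring each candidate with SequenceMatcher(None, candidate, word).ratio() mirrors the ratio test get_close_matches() applies; treat the printed scores as an illustration of the default cutoff=0.6 rather than as part of the function's output.

```python
import difflib

word = "Geeks4geeks"
candidates = ["for", "Gks", "G4g", "geeks"]

# Similarity of each candidate to word
for cand in candidates:
    score = difflib.SequenceMatcher(None, cand, word).ratio()
    print(f"{cand!r}: {score:.3f}")
# 'for': 0.000
# 'Gks': 0.429
# 'G4g': 0.429
# 'geeks': 0.625

# With the default cutoff=0.6 only 'geeks' qualifies; 0.4 lets more through
loose = difflib.get_close_matches(word, candidates, n=3, cutoff=0.4)
print(loose)  # ['geeks', 'Gks', 'G4g']
```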

Class Differ

This class compares sequences of lines of text and produces human-readable differences, or deltas. Every line of a Differ delta begins with a two-character code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence (it points out intraline differences)

The method and the module-level functions below all compare two sequences of lines:

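  • The compare() method compares two sequences of lines and generates the delta (a sequence of lines). When a string is passed instead of a list of lines, each character is treated as a separate line, as in the examples below.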

Example 1:

Python3




# import required module
from difflib import Differ
  
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
  
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)


Output:

- G
+ g
  e
  e
  k
  s
+ !

Example 2:

Python3




# import required module
from difflib import Differ
  
# assign parameters
par1 = ['Geeks','for','geeks!']
par2 = 'geeks!'
  
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)


Output:

- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
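None of the examples above produce a '?' line: Differ adds the '?' guide lines only when it pairs up two lines that are similar but not identical. The sketch below uses made-up input text, splits it with str.splitlines(keepends=True) so that each line keeps its trailing '\n', and writes the delta with sys.stdout.writelines().

```python
import sys
from difflib import Differ

# Text split into lines that keep their '\n', as compare() expects
old = 'Geeks for geeks\napple\n'.splitlines(keepends=True)
new = 'Geeks for geeks\napples\n'.splitlines(keepends=True)

delta = list(Differ().compare(old, new))
sys.stdout.writelines(delta)
# Output:
#   Geeks for geeks
# - apple
# + apples
# ?      +
```

The '+' on the '?' line sits under the inserted 's', pointing at the intraline change.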
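  • ndiff() function: a module-level function, difflib.ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK), that returns the same kind of Differ-style delta as compare(). As with compare(), a list is treated as a sequence of lines and a string as a sequence of one-character lines, so in Example 2 each of the three list items is compared with the six characters of 'geeks!'.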

Example 1:

Python3




# import required module
import difflib
  
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)


Output:

- G
+ g
  e
  e
  k
  s
+ !

Example 2:

Python3




# import required module
import difflib
  
# assign parameters
par1 = ['Geeks','for','geeks!']
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)


Output:

- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
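The relationship between ndiff() and Differ can be checked directly. In CPython, ndiff() returns Differ(linejunk, charjunk).compare(a, b), so with the same defaults the two deltas should be identical. The sketch also confirms that passing list('geeks!') gives the same result as passing the string, since a string is already a sequence of one-character lines.

```python
import difflib

a = ['Geeks', 'for', 'geeks!']
b = 'geeks!'

# ndiff() defaults: no line junk, IS_CHARACTER_JUNK for characters
via_ndiff = list(difflib.ndiff(a, b))
via_differ = list(difflib.Differ(linejunk=None,
                                 charjunk=difflib.IS_CHARACTER_JUNK).compare(a, b))
print(via_ndiff == via_differ)  # True

# A string and the list of its characters produce the same delta
print(via_ndiff == list(difflib.ndiff(a, list(b))))  # True
```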
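  • context_diff() function: a module-level function, difflib.context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n'). Context diffs are a compact way of showing only the lines that have changed, plus a few lines of context, in a before/after style: changed lines are marked '! ', inserted lines '+ ' and deleted lines '- '. The number of context lines is set by n, which defaults to three. The header lines (***, --- and ***************) end with lineterm, '\n' by default, so print() leaves an empty line after each of them.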

Example 1:

Python3




# import required module
import difflib
  
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)


Output:

*** 

--- 

***************

*** 1,5 ****

! G
  e
  e
  k
  s
--- 1,6 ----

! g
  e
  e
  k
  s
+ !

Example 2:

Python3




# import required module
import difflib
  
# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)


Output:

*** 

--- 

***************

*** 1,3 ****

! Geeks
! for
! geeks!
--- 1,6 ----

! g
! e
! e
! k
! s
! !
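The Python documentation recommends setting lineterm to '' for inputs that have no trailing newlines, so that the output is uniformly newline-free and the extra empty lines above disappear. The sketch below does that, labels the two sides with fromfile and tofile (the names 'before.txt' and 'after.txt' are only labels here; no files are read), and uses n=1 so that only one line of context appears on each side of the change.

```python
import difflib

before = ['one', 'two', 'three', 'four', 'five', 'six']
after = ['one', 'two', 'three', 'FOUR', 'five', 'six']

# lineterm='' because these lines carry no trailing '\n';
# n=1 keeps one line of context around the change
diff = list(difflib.context_diff(before, after,
                                 fromfile='before.txt', tofile='after.txt',
                                 n=1, lineterm=''))
print('\n'.join(diff))
# *** before.txt
# --- after.txt
# ***************
# *** 3,5 ****
#   three
# ! four
#   five
# --- 3,5 ----
#   three
# ! FOUR
#   five
```

Lines read from a real file with readlines() already end in '\n', so the default lineterm suits them.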


