Compare sequences in Python using the difflib module

The difflib Python module provides classes and functions for comparing sequences. It can be used to compare files, and it can produce information about the differences in several formats, including HTML, context diffs, and unified diffs.
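Here is a quick look at two of the output formats mentioned above. This minimal sketch compares two small lists of lines with unified_diff() and renders the same comparison as an HTML table with HtmlDiff.make_table(). The names before.txt and after.txt are hypothetical labels that only appear in the headers. No files are read.

```python
import difflib
import sys

# Illustrative inputs: each element is one line, with its '\n' kept
before = ['Geeks\n', 'for\n', 'geeks!\n']
after = ['geeks\n', 'for\n', 'geeks!\n']

# Unified diff: '-' marks lines only in before, '+' lines only in after
diff = list(difflib.unified_diff(before, after,
                                 fromfile='before.txt', tofile='after.txt'))
sys.stdout.writelines(diff)

# The same comparison rendered as an HTML table (a string of markup)
html = difflib.HtmlDiff().make_table(before, after, 'before.txt', 'after.txt')
```

Every generated diff line already ends in '\n', so writelines() prints them without extra blank lines.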

The module contains several classes, along with related functions, for comparing sequences:

Class SequenceMatcher

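SequenceMatcher is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. Some of its methods, along with the related get_close_matches() function, are discussed below: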

The ratio() method of this class returns a float in the range [0, 1] that measures the similarity of the two sequences passed to the SequenceMatcher constructor. It is computed with the formula below:

2*X/Y

where X is the number of matching elements and Y is the total number of elements in both sequences. Identical sequences give 1.0, and sequences with nothing in common give 0.0.

Example 1:

Python3

# import required module
import difflib

# assign parameters
par1 = ['g', 'f', 'g']
par2 = 'gfg'

# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())

Output:

1.0

Example 2:

Python3

# import required module
import difflib

# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'

# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())

Output:

0.47619047619047616

Example 3:

Python3

# import required module
import difflib

# assign parameters
par1 = 'gfg'
par2 = 'GFG'

# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())

Output:

0.0
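The formula can be checked by hand. This sketch uses the strings from Example 2. It takes X as the sum of the block sizes reported by get_matching_blocks() (described next) and Y as the combined length of both sequences. ratio_by_hand() is a hypothetical helper written for illustration and is not part of difflib.

```python
import difflib


def ratio_by_hand(a, b):
    """Hypothetical helper: recompute SequenceMatcher.ratio() as 2*X/Y."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # X: number of matching elements, summed over all matching blocks
    # (the final block always has size 0, so it adds nothing)
    x = sum(block.size for block in matcher.get_matching_blocks())
    # Y: total number of elements in both sequences
    y = len(a) + len(b)
    # difflib treats two empty sequences as identical (ratio 1.0)
    return 2 * x / y if y else 1.0


par1 = 'Geeks for geeks!'
par2 = 'geeks'
print(ratio_by_hand(par1, par2))                          # 0.47619047619047616
print(difflib.SequenceMatcher(None, par1, par2).ratio())  # 0.47619047619047616
```

Here X = 5 (the block "geeks") and Y = 16 + 5 = 21, so the ratio is 10/21.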

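The get_matching_blocks() method of this class returns a list of triples describing matching subsequences. Each triple is of the form (i, j, n) and means that a[i:i+n] == b[j:j+n]. The last triple is always a dummy, (len(a), len(b), 0), and it is the only one with n == 0. Each triple is a Match named tuple, so the examples below read its fields as ele.a, ele.b and ele.size.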

Example 1:

Python3

# import required module
import difflib

# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'

# compare
matches = difflib.SequenceMatcher(
    None, par1, par2).get_matching_blocks()

for ele in matches:
    print(par1[ele.a:ele.a + ele.size])

Output:

geeks

Example 2:

Python3

# import required module
import difflib

# assign parameters
par1 = 'GFG'
par2 = 'gfg'

# compare
matches = difflib.SequenceMatcher(
    None, par1, par2).get_matching_blocks()

for ele in matches:
    print(par1[ele.a:ele.a + ele.size])

Output:

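There are no matching subsequences between GFG and gfg, so get_matching_blocks() returns only its final zero-length triple, (3, 3, 0). The loop prints a single empty line, and nothing visible is displayed.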
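get_matching_blocks() always ends with a zero-length triple (len(a), len(b), 0). That is why the loops above print an empty line after any real matches. This sketch prints the raw triples for the strings from Example 1 and skips any block whose size is 0.

```python
import difflib

par1 = 'Geeks for geeks!'
par2 = 'geeks'

blocks = difflib.SequenceMatcher(None, par1, par2).get_matching_blocks()
print(blocks)  # [Match(a=10, b=0, size=5), Match(a=16, b=5, size=0)]

# Skip the zero-length final block so no empty line is printed
for block in blocks:
    if block.size:
        print(par1[block.a:block.a + block.size])  # geeks
```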

get_close_matches() function: This module-level function (not a method of SequenceMatcher) returns a list of the best "good enough" matches for a word. Its first argument, word, is the sequence for which close matches are wanted (usually a string), and its second argument, possibilities, is a list of sequences to match it against (usually a list of strings). Matches are returned best first.

Example:

Python3

# import required module
import difflib

# assign parameters
string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]

# find the closest matches
print(difflib.get_close_matches(string, listOfStrings))

Output:

['geeks']
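get_close_matches() also accepts two optional parameters. n (default 3) caps how many matches are returned, and cutoff (default 0.6) is the minimum similarity score a possibility must reach. This sketch reuses the example's data and lowers cutoff. It also prints the SequenceMatcher.ratio() scores that are compared against cutoff, computed with the candidate as the first sequence and the word as the second.

```python
import difflib

string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]

# Defaults n=3, cutoff=0.6: only 'geeks' (score 10/16 = 0.625) qualifies
strict = difflib.get_close_matches(string, listOfStrings)
print(strict)  # ['geeks']

# A lower cutoff admits weaker candidates; results come back best first
loose = difflib.get_close_matches(string, listOfStrings, n=3, cutoff=0.4)
print(loose)  # ['geeks', 'Gks', 'G4g']

# The scores that get_close_matches() compares against cutoff
scores = {candidate: difflib.SequenceMatcher(None, candidate, string).ratio()
          for candidate in listOfStrings}
for candidate, score in scores.items():
    print(f"{candidate!r}: {score:.3f}")
```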

Class Differ

This class compares sequences of lines of text and produces human-readable differences, or deltas. Each line of a Differ delta begins with a two-letter code:

Code	Meaning
'- '	line unique to sequence 1
'+ '	line unique to sequence 2
'  '	line common to both sequences
'? '	line not present in either input sequence (a guide line marking intraline differences)

The compare() method of this class, and the related module-level functions ndiff() and context_diff(), are described below:

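The compare() method of this class compares two sequences of lines and generates the delta (a sequence of lines). Because a string is itself a sequence of characters, passing two plain strings compares them character by character, treating each character as a "line".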

Example 1:

Python3

# import required module
from difflib import Differ

# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'

# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)

Output:

- G
+ g
  e
  e
  k
  s
+ !

Example 2:

Python3

# import required module
from difflib import Differ

# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'

# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)

Output:

- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
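Differ is really designed for lines of text. This sketch uses made-up three-line inputs, split with splitlines(keepends=True), to produce a line-level delta. When a deleted line and an inserted line are similar enough, Differ also emits '? ' guide lines. In those lines, ^, - and + mark the characters that were changed, deleted or inserted.

```python
from difflib import Differ

# Illustrative inputs, split into lines with their '\n' endings kept
text1 = "one\ntwo\nthree\n".splitlines(keepends=True)
text2 = "ore\ntree\nemu\n".splitlines(keepends=True)

delta = list(Differ().compare(text1, text2))

# Every delta line, including '? ' guide lines, already ends in '\n',
# so join them instead of calling print() on each one
print(''.join(delta), end='')
```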

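ndiff() function: This module-level function performs the same comparison as Differ().compare() and returns a Differ-style delta as a generator of lines. By default it passes IS_CHARACTER_JUNK as its charjunk filter, so spaces and tabs are ignored when it measures how similar two lines are. This does not affect the examples below.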

Example 1:

Python3

# import required module
import difflib

# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'

# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)

Output:

- G
+ g
  e
  e
  k
  s
+ !

Example 2:

Python3

# import required module
import difflib

# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'

# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)

Output:

- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
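An ndiff() delta carries enough information to rebuild both inputs. The module-level function difflib.restore(delta, which) returns the first sequence for which=1 and the second for which=2, and it skips the '? ' guide lines. The input lines in this sketch are illustrative.

```python
import difflib

# Illustrative inputs: lists of lines with '\n' kept
old = ['Geeks\n', 'for\n', 'geeks!\n']
new = ['geeks\n', 'for\n', 'geeks!\n']

delta = list(difflib.ndiff(old, new))
print(''.join(delta), end='')

# restore() keeps '  ' lines plus '- ' lines (which=1) or '+ ' lines (which=2)
restored_old = list(difflib.restore(delta, 1))
restored_new = list(difflib.restore(delta, 2))
print(restored_old == old, restored_new == new)  # True True
```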

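context_diff() function: Context diffs are a compact way of showing just the lines that have changed, plus a few lines of context. The changes are shown in a before/after style. '! ' marks changed lines, '+ ' inserted lines, '- ' deleted lines, and two spaces mark unchanged context lines. The number of context lines is set by n, which defaults to three. As with Differ, passing plain strings treats each character as a line. The header lines end with the default lineterm='\n', so print() leaves a blank line after each of them.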

Example 1:

Python3

# import required module
import difflib

# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'

# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)

Output:

*** 

--- 

***************

*** 1,5 ****

! G
  e
  e
  k
  s
--- 1,6 ----

! g
  e
  e
  k
  s
+ !

Example 2:

Python3

# import required module
import difflib

# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'

# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)

Output:

*** 

--- 

***************

*** 1,3 ****

! Geeks
! for
! geeks!
--- 1,6 ----

! g
! e
! e
! k
! s
! !
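In practice, context_diff() is usually given lists of lines that keep their '\n' endings, together with fromfile and tofile labels for the header. In this sketch the file names are hypothetical, and n=1 shrinks the context to one line around each change. Because every generated line already ends in '\n', sys.stdout.writelines() prints them without extra blank lines.

```python
import difflib
import sys

# Illustrative "file" contents; keeping '\n' matches the default lineterm
before = ['Geeks\n', 'for\n', 'geeks!\n']
after = ['geeks\n', 'for\n', 'geeks!\n']

diff = list(difflib.context_diff(before, after,
                                 fromfile='before.txt', tofile='after.txt', n=1))
sys.stdout.writelines(diff)
# *** before.txt
# --- after.txt
# ***************
# *** 1,2 ****
# ! Geeks
#   for
# --- 1,2 ----
# ! geeks
#   for
```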
