Compare sequences in Python using the difflib module
Last Updated: 24 Feb, 2021
The difflib module in Python provides classes and functions for comparing sequences. It can be used to compare files, and it can report the differences in several formats, including HTML and context and unified diffs.
The main classes and functions are described below:
Class SequenceMatcher
SequenceMatcher is a flexible class for comparing pairs of sequences of any type, as long as the elements are hashable. Its main methods are discussed below:
- The ratio() method returns a measure of the similarity of the two sequences as a float between 0 and 1. The ratio is computed with the formula below.
2*X/Y
Where X is the number of matching elements and
Y is the total number of elements in both sequences.
Example 1:
Python3
import difflib
par1 = ['g', 'f', 'g']
par2 = 'gfg'
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
1.0
Example 2:
Python3
import difflib
par1 = 'Geeks for geeks!'
par2 = 'geeks'
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
0.47619047619047616
Example 3:
Python3
import difflib
# The comparison is case-sensitive, so these strings share no elements
par1 = 'gfg'
par2 = 'GFG'
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
0.0
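Example 3 shows that the comparison is case-sensitive. If case should be ignored, one option is to normalize both strings before comparing them. The sketch below assumes that lower-casing is an acceptable normalization; the helper name case_insensitive_ratio is hypothetical and is not part of difflib.

```python
import difflib


def case_insensitive_ratio(first, second):
    """Return the similarity ratio of two strings, ignoring case.

    Hypothetical helper for illustration; not part of difflib.
    """
    return difflib.SequenceMatcher(None, first.lower(), second.lower()).ratio()


print(difflib.SequenceMatcher(None, 'gfg', 'GFG').ratio())  # 0.0
print(case_insensitive_ratio('gfg', 'GFG'))                 # 1.0
```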
- The get_matching_blocks() method of this class returns a list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The last triple is a dummy, (len(a), len(b), 0), and is the only one with n == 0.
Example 1:
Python3
import difflib
par1 = 'Geeks for geeks!'
par2 = 'geeks'
matches = difflib.SequenceMatcher(None, par1, par2).get_matching_blocks()
# Print the part of par1 covered by each matching block
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])
Output:
geeks
Example 2:
Python3
import difflib
par1 = 'GFG'
par2 = 'gfg'
matches = difflib.SequenceMatcher(None, par1, par2).get_matching_blocks()
# Print the part of par1 covered by each matching block
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])
Output:
There are no matching subsequences between GFG and gfg, so the list holds only the terminating dummy triple (3, 3, 0) and the loop prints a single empty line.
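The matching blocks also make it possible to check the 2*X/Y formula by hand. The sketch below sums the block sizes to get X and adds the two lengths to get Y; the variable names are chosen for illustration only.

```python
import difflib

par1 = 'Geeks for geeks!'
par2 = 'geeks'
matcher = difflib.SequenceMatcher(None, par1, par2)

blocks = matcher.get_matching_blocks()
# The last triple is always a dummy of the form (len(par1), len(par2), 0).
print(blocks[-1])

# X: total number of matched elements; Y: combined length of both sequences.
matched = sum(block.size for block in blocks)
total = len(par1) + len(par2)
manual_ratio = 2 * matched / total

print(matched, total)                   # 5 21
print(manual_ratio == matcher.ratio())  # True
```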
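- get_close_matches() function: This is a module-level function, not a method of SequenceMatcher. It returns a list of the best "good enough" matches for a word. The word is the sequence for which close matches are wanted (usually a string), and possibilities is a list of sequences to match it against (usually a list of strings). The optional arguments n (default 3) and cutoff (default 0.6) set the maximum number of matches returned and the minimum similarity score.
Example: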
Python3
import difflib
string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]
print(difflib.get_close_matches(string, listOfStrings))
Output:
['geeks']
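The result depends on the optional n and cutoff arguments of get_close_matches(). The sketch below reuses the strings from the example above and only varies those two arguments; the chosen cutoff values are arbitrary illustrations.

```python
import difflib

word = 'Geeks4geeks'
candidates = ['for', 'Gks', 'G4g', 'geeks']

# Default settings: at most n=3 matches with a score of at least cutoff=0.6.
default_matches = difflib.get_close_matches(word, candidates)
print(default_matches)  # ['geeks']

# A stricter cutoff rejects 'geeks', whose score is 10/16 = 0.625.
strict_matches = difflib.get_close_matches(word, candidates, cutoff=0.7)
print(strict_matches)   # []

# A looser cutoff admits weaker candidates; the best match still comes first.
loose_matches = difflib.get_close_matches(word, candidates, n=3, cutoff=0.4)
print(loose_matches)
```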
Class Differ
This class compares sequences of lines of text and produces human-readable differences, or deltas. Every line of a Differ delta starts with a two-letter code:
Code | Meaning
'- ' | line unique to sequence 1
'+ ' | line unique to sequence 2
'  ' | line common to both sequences
'? ' | line not present in either input sequence
The following method is provided by this class:
- The compare() method compares two sequences of lines and generates the delta (a sequence of lines).
Example 1:
Python3
from difflib import Differ
par1 = 'Geeks'
par2 = 'geeks!'
# Each character of a string is treated as one "line"
for ele in Differ().compare(par1, par2):
    print(ele)
Output:
- G
+ g
  e
  e
  k
  s
+ !
Example 2:
Python3
from difflib import Differ
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# List elements are compared whole against the characters of the string
for ele in Differ().compare(par1, par2):
    print(ele)
Output:
- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
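Differ is designed for lines of text, and the '? ' code only appears when two lines are similar but not identical. The sketch below compares two short texts line by line; the sample texts are made up for illustration.

```python
from difflib import Differ

# keepends=True keeps the trailing newline on every line.
text1 = 'Geeks for geeks\nA computer science portal\n'.splitlines(keepends=True)
text2 = 'Geeks for Geeks!\nA computer science portal\n'.splitlines(keepends=True)

delta = list(Differ().compare(text1, text2))

# Every delta line already ends with a newline, so join them before printing.
print(''.join(delta), end='')

# Lines starting with '? ' mark where the two similar lines differ.
hint_lines = [line for line in delta if line.startswith('? ')]
print(len(hint_lines) > 0)
```

The first lines differ only slightly, so the delta pairs them as '- ' and '+ ' lines with '? ' hint lines underneath, while the unchanged second line is reported with the two-space code.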
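- ndiff() function: This module-level function produces the same kind of Differ-style delta as compare(). Each element of the input sequences is treated as one line, so when a list of strings is passed its elements are compared whole, whereas a plain string is compared character by character.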
Example 1:
Python3
import difflib
par1 = 'Geeks'
par2 = 'geeks!'
# Both inputs are strings, so they are compared character by character
for ele in difflib.ndiff(par1, par2):
    print(ele)
Output:
- G
+ g
  e
  e
  k
  s
+ !
Example 2:
Python3
import difflib
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# List elements are compared whole against the characters of the string
for ele in difflib.ndiff(par1, par2):
    print(ele)
Output:
- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
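A delta produced by ndiff() keeps enough information to rebuild either input. The sketch below uses difflib.restore() for that; the two line lists are sample data chosen for illustration.

```python
import difflib

before = ['Geeks\n', 'for\n', 'geeks\n']
after = ['Geeks\n', 'for\n', 'Geeks!\n']

delta = list(difflib.ndiff(before, after))
print(''.join(delta), end='')

# restore() rebuilds one of the inputs: 1 = first sequence, 2 = second sequence.
restored_before = list(difflib.restore(delta, 1))
restored_after = list(difflib.restore(delta, 2))

print(restored_before == before)  # True
print(restored_after == after)    # True
```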
- context_diff() function: Context diffs are a compact way of showing only the lines that have changed, together with a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n, which defaults to three.
Example 1:
Python3
import difflib
par1 = 'Geeks'
par2 = 'geeks!'
# Changed lines are marked with '!', added lines with '+'
for ele in difflib.context_diff(par1, par2):
    print(ele)
Output:
***
---
***************
*** 1,5 ****
! G
  e
  e
  k
  s
--- 1,6 ----
! g
  e
  e
  k
  s
+ !
Example 2:
Python3
import difflib
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# No element matches, so every line is reported as changed
for ele in difflib.context_diff(par1, par2):
    print(ele)
Output:
***
---
***************
*** 1,3 ****
! Geeks
! for
! geeks!
--- 1,6 ----
! g
! e
! e
! k
! s
! !
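In practice context_diff() is usually given lists of lines together with labels for the two inputs. The sketch below assumes lines without trailing newlines, so it passes lineterm=''; the file names before.txt and after.txt are placeholders, not real files.

```python
import difflib

before = ['Geeks', 'for', 'geeks', 'portal']
after = ['Geeks', 'for', 'Geeks!', 'portal']

# fromfile/tofile label the two inputs in the header lines.
# n=1 keeps a single line of context around each change.
# lineterm='' suits input lines that carry no trailing newline.
delta = list(difflib.context_diff(
    before, after,
    fromfile='before.txt', tofile='after.txt',
    n=1, lineterm=''))

for line in delta:
    print(line)
```

With n=1 only the lines next to the changed line are shown as context, so the unchanged first line is left out of the report.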