The difflib Python module provides classes and functions for comparing sequences. It can be used to compare files and to report the differences in several formats, including HTML, context diffs, and unified diffs.
It contains several classes for comparing sequences:
Class SequenceMatcher
It is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. Its main methods are discussed below:
- The ratio() method takes no arguments and returns the similarity ratio, a float between 0 and 1, of the two sequences passed to the constructor. The similarity ratio is determined using the formula below.
2*X/Y
Where X is the number of matching elements and
Y is the total number of elements in both sequences.
Example 1:
# import required module
import difflib
# assign parameters
par1 = ['g', 'f', 'g']
par2 = 'gfg'
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
1.0
Example 2:
# import required module
import difflib
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
0.47619047619047616
Example 3:
# import required module
import difflib
# assign parameters
par1 = 'gfg'
par2 = 'GFG'
# compare (the comparison is case-sensitive)
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
0.0
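To make the 2*X/Y formula concrete, the following minimal sketch recomputes the ratio by hand from the matching blocks and compares it with ratio(). The helper name similarity is hypothetical and exists only for this illustration.

```python
import difflib


def similarity(a, b):
    """Recompute 2*X/Y by hand (illustrative helper, not part of difflib)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # X: total number of matched elements across all matching blocks
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # Y: combined length of both sequences
    total = len(a) + len(b)
    return 2 * matched / total if total else 1.0


a, b = 'Geeks for geeks!', 'geeks'
print(similarity(a, b))
print(difflib.SequenceMatcher(None, a, b).ratio())
```

Both lines should print the same value, since 5 matching characters out of 21 total elements gives 2*5/21.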
- The get_matching_blocks() method of this class returns a list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The last triple is a dummy, (len(a), len(b), 0), and is the only one with n == 0.
Example 1:
# import required module
import difflib
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
# compare
matches = difflib.SequenceMatcher(None, par1, par2).get_matching_blocks()
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])
Output:
geeks
Example 2:
# import required module
import difflib
# assign parameters
par1 = 'GFG'
par2 = 'gfg'
# compare
matches = difflib.SequenceMatcher(None, par1, par2).get_matching_blocks()
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])
Output:
There are no matching subsequences between GFG and gfg, so the list holds only the final zero-length triple (3, 3, 0) and the loop prints just an empty line.
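The (i, j, n) triples themselves can be inspected directly. The sketch below prints them for the first example and filters out the zero-length triple that always ends the list; the variable names are chosen for illustration only.

```python
import difflib

matcher = difflib.SequenceMatcher(None, 'Geeks for geeks!', 'geeks')
blocks = matcher.get_matching_blocks()

# Each block unpacks as (i, j, n), meaning a[i:i+n] == b[j:j+n]
for i, j, n in blocks:
    print(i, j, n)

# Drop the terminating (len(a), len(b), 0) entry to keep only real matches
real_blocks = [block for block in blocks if block.size > 0]
print(real_blocks)
```

Filtering on size is a simple way to avoid printing an empty string for the final triple.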
- get_close_matches() function: This is a module-level function, not a method of SequenceMatcher. It returns a list of the best "good enough" matches for a word. The word is the sequence for which close matches are wanted (usually a string), and the possibilities are the sequences to match it against (usually a list of strings).
Example:
# import required module
import difflib
# assign parameters
string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]
# find close matches
print(difflib.get_close_matches(string, listOfStrings))
Output:
['geeks']
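get_close_matches() also accepts two optional parameters: n (the maximum number of matches to return, 3 by default) and cutoff (the minimum similarity ratio, 0.6 by default). The sketch below reuses the example data and lowers the cutoff so that weaker candidates are returned too; the value 0.4 is an arbitrary choice for illustration.

```python
import difflib

word = "Geeks4geeks"
candidates = ["for", "Gks", "G4g", "geeks"]

# Default settings: n=3, cutoff=0.6
default = difflib.get_close_matches(word, candidates)
print(default)

# A lower cutoff admits weaker matches; results are ordered best first
loose = difflib.get_close_matches(word, candidates, n=3, cutoff=0.4)
print(loose)
```

With the lower cutoff, "Gks" and "G4g" should also qualify, while "for" shares no characters with the word and is still rejected.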
Class Differ
This class is used for comparing sequences of lines of text and producing human-readable differences, or deltas. Every line of a Differ delta starts with a two-letter code:
Code | Meaning |
---|---|
'- ' | line unique to sequence 1 |
'+ ' | line unique to sequence 2 |
'  ' | line common to both sequences |
'? ' | line not present in either input sequence |
The following method belongs to this class:
- The compare() method compares two sequences of lines and generates the delta (a sequence of lines).
Example 1:
# import required module
from difflib import Differ
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)
Output:
- G
+ g
  e
  e
  k
  s
+ !
Example 2:
# import required module
from difflib import Differ
# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)
Output:
- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
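The examples above compare characters, so the '? ' code never appears. The following sketch, using made-up sample lines, compares two short lists of text lines in which one line changes slightly; for such similar lines Differ adds '? ' guide lines that point at the changed characters.

```python
from difflib import Differ

# Hypothetical sample data: each list element is one line of text
before = ['hello world\n', 'second line\n']
after = ['hello World\n', 'second line\n', 'third line\n']

delta = list(Differ().compare(before, after))
print(''.join(delta), end='')

# Collect the two-letter codes that occur in the delta
codes = {line[:2] for line in delta}
print(sorted(codes))
```

The lines keep their trailing newlines here, so the delta is joined and printed as one block instead of line by line.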
- ndiff() function: This module-level function produces the same Differ-style delta as compare(). Each element of a list is treated as one line, while a plain string is compared character by character.
Example 1:
# import required module
import difflib
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)
Output:
- G
+ g
  e
  e
  k
  s
+ !
Example 2:
# import required module
import difflib
# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)
Output:
- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
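As a quick check of the relationship between ndiff() and Differ, the sketch below compares their results on sample lists. It assumes that ndiff() uses IS_CHARACTER_JUNK as its default character filter, so the same filter is passed to Differ explicitly to make the two calls equivalent. It also shows that a string and a list of its characters compare as identical "lines".

```python
import difflib

a = ['Geeks', 'for', 'geeks!']
b = ['Geeks', 'for', 'Geeks']

via_ndiff = list(difflib.ndiff(a, b))
# Pass the same character filter that ndiff() applies by default
differ = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
via_differ = list(differ.compare(a, b))
print(via_ndiff == via_differ)

# A string behaves like a list of one-character lines
same = list(difflib.ndiff('Geeks', list('Geeks')))
print(same)
```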
- context_diff() function: Context diffs are a compact way of showing only the lines that have changed, plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n, which defaults to three. Like ndiff(), this is a module-level function.
Example 1:
# import required module
import difflib
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)
Output:
***
---
***************
*** 1,5 ****
! G
  e
  e
  k
  s
--- 1,6 ----
! g
  e
  e
  k
  s
+ !
Example 2:
# import required module
import difflib
# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)
Output:
***
---
***************
*** 1,3 ****
! Geeks
! for
! geeks!
--- 1,6 ----
! g
! e
! e
! k
! s
! !
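In the examples above the header lines are empty because no file names were given. The sketch below passes the optional fromfile, tofile, n and lineterm parameters; the file names before.txt and after.txt are hypothetical labels, and no files are actually read.

```python
import difflib

old = ['Geeks', 'for', 'geeks!']
new = ['Geeks', 'for', 'Geeks']

diff = list(difflib.context_diff(
    old, new,
    fromfile='before.txt',  # label only, hypothetical name
    tofile='after.txt',     # label only, hypothetical name
    n=1,                    # one line of context instead of three
    lineterm='',            # inputs have no trailing newlines
))
for line in diff:
    print(line)
```

Setting lineterm to an empty string keeps the header lines free of trailing newlines, so print() does not leave blank lines between them.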