
How to compare two text files in Python?

Python provides concise ways to work with files. This article discusses one application of Python's file handling features: comparing files.

Files in use: intro.txt and intro1.txt, two text files stored in different folders.



Method 1: Comparing complete file at once

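Python's standard library includes a module called filecmp, whose filecmp.cmp() function compares two files and returns True if they appear equal and False otherwise. The function can operate in two modes:

  • Shallow mode (the default): the os.stat() signatures of the files (file type, size and modification time) are compared. Files with identical signatures are taken to be equal without being read; if the signatures differ, the contents are compared.
  • Deep mode (shallow=False): the contents of the files are always compared.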

Syntax:



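filecmp.cmp(f1, f2, shallow=True)

Parameters:

  • f1, f2: paths of the two files to compare.
  • shallow: if True (the default), files with identical os.stat() signatures are taken to be equal; if False, the file contents are compared.

Returns:

  • True if the files seem equal
  • False otherwise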

Program:




import filecmp
 
f1 = "C:/Users/user/Documents/intro.txt"
f2 = "C:/Users/user/Desktop/intro1.txt"
 
# shallow comparison
result = filecmp.cmp(f1, f2)
print(result)
# deep comparison
result = filecmp.cmp(f1, f2, shallow=False)
print(result)

Output:

False

False
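The two modes only give different answers when two files share the same size and modification time but differ in content. The following is a minimal sketch of that situation, not part of the original program: it assumes two temporary files with the hypothetical names a.txt and b.txt, and uses os.utime() to force an identical (arbitrary) timestamp on both.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Compare two same-sized files that differ in content, in both modes."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, used only for this illustration.
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        with open(a, "w") as fh:
            fh.write("hello\n")
        with open(b, "w") as fh:
            fh.write("jello\n")  # same length, different content

        # Force the same access/modification time on both files so that
        # their os.stat() signatures (type, size, mtime) are identical.
        stamp = 1_600_000_000 * 10**9  # arbitrary timestamp in nanoseconds
        os.utime(a, ns=(stamp, stamp))
        os.utime(b, ns=(stamp, stamp))

        filecmp.clear_cache()  # start from an empty comparison cache
        shallow = filecmp.cmp(a, b)               # signatures only
        deep = filecmp.cmp(a, b, shallow=False)   # actual contents
    return shallow, deep


shallow_result, deep_result = shallow_vs_deep()
print("Shallow:", shallow_result)
print("Deep:", deep_result)
```

In this constructed case the shallow comparison reports the files as equal while the deep comparison does not, which is why shallow=False is the safer choice when the contents matter.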

Method 2: Comparing files line by line

The drawback of the above approach is that it does not tell us which lines differ. Although this is not always required, we often want to locate the lines where the files differ and act on them. The basic approach is to read the lines of each file into its own list and then compare the two lists position by position.

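Approach:

  • Open both files in read mode.
  • Read the lines of each file into a separate list.
  • Compare the lines at the same position in both lists.
  • Print IDENTICAL when the lines match; otherwise print that line from both files.
  • Close both files.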

Program:




# reading files; the with statement closes both files automatically
with open("C:/Users/user/Documents/intro.txt", "r") as f1, \
        open("C:/Users/user/Desktop/intro1.txt", "r") as f2:
    f1_data = f1.readlines()
    f2_data = f2.readlines()

# pair line i of file 1 with line i of file 2
# (zip stops at the end of the shorter file)
for i, (line1, line2) in enumerate(zip(f1_data, f2_data), start=1):
    # matching lines at the same position in both files
    if line1 == line2:
        # print IDENTICAL if the lines are the same
        print("Line ", i, ": IDENTICAL")
    else:
        print("Line ", i, ":")
        # else print that line from both files
        print("\tFile 1:", line1, end='')
        print("\tFile 2:", line2, end='')

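A position-by-position loop like the one above says nothing about lines that exist in only one of the files. One possible variant, sketched here under the assumption that the lines are already in two lists, uses itertools.zip_longest so that the shorter file is padded with None. The helper name diff_lines and the sample lists are hypothetical and not part of the original program.

```python
from itertools import zip_longest


def diff_lines(lines1, lines2):
    """Return (line_number, line1, line2) for each position where the lists differ.

    A side is None when that file has no line at that position.
    """
    differences = []
    pairs = zip_longest(lines1, lines2)  # pads the shorter list with None
    for number, (line1, line2) in enumerate(pairs, start=1):
        if line1 != line2:
            differences.append((number, line1, line2))
    return differences


# Sample data standing in for the result of readlines() on two files.
file1_lines = ["first\n", "second\n", "third\n"]
file2_lines = ["first\n", "2nd\n"]

for number, line1, line2 in diff_lines(file1_lines, file2_lines):
    print("Line", number, "| File 1:", repr(line1), "| File 2:", repr(line2))
```

Returning the differences as a list, instead of printing inside the loop, keeps the comparison reusable: the caller decides whether to print, count or log the differing lines.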

Method 3: Comparing complete directory

The filecmp module also provides filecmp.cmpfiles(), which returns three lists: the files that match, the files that do not match, and the files that could not be compared (errors). It works like the first approach, but compares files with the same names in two different directories.

Program:




import filecmp
 
d1 = "C:/Users/user/Documents/"
d2 = "C:/Users/user/Desktop/"
files = ['intro.txt']
 
# shallow comparison
match, mismatch, errors = filecmp.cmpfiles(d1, d2, files)
print('Shallow comparison')
print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)
 
# deep comparison
match, mismatch, errors = filecmp.cmpfiles(d1, d2, files, shallow=False)
print('Deep comparison')
print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)

Output:

Shallow comparison

Match: []

Mismatch: ['intro.txt']

Errors: []

Deep comparison

Match: []

Mismatch: ['intro.txt']

Errors: []
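To see all three result lists populated at once, the sketch below builds two temporary directories and compares three files across them. The directory contents and the file names same.txt, changed.txt and only_in_d1.txt are assumptions made for this illustration only.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file with the given content."""
    with open(path, "w") as fh:
        fh.write(text)


def compare_dirs_demo():
    """Compare three hypothetical files across two temporary directories."""
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        # Identical content in both directories.
        write(os.path.join(d1, "same.txt"), "alpha\n")
        write(os.path.join(d2, "same.txt"), "alpha\n")
        # Same name, different content.
        write(os.path.join(d1, "changed.txt"), "alpha\n")
        write(os.path.join(d2, "changed.txt"), "beta\n")
        # Present in the first directory only, so it cannot be compared.
        write(os.path.join(d1, "only_in_d1.txt"), "alpha\n")

        names = ["same.txt", "changed.txt", "only_in_d1.txt"]
        # shallow=False forces a comparison of the file contents.
        return filecmp.cmpfiles(d1, d2, names, shallow=False)


match, mismatch, errors = compare_dirs_demo()
print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)
```

A file that is missing from one of the directories ends up in the errors list instead of raising an exception, so the caller should check all three lists.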

