Compare Two Csv Files Using Python

Last Updated : 20 Mar, 2024

We are given two CSV files, and our task is to compare them and report their differences in Python. In this article, we will look at two commonly used methods for comparing two CSV files and printing the differences.

Compare Two CSV Files for Differences in Python

Below are two ways to compare CSV files for differences in Python. Both examples use the following sample files:

file1.csv

Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago

file2.csv

Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco

Compare Two CSV Files Using Pandas library

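In this approach, the program loads both CSV files ('file1.csv' and 'file2.csv') into two DataFrames with read_csv(). The compare() method provided by Pandas then compares the two DataFrames cell by cell and returns only the cells that differ, with values from the first DataFrame under "self" and values from the second under "other". compare() requires both DataFrames to have identical row and column labels, so it suits files of the same shape. Rows are matched by position rather than by content, which is why the output below pairs Emily with Michael instead of reporting Emily as removed.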

Python3
import pandas as pd

# Read CSV files
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Compare dataframes
diff = df1.compare(df2)

# Print the differences
print("Differences between file1 and file2:")
print(diff)

Output

Differences between file1 and file2:
      Name            Age               City
      self    other  self other         self          other
1    Emily  Michael  30.0  45.0  Los Angeles        Chicago
2  Michael     Emma  40.0  35.0      Chicago  San Francisco
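
Because compare() matches rows by position, it cannot say which rows exist in only one file. One possible alternative, shown here as a minimal sketch rather than as part of the method above, is an outer merge with indicator=True, which labels each row as coming from the left file, the right file, or both. The inline FILE1 and FILE2 strings and the variable names are assumptions made so the example runs on its own; they mirror the sample files.

```python
import io

import pandas as pd

# Inline copies of the sample files (assumption: same content as file1.csv / file2.csv)
FILE1 = "Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n"
FILE2 = "Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n"

df1 = pd.read_csv(io.StringIO(FILE1))
df2 = pd.read_csv(io.StringIO(FILE2))

# Outer merge on all shared columns; the _merge column records where each row came from
merged = df1.merge(df2, how="outer", indicator=True)

# Rows present in only one of the two files
only_in_file1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_in_file2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

print("Rows only in file1:")
print(only_in_file1)
print("Rows only in file2:")
print(only_in_file2)
```

With the sample data, a changed row such as Michael appears in both results, once with the old age and once with the new one, because the merge treats any difference in a column as a different row.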

Compare Two CSV Files Using CSV Module

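In this approach, the program opens both CSV files ('file1.csv' and 'file2.csv') in read mode and wraps each in a csv.reader object. It then iterates over the rows of both files in parallel and records every pair of rows that differ. Note that zip() stops at the end of the shorter file, so extra rows at the end of the longer file are not reported.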

Python3
import csv

# Function to compare two CSV files
def compare(file1, file2):
    differences = []

    # Open both CSV files in read mode (newline='' is recommended by the csv module docs)
    with open(file1, 'r', newline='') as csv_file1, open(file2, 'r', newline='') as csv_file2:
        reader1 = csv.reader(csv_file1)
        reader2 = csv.reader(csv_file2)

        # Iterate over rows in both files simultaneously
        for row1, row2 in zip(reader1, reader2):
            if row1 != row2:
                differences.append((row1, row2))

    return differences

# Define file paths
file1 = 'file1.csv'
file2 = 'file2.csv'

# Call the compare function and print each difference
differences = compare(file1, file2)
for diff in differences:
    print(f"Difference found: {diff}")

Output

Difference found: (['Emily', '30', 'Los Angeles'], ['Michael', '45', 'Chicago'])
Difference found: (['Michael', '40', 'Chicago'], ['Emma', '35', 'San Francisco'])
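
The row-by-row loop above depends on row order. If order does not matter, one possible variation is to treat each file as a set of rows and take set differences. The following is a minimal sketch under that assumption, not part of the method above; the helper read_rows and the inline FILE1 and FILE2 strings are hypothetical names introduced so the example runs on its own.

```python
import csv
import io

# Inline copies of the sample files (assumption: same content as file1.csv / file2.csv)
FILE1 = "Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n"
FILE2 = "Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n"


def read_rows(text):
    """Return the header and the set of data rows (as tuples) from CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row is the header
    return header, {tuple(row) for row in reader}


header1, rows1 = read_rows(FILE1)
header2, rows2 = read_rows(FILE2)

# Set difference ignores row order; sorted() only makes the output stable
removed = sorted(rows1 - rows2)
added = sorted(rows2 - rows1)

print("Only in file1:", removed)
print("Only in file2:", added)
```

Because sets discard duplicates, this variation does not detect a row that appears a different number of times in each file.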
