Python is a multi-paradigm, general-purpose language that is widely used for data analytics because of its extensive library support and active community. Commonly used methods to compare two Pandas dataframes in Python are:
- Using difflib
- Using fuzzywuzzy
- Regex Match
These methods are widely used by both seasoned and new developers, but what if we need a report that lists all of the matching and mismatching columns and rows? This is where the DataComPy library comes into the picture.
DataComPy is a library for comparing Pandas dataframes, open-sourced by Capital One. It was started with the aim of replacing SAS's PROC COMPARE for Pandas dataframes. It takes two dataframes as input and produces a human-readable report with statistics that show the similarities and differences between them.
Install via pip3:
pip3 install datacompy
A comparison is created by passing both dataframes to datacompy.Compare and joining them on a matching column through the join_columns argument. We can also pass on_index=True instead of join_columns to join on the index instead.
Compare.matches() returns a Boolean. It returns True if the two dataframes match, otherwise it returns False.
By default, DataComPy returns True only if there is a 100% match. We can relax this by setting abs_tol and rel_tol, which stand for absolute tolerance and relative tolerance respectively, to non-zero values. They specify how much deviation between numeric values is tolerated before the values count as a mismatch.
DataComPy is a powerful library, and it is especially helpful when we have to generate a comparison report for two dataframes.