In this tutorial, we will learn how to compare two DataFrames with Python pandas. If you have ever worked with large data sets, you will have come across many operations that need to be performed on them, such as analyzing, cleaning and modifying the data. One of the most common of these operations is comparing two DataFrames. We will look at two built-in methods for doing this. So let us begin.
What is a DataFrame in Pandas?
In Python, Pandas is a library used for working with data sets. It provides a large set of methods that make data analysis, filtering and manipulation much easier. DataFrame is a data structure provided by the Pandas library. It is a two-dimensional, size-mutable and potentially heterogeneous tabular data structure, similar to a spreadsheet, a SQL table or a dictionary of Series objects.
Each column in a DataFrame can have a different data type, such as integer, string or float. This makes the DataFrame well suited to data manipulation, data cleaning, data analysis and data visualization tasks.
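As a quick illustration of the point about column types, here is a minimal sketch. The animals DataFrame and its column names are made-up sample data, not part of the examples used later in this tutorial.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different data type.
animals = pd.DataFrame(
    {
        "name": ["Cat", "Lion"],      # strings
        "legs": [4, 4],               # integers
        "weight_kg": [4.5, 190.0],    # floats
    }
)

print(animals.shape)   # (number of rows, number of columns)
print(animals.dtypes)  # one data type per column
```

The dtypes attribute is a handy first check before comparing two DataFrames, because a difference in column types can matter as much as a difference in values.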
Python Pandas Compare Two Dataframes [Solved]
1. Using ‘equals’ Method
In Pandas, the ‘equals()’ method is used to determine whether two pandas objects (DataFrame or Series) are equal. It compares the shape and the values of the two objects and returns a single Boolean value indicating whether they are the same. Below is the syntax of the equals() method.
Syntax
df1.equals(df2)
Let us now write some code to see this method in action. We create two DataFrames, df1 and df2, and use the equals() method to compare them. If all the elements in the two DataFrames are equal, it returns True. We then change one value in df2 and compare again, which returns False, as shown below.
import pandas as pd

df1 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Parrot"]
    }
)

df2 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Parrot"]
    }
)

# Compare df1 and df2 using the equals method
print(df1.equals(df2))
True
# Change one value in df2 ("Parrot" becomes "Deer") and compare again
df2 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Deer"]
    }
)

print(df1.equals(df2))
False
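Two behaviours of equals() are worth knowing before relying on it. The sketch below uses small made-up DataFrames (ints, floats, with_nan_a and with_nan_b are illustrative names only). It shows that columns holding the same numbers with different data types are not considered equal, while missing values in the same position are.

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame({"k1": [1, 2]})
floats = pd.DataFrame({"k1": [1.0, 2.0]})

# Same numbers, but an integer column versus a float column:
# equals() also checks the data type, so this is False.
dtype_result = ints.equals(floats)
print(dtype_result)

with_nan_a = pd.DataFrame({"k1": [1.0, np.nan]})
with_nan_b = pd.DataFrame({"k1": [1.0, np.nan]})

# Missing values in the same location are treated as equal,
# so this is True.
nan_result = with_nan_a.equals(with_nan_b)
print(nan_result)
```

If the data types are allowed to differ in your use case, one option is to convert both DataFrames to a common type with astype() before calling equals().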
2. Using ‘compare’ Method
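In Pandas, the ‘compare()’ method is used to compare two DataFrame objects and show the differences between them. It returns a new DataFrame that, by default, contains only the rows and columns where the values differ, with the value from the calling DataFrame under ‘self’ and the value from the other DataFrame under ‘other’. Both DataFrames must have the same row and column labels. Below is the syntax of the compare() method.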
Syntax
df1.compare(df2, align_axis=1, keep_shape=False, keep_equal=False)
In the example below, we use the same df1 as in the previous example and a df2 that differs from it in the second row. We then use the compare() method to compare the two DataFrames. It returns a new DataFrame with the differences between df1 and df2, as shown below.
import pandas as pd

df1 = pd.DataFrame(
    {
        'k1': ["Cat", "Lion"],
        'k2': ["Monkey", "Parrot"]
    }
)

df2 = pd.DataFrame(
    {
        'k1': ["Cat", "Tiger"],
        'k2': ["Monkey", "Peacock"]
    }
)

# Compare df1 and df2 using the compare method
df_compared = df1.compare(df2)
print(df_compared)
     k1             k2
   self  other    self    other
1  Lion  Tiger  Parrot  Peacock
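The optional parameters change how the result is laid out. The following sketch reuses df1 and df2 from the example above and tries each option in turn. The variable names (full_shape, full_values, stacked) and the extra DataFrame df3 are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"k1": ["Cat", "Lion"], "k2": ["Monkey", "Parrot"]})
df2 = pd.DataFrame({"k1": ["Cat", "Tiger"], "k2": ["Monkey", "Peacock"]})

# keep_shape=True keeps every row and column of the originals;
# cells where the two DataFrames agree are shown as NaN.
full_shape = df1.compare(df2, keep_shape=True)
print(full_shape)

# Adding keep_equal=True shows the matching values instead of NaN.
full_values = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full_values)

# align_axis=0 stacks 'self' and 'other' as rows instead of columns.
stacked = df1.compare(df2, align_axis=0)
print(stacked)

# compare() needs identical row and column labels; a DataFrame with
# a different number of rows raises a ValueError.
df3 = pd.DataFrame({"k1": ["Cat"], "k2": ["Monkey"]})
try:
    df1.compare(df3)
    labels_error = False
except ValueError as err:
    labels_error = True
    print(err)
```

A reasonable pattern is to call equals() first for a quick yes or no answer, and to call compare() only when the answer is False and you need to see where the differences are.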
Summary
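We have learnt how to compare DataFrames using two of Pandas' built-in methods: equals(), which returns a single True or False, and compare(), which returns a DataFrame of the differences. The Pandas library supports many more methods for working with data sets. You can learn more about Pandas from pandas.pydata.org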