How do you find the difference between two DataFrames in pandas?

The set difference of two DataFrames in pandas is computed in a roundabout way, using the concat and drop_duplicates functions. An example makes this clear.

Set difference of two dataframe in pandas Python:

First, let's create two DataFrames.

import pandas as pd
import numpy as np

#Create a DataFrame
df1 = {
    'Subject':['semester1','semester2','semester3','semester4','semester1',
               'semester2','semester3'],
   'Score':[62,47,55,74,31,77,85]}

df2 = {
    'Subject':['semester1','semester2','semester3','semester4'],
   'Score':[90,47,85,12]}


df1 = pd.DataFrame(df1,columns=['Subject','Score'])
df2 = pd.DataFrame(df2,columns=['Subject','Score'])

print(df1)
print(df2)

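df1 will be

     Subject  Score
0  semester1     62
1  semester2     47
2  semester3     55
3  semester4     74
4  semester1     31
5  semester2     77
6  semester3     85

df2 will be

     Subject  Score
0  semester1     90
1  semester2     47
2  semester3     85
3  semester4     12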

Set Difference of two dataframes in pandas python:

The concat() function, together with drop_duplicates(), can be used to build the set difference of two DataFrames: rows that appear in both frames become duplicates after concatenation and can then be dropped.
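As a minimal sketch of that idea, the snippet below rebuilds the two DataFrames from above and computes df1 minus df2. Appending df2 twice is an assumption of this sketch, not something the original text prescribes: it ensures that rows found only in df2 are also dropped. Note that rows duplicated within df1 itself would be removed as well.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'Score': [62, 47, 55, 74, 31, 77, 85]})
df2 = pd.DataFrame({
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'Score': [90, 47, 85, 12]})

# Every df2 row now occurs at least twice, so keep=False removes all of
# df2 together with any df1 row that matches a df2 row.
set_diff = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)

# Rows (semester2, 47) and (semester3, 85) are in both frames and are gone.
print(set_diff)
```

Concatenating df2 only once would instead return the rows that belong to exactly one of the two DataFrames.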

To find out whether two DataFrames differ at all, you can check them for equality, either as a whole or column by column.

Let us create DataFrame1 with two columns −

dataFrame1 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90] }
)

Create DataFrame2 with two columns −

dataFrame2 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)

Check for the equality of a specific column “Units” −

dataFrame2['Units'].equals(dataFrame1['Units'])

Check for equality of both the DataFrames −

dataFrame1.equals(dataFrame2)

Example

Following is the code −

import pandas as pd

# Create DataFrame1
dataFrame1 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)

print("DataFrame1 ...", dataFrame1, sep="\n")

# Create DataFrame2
dataFrame2 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)

print("\nDataFrame2 ...", dataFrame2, sep="\n")

# Check whether the Units columns are equal
print("\nBoth the DataFrames have similar Units column?", dataFrame2['Units'].equals(dataFrame1['Units']))

# Check whether the whole DataFrames are equal
print("\nAre both the DataFrames equal?", dataFrame1.equals(dataFrame2))

Output

This will produce the following output −

DataFrame1 ...
       Car   Units
0      BMW     100
1    Lexus     150
2     Audi     110
3  Mustang      80
4  Bentley     110
5   Jaguar      90

DataFrame2 ...
       Car   Units
0      BMW     100
1    Lexus     150
2     Audi     110
3  Mustang      80
4  Bentley     110
5   Jaguar      90

Both the DataFrames have similar Units column? True

Are both the DataFrames equal? True

When working with DataFrames, we often have two of them and need to find the difference, i.e. the rows that are not common to both (the complement of A ∩ B within A ∪ B). Such problems can be handled easily with the concat function.

This recipe is a short example of how to find the difference between two DataFrames. Let's get started.

Step 1 - Import the library

import pandas as pd

Let's pause and look at this import. Pandas is generally used for data manipulation and analysis.

Step 2 - Setup the Data

df1 = pd.DataFrame({'Student': ['Ram','Rohan','Shyam','Mohan'], 'Grade': ['A','C','B','Ex']})
df2 = pd.DataFrame({'Student': ['Ram','Shyam'], 'Grade': ['A','B']})

This creates two simple datasets of students and grades.

Step 3 - Finding Difference

df3=pd.concat([df1,df2]).drop_duplicates(keep=False)

The concat function stacks the rows of df1 and df2 into a single DataFrame. Calling drop_duplicates with keep=False then removes every row that occurs more than once, which drops the rows common to both DataFrames and leaves only the rows that appear in one of them (assuming neither DataFrame contains duplicate rows of its own).
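Put together, the three steps can be run as a single script. This is just the recipe's own code assembled in one place; the comments describe the result on this particular data.

```python
import pandas as pd

df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Shyam', 'Mohan'],
                    'Grade': ['A', 'C', 'B', 'Ex']})
df2 = pd.DataFrame({'Student': ['Ram', 'Shyam'], 'Grade': ['A', 'B']})

# Stack both frames, then drop every row that occurs more than once.
df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Ram and Shyam appear in both frames, so only Rohan and Mohan remain.
print(df3)
```

The surviving rows keep their original index labels (1 and 3 here); calling reset_index(drop=True) on the result would renumber them if that matters.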

The compare() method can also be used to show where two DataFrames differ:

import pandas as pd

data = [['dom', 10], ['chibuge', 15], ['celeste', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])

data1 = [['dom', 11], ['abhi', 17], ['celeste', 14]]
df1 = pd.DataFrame(data1, columns = ['Name', 'Age'])

print("Dataframe 1 -- \n")
print(df)

print("Dataframe 2 -- \n")
print(df1)

print("Dataframe difference -- \n")
print(df.compare(df1))

print("Dataframe difference keeping equal values -- \n")
print(df.compare(df1, keep_equal=True))

print("Dataframe difference keeping same shape -- \n")
print(df.compare(df1, keep_shape=True))

print("Dataframe difference keeping same shape and equal values -- \n")
print(df.compare(df1, keep_shape=True, keep_equal=True))

How to find differences between two DataFrames in pandas?

Using the equals() function, we can directly check whether df1 is equal to df2. It returns a single boolean: True only if the two DataFrames have the same shape and the same elements.
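One detail worth knowing is that equals() also requires matching dtypes within each column. The following sketch uses made-up data (the names a, b and c are illustrative only) to show the effect.

```python
import pandas as pd

a = pd.DataFrame({'Units': [100, 150]})
b = pd.DataFrame({'Units': [100, 150]})
c = pd.DataFrame({'Units': [100.0, 150.0]})  # same values, float dtype

same_ints = a.equals(b)    # identical values and dtypes
int_vs_float = a.equals(c) # values match, dtypes do not

print(same_ints)     # True
print(int_vs_float)  # False
```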

Can I subtract two DataFrames pandas?

The subtract() function performs element-wise subtraction of another object from a DataFrame. It is essentially the same as dataframe - other, but supports a fill_value argument to substitute for missing data in one of the inputs.
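A small sketch of the difference between the operator and the method, using illustrative data that is not from the examples above:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'Units': [100, 150, np.nan]})
b = pd.DataFrame({'Units': [90, np.nan, np.nan]})

plain = a - b                           # NaN wherever either input is missing
filled = a.subtract(b, fill_value=0)    # a missing value on one side is treated as 0

print(plain)   # 10.0, NaN, NaN
print(filled)  # 10.0, 150.0, NaN
```

When a value is missing in both inputs, the result stays missing even with fill_value set.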

How do you compare two DataFrames and find the difference?

The compare() method. This method compares two DataFrames and shows the differences between them. By default, the result contains only the rows and columns where a difference was found, with the calling DataFrame's values labelled self and the other DataFrame's values labelled other. One DataFrame calls the method and the other DataFrame is passed as a parameter.

Can I compare two DataFrames?

The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
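The same-labels requirement can be seen in a short sketch. The data and the names left, right and shorter are illustrative assumptions; compare() needs pandas 1.1 or later.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['dom', 'chibuge'], 'Age': [10, 15]})
right = pd.DataFrame({'Name': ['dom', 'abhi'], 'Age': [11, 17]})

# Identically labelled frames: differences are shown side by side
# under the column sub-labels "self" and "other".
diff = left.compare(right)
print(diff)

# A frame with fewer rows has different row labels, so compare() refuses it.
shorter = right.iloc[:1]
try:
    left.compare(shorter)
    raised = False
except ValueError as err:
    raised = True
    print("compare failed:", err)
```

For frames whose labels do not line up, the concat and drop_duplicates approach described earlier does not have this restriction.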