In pandas, the set difference of two dataframes is computed in a roundabout way, using the drop_duplicates and concat functions. This will become clear once we walk through an example.
Set difference of two dataframes in pandas Python:
Let's see this with an example. First, let's create two data frames.
import pandas as pd
import numpy as np

#Create a DataFrame
df1 = {
    'Subject':['semester1','semester2','semester3','semester4','semester1',
               'semester2','semester3'],
    'Score':[62,47,55,74,31,77,85]}
df2 = {
    'Subject':['semester1','semester2','semester3','semester4'],
    'Score':[90,47,85,12]}

df1 = pd.DataFrame(df1,columns=['Subject','Score'])
df2 = pd.DataFrame(df2,columns=['Subject','Score'])
print(df1)
print(df2)
Set difference of two dataframes in pandas Python:
The concat() function, combined with drop_duplicates(), can be used to compute the set difference of two dataframes: stack the dataframes, then drop every row that occurs more than once.
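A minimal sketch of that approach, reusing the df1 and df2 data from above. Passing df2 twice to concat() is an assumption about how the one-sided difference (df1 minus df2) is formed: every df2 row then occurs at least twice and is removed by drop_duplicates(keep=False), while rows found only in df1 survive. It also assumes df1 has no fully repeated rows of its own, because those would be dropped too.

```python
import pandas as pd

# Recreate the two example frames from above
df1 = pd.DataFrame({
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'Score': [62, 47, 55, 74, 31, 77, 85]})
df2 = pd.DataFrame({
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'Score': [90, 47, 85, 12]})

# df2 is passed twice, so every df2 row occurs at least twice in the
# stacked frame; keep=False drops every row that has any duplicate.
# Rows that occur only once (i.e. only in df1) survive: df1 minus df2.
# Caveat: a row repeated inside df1 itself would also be dropped.
set_diff = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
print(set_diff)
```

Passing df2 only once would instead give the symmetric difference, which would also keep df2-only rows such as ('semester1', 90).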
To find out whether two DataFrames differ, check them for equality with equals(). You can also check the equality of individual columns.
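As a minimal sketch of the idea (the two small frames and the changed value are hypothetical, chosen only for illustration): equals() answers whether two frames match as a whole, an element-wise != mask locates the rows that differ, and equals() is dtype-sensitive, so the same numbers stored as int64 and float64 do not count as equal.

```python
import pandas as pd

# Two hypothetical frames that differ in a single cell
a = pd.DataFrame({"Car": ["BMW", "Lexus", "Audi"], "Units": [100, 150, 110]})
b = pd.DataFrame({"Car": ["BMW", "Lexus", "Audi"], "Units": [100, 155, 110]})

# Whole-frame and single-column checks
print(a.equals(b))                 # False
print(a["Car"].equals(b["Car"]))   # True

# Element-wise mask (needs identical index and columns);
# keep only the rows where at least one cell differs
diff_rows = a[(a != b).any(axis=1)]
print(diff_rows)

# equals() also compares dtypes: int64 vs float64 gives False
print(a["Units"].equals(a["Units"].astype(float)))  # False
```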
Let us create DataFrame1 with two columns −
dataFrame1 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)
Create DataFrame2 with two columns −
dataFrame2 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)
Check for the equality of a specific column “Units” −
dataFrame2['Units'].equals(dataFrame1['Units'])
Check for equality of both the DataFrames −
print("\nAre both the DataFrames equal?", dataFrame1.equals(dataFrame2))
Example
Following is the code −
import pandas as pd

# Create DataFrame1
dataFrame1 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)
print("DataFrame1 ...")
print(dataFrame1)

# Create DataFrame2
dataFrame2 = pd.DataFrame(
   {
      "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
      "Units": [100, 150, 110, 80, 110, 90]
   }
)
print("\nDataFrame2 ...")
print(dataFrame2)

# check for specific column Units equality
print("\nBoth the DataFrames have similar Units column?", dataFrame2['Units'].equals(dataFrame1['Units']))

# check for equality
print("\nAre both the DataFrames equal?", dataFrame1.equals(dataFrame2))
Output
This will produce the following output −
DataFrame1 ...
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

DataFrame2 ...
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

Both the DataFrames have similar Units column? True

Are both the DataFrames equal? True

While working with dataframes, we often have two dataframes and need to find their difference, i.e. the rows that lie outside the intersection of A and B (the complement of A intersection B). Such problems can be handled easily with the concat function.
This recipe is a short example of how to find the difference between two dataframes. Let's get started.
Step 1 - Import the library
import pandas as pd
Let's pause and look at this import. Pandas is a library generally used for data manipulation and analysis.
Step 2 - Setup the Data
df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Shyam', 'Mohan'],
                    'Grade': ['A', 'C', 'B', 'Ex']})
df2 = pd.DataFrame({'Student': ['Ram', 'Shyam'],
                    'Grade': ['A', 'B']})
Here we create two simple datasets of students and their grades.
Step 3 - Finding Difference
df3=pd.concat([df1,df2]).drop_duplicates(keep=False)
The concat function in pandas stacks (appends) dataframes one below the other. Here we first combine df1 and df2, then use drop_duplicates with keep=False to drop every row that occurs more than once, i.e. the rows common to both dataframes. What remains is the net difference.
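For the data above, df2 is a subset of df1, so df3 holds only Rohan and Mohan. In general, concat plus drop_duplicates(keep=False) gives the symmetric difference (rows found in exactly one frame). If you also need to know which side each row came from, one alternative, a different technique from this recipe, is an outer merge with indicator=True. The sketch below assumes both frames share the same columns and merges on all of them.

```python
import pandas as pd

df1 = pd.DataFrame({'Student': ['Ram', 'Rohan', 'Shyam', 'Mohan'],
                    'Grade': ['A', 'C', 'B', 'Ex']})
df2 = pd.DataFrame({'Student': ['Ram', 'Shyam'], 'Grade': ['A', 'B']})

# Recipe approach: rows that appear in only one of the two frames
df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(df3)

# Alternative: outer merge on all shared columns. The '_merge' column
# marks each row as 'left_only', 'right_only' or 'both'.
# (Row order of an outer merge may be sorted by the join keys.)
merged = df1.merge(df2, how='outer', indicator=True)
only_in_df1 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df1)
```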
import pandas as pd

data = [['dom', 10], ['chibuge', 15], ['celeste', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
data1 = [['dom', 11], ['abhi', 17], ['celeste', 14]]
df1 = pd.DataFrame(data1, columns = ['Name', 'Age'])
print("Dataframe 1 -- \n")
print(df)
print("Dataframe 2 -- \n")
print(df1)
print("Dataframe difference -- \n")
print(df.compare(df1))
print("Dataframe difference keeping equal values -- \n")
print(df.compare(df1, keep_equal=True))
print("Dataframe difference keeping same shape -- \n")
print(df.compare(df1, keep_shape=True))
print("Dataframe difference keeping same shape and equal values -- \n")
print(df.compare(df1, keep_shape=True, keep_equal=True))
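DataFrame.compare() (available since pandas 1.1) only works on identically-labeled frames, meaning the same index and the same columns. With the default arguments it keeps only the rows and columns that contain a difference, places the two values side by side under 'self' and 'other', and fills unchanged cells with NaN. A minimal sketch, reusing the frames above, that reads individual differences out of the result and shows that differently-labeled frames are rejected; the relabelled copy `shifted` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([['dom', 10], ['chibuge', 15], ['celeste', 14]],
                  columns=['Name', 'Age'])
df1 = pd.DataFrame([['dom', 11], ['abhi', 17], ['celeste', 14]],
                   columns=['Name', 'Age'])

# Result columns are a MultiIndex: (original column, 'self' / 'other')
result = df.compare(df1)
print(result)

# Row 2 is identical in both frames, so it does not appear in the result
print(result.index.tolist())
print(result.loc[1, ('Name', 'self')], result.loc[1, ('Name', 'other')])

# Frames whose row labels differ cannot be compared
shifted = df1.copy()
shifted.index = [1, 2, 3]  # hypothetical relabelled copy
rejected = False
try:
    df.compare(shifted)
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```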