Compare DataFrames with a different number of rows

Q: How do I combine two DataFrames with different rows and columns?

Let's merge two DataFrames that have different columns. One way to join DataFrames with different columns is the concat() method.
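As a minimal sketch of the concat() approach mentioned above: the frames `left` and `right` and their values are made up for illustration and are not from the original question. When the inputs have different columns, pd.concat keeps the union of the columns and fills the missing cells with NaN.

```python
import pandas as pd

# Hypothetical frames that share only column "B"
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"B": [5], "C": [6]})

# Stack the rows; columns missing from one input are filled with NaN
combined = pd.concat([left, right], ignore_index=True, sort=False)
print(combined)
```

Note that concat() only stacks the frames. It does not match rows by key, which is what the question below needs.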

df1=

  A   B   C   D
  a1  b1  c1  1
  a2  b2  c2  2
  a3  b3  c3  4

df2=

  A   B   C   D
  a1  b1  c1  2
  a2  b2  c2  1

I want to compare the values of column 'D' in both DataFrames. If both DataFrames had the same number of rows, I would just do this:

newDF = df1['D']-df2['D']

However, there are times when the number of rows is different. I want a result DataFrame that looks like this:

resultDF=

  A   B   C   D_df1  D_df2  Diff
  a1  b1  c1  1      2      -1
  a2  b2  c2  2      1       1

EDIT: Compare the first row of column D only if the first row of A, B and C is the same in df1 and df2. Repeat this for every row.

marc_s


asked Aug 12, 2019 at 23:52

Use merge and df.eval

df1.merge(df2, on=['A','B','C'], suffixes=['_df1','_df2']).eval('Diff=D_df1 - D_df2')

Out[314]:
    A   B   C  D_df1  D_df2  Diff
0  a1  b1  c1      1      2    -1
1  a2  b2  c2      2      1     1
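For reference, here is a self-contained sketch of the answer above, with df1 and df2 rebuilt from the tables in the question. The second merge, with how="outer" and indicator=True, is an added illustration that is not part of the original answer. It shows one way to also keep the rows that exist in only one DataFrame.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"],
                    "C": ["c1", "c2", "c3"], "D": [1, 2, 4]})
df2 = pd.DataFrame({"A": ["a1", "a2"], "B": ["b1", "b2"],
                    "C": ["c1", "c2"], "D": [2, 1]})

keys = ["A", "B", "C"]

# Inner merge (the default): only rows whose A, B, C match in both frames
matched = (df1.merge(df2, on=keys, suffixes=("_df1", "_df2"))
              .eval("Diff = D_df1 - D_df2"))
print(matched)

# Outer merge keeps unmatched rows too; "_merge" records where each row came from
all_rows = df1.merge(df2, on=keys, how="outer",
                     suffixes=("_df1", "_df2"), indicator=True)
all_rows["Diff"] = all_rows["D_df1"] - all_rows["D_df2"]
print(all_rows)
```

In the outer merge, the a3 row has no partner in df2, so its D_df2 and Diff values are NaN and its _merge value is left_only.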

answered Aug 13, 2019 at 0:13

Andy L.


DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))

Compare to another DataFrame and show the differences.

New in version 1.1.0.

Parameters

other : DataFrame
    Object to compare with.

align_axis : {0 or ‘index’, 1 or ‘columns’}, default 1
    Determine which axis to align the comparison on.

    0, or ‘index’: Resulting differences are stacked vertically with rows drawn alternately from self and other.
    1, or ‘columns’: Resulting differences are aligned horizontally with columns drawn alternately from self and other.

keep_shape : bool, default False
    If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.

keep_equal : bool, default False
    If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.

result_names : tuple, default (‘self’, ‘other’)
    Set the dataframes names in the comparison.

    New in version 1.5.0.

Returns

DataFrame
    DataFrame that shows the differences stacked side by side.

    The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.

Raises

ValueError
    When the two DataFrames don’t have identical labels or shape.

Notes

Matching NaNs will not appear as a difference.

Can only compare identically-labeled DataFrames (i.e. same shape, identical row and column labels).

Examples

>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

Align the differences on columns

>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Assign result_names

>>> df.compare(df2, result_names=("left", "right"))
  col1       col3
  left right left right
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Stack the differences on rows

>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0

Keep the equal values

>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

Keep all original rows and columns

>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN

Keep all original rows and columns and also all original values

>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0

How do you compare two DataFrames with different number of columns?

We can use the .eq method to quickly compare the DataFrames element by element. The output of .eq has one Boolean per cell position, telling us whether the values at that position are equal in the two DataFrames (in the original example, rows 1 and 3 contain errors).
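A minimal sketch of the .eq approach when the column sets differ, under the assumption that only the columns present in both frames should be compared. The frames `left` and `right` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical frames: "right" lacks column "C" and differs in two cells
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
right = pd.DataFrame({"A": [1, 0, 3], "B": [4, 5, 0]})

# Restrict the comparison to the columns both frames share
common = left.columns.intersection(right.columns)
equal = left[common].eq(right[common])
print(equal)

# Row labels where at least one shared cell differs
rows_with_diffs = equal.index[~equal.all(axis=1)].tolist()
print(rows_with_diffs)
```

Selecting the shared columns first keeps the result limited to cells that exist on both sides, so a missing column is not reported as a mismatch.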

How do you compare rows of two DataFrames?

The compare method in pandas shows the differences between two DataFrames. It compares them element by element and presents the differing values side by side. It can only compare DataFrames of the same shape with identical row and column labels.
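Because of that restriction, DataFrames with a different number of rows have to be aligned before compare can be used. The sketch below is one possible way to do this, assuming the rows can be matched by index label. The index labels and values are illustrative.

```python
import pandas as pd

# Hypothetical frames with a different number of rows, keyed by index label
df1 = pd.DataFrame({"D": [1, 2, 4]}, index=["a1", "a2", "a3"])
df2 = pd.DataFrame({"D": [2, 1]}, index=["a1", "a2"])

# compare() rejects frames whose labels or shape differ
try:
    df1.compare(df2)
    raised = False
except ValueError:
    raised = True
print(raised)

# Keep only the rows present in both frames, then compare
shared = df1.index.intersection(df2.index)
diff = df1.loc[shared].compare(df2.loc[shared])
print(diff)
```

Rows that exist in only one frame (a3 here) are dropped by the intersection. A merge with an indicator column is an alternative when those rows need to be reported as well.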