How do I merge two datasets based on a column in R?

This article shows how to merge dataframes by column name using the merge() function in the R programming language.

The merge() function in base R merges two dataframes by common columns or by row names. By default it keeps only the rows whose key values occur in both dataframes, which is an inner join. In the result, the key columns come first, followed by the remaining columns of the first dataframe and then those of the second.

Syntax: merge(x, y, by, all)

Arguments :

  • x, y – The input dataframes
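  • by – the column or columns used for merging. To merge on row names, set by = "row.names".
  • all – logical, TRUE or FALSE (the default). If TRUE, rows that have no match in the other dataframe are kept as well and padded with NA. The related arguments all.x and all.y do the same for only the first or only the second dataframe.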
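The row-name case mentioned for the by argument can be sketched as follows. The two small dataframes and their names (scores, ages) are hypothetical and only serve as an illustration.

```r
# Hypothetical example data: the row names act as the merge key
scores <- data.frame(score = c(90, 85, 70),
                     row.names = c("ann", "bob", "cy"))
ages <- data.frame(age = c(31, 45),
                   row.names = c("bob", "cy"))

# Merge on row names; only names present in both dataframes are kept
by_rows <- merge(scores, ages, by = "row.names")
print(by_rows)
```

When merging this way, merge() puts the matched row names into a leading column called Row.names instead of keeping them as row names of the result.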

Example 1: Merge two dataframes by a column

Two dataframes can be merged on a column that they have in common. The column is named in the "by" argument of the function call. The output dataframe contains only the rows whose value in the "by" column occurs in both dataframes.

R

# creating first dataframe
df1 <- data.frame(col1 = LETTERS[1:6],
                  col2a = c(5:10))
print(df1)

# creating second dataframe (illustrative values)
df2 <- data.frame(col1 = LETTERS[4:8],
                  col2b = c(4:8))
print(df2)

# merging the dataframes on the common column col1
merge_df <- merge(df1, df2, by = "col1")
print(merge_df)

Output:

The merged dataframe has three rows, one for each col1 value that occurs in both dataframes (D, E and F), with the columns col1, col2a and col2b.

Example 2: Merge dataframe with missing values

To retain all the rows of the first dataframe, whether or not their value in the "by" column has a match in the second dataframe, set all.x = TRUE. For rows without a match, the columns coming from the second dataframe are filled with NA.

R

# creating first dataframe
df1 <- data.frame(col1 = LETTERS[1:6],
                  col2a = c(5:10))
print(df1)

# creating second dataframe (illustrative values)
df2 <- data.frame(col1 = LETTERS[4:8],
                  col2b = c(4:8))
print(df2)

# merging the dataframes, keeping every row of df1
merge_df <- merge(df1, df2, by = "col1",
                  all.x = TRUE)
print(merge_df)

Output:

The merged dataframe has six rows, one for each row of df1 (col1 values A to F). For A, B and C, which have no match in df2, col2b is NA.

To retain all the rows of the second dataframe, whether or not their value in the "by" column has a match in the first dataframe, set all.y = TRUE. For rows without a match, the columns coming from the first dataframe are filled with NA.

R

# creating first dataframe
df1 <- data.frame(col1 = LETTERS[1:6],
                  col2a = c(5:10))
print(df1)

# creating second dataframe (illustrative values)
df2 <- data.frame(col1 = LETTERS[4:8],
                  col2b = c(4:8))
print(df2)

# merging the dataframes, keeping every row of df2
merge_df <- merge(df1, df2, by = "col1",
                  all.y = TRUE)
print(merge_df)

Output:

The merged dataframe has five rows, one for each row of df2 (col1 values D to H). For G and H, which have no match in df1, col2a is NA.

The following code illustrates the usage when all the rows of both input dataframes need to be retained, which is done by setting all = TRUE.

R

# creating first dataframe
df1 <- data.frame(col1 = LETTERS[1:6],
                  col2a = c(5:10))
print(df1)

# creating second dataframe (illustrative values)
df2 <- data.frame(col1 = LETTERS[4:8],
                  col2b = c(4:8))
print(df2)

# merging the dataframes, keeping every row of both
merge_df <- merge(df1, df2, by = "col1", all = TRUE)
print(merge_df)

Output:

The merged dataframe has eight rows, one for each col1 value found in either dataframe (A to H). col2b is NA for A, B and C, and col2a is NA for G and H.

Example 3: Merge more than two dataframes

More than two dataframes can also be merged. Because merge() combines only two dataframes per call, the calls are chained in the order in which the dataframes should be joined. Merging n dataframes therefore requires n-1 function calls.
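A minimal sketch of this chaining, assuming three hypothetical dataframes (sales, region, active) that share an id column. It shows the two explicit merge() calls and, as one possible shortcut, base R's Reduce(), which applies the same two-at-a-time merge across a list.

```r
# Hypothetical example data sharing the key column "id"
sales  <- data.frame(id = 1:4, sales = c(10, 20, 30, 40))
region <- data.frame(id = 2:5, region = c("north", "south", "east", "west"))
active <- data.frame(id = 3:6, active = c(TRUE, FALSE, TRUE, FALSE))

# Three dataframes need two merge() calls, applied in order
step1 <- merge(sales, region, by = "id")
merged_chain <- merge(step1, active, by = "id")

# The same chain written with Reduce(), merging the list left to right
merged_reduce <- Reduce(function(x, y) merge(x, y, by = "id"),
                        list(sales, region, active))

print(merged_chain)
print(merged_reduce)
```

Both approaches perform the same inner joins, so only the ids present in all three dataframes remain. Reduce() is mainly a convenience when the number of dataframes is large or not known in advance.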

How do you merge datasets based on two columns?

To merge dataframes on two ID columns with merge(), set the by argument to a character vector of the ID column names, for example by = c("ID1", "ID2"). Two rows are then matched only when their values agree in every listed column.
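As a small illustration of merging on two key columns, the sketch below uses two hypothetical dataframes (orders, prices) whose rows are identified by the pair ID1 and ID2.

```r
# Hypothetical example data keyed by the pair (ID1, ID2)
orders <- data.frame(ID1 = c(1, 1, 2, 2),
                     ID2 = c("a", "b", "a", "b"),
                     qty = c(5, 7, 9, 11))
prices <- data.frame(ID1 = c(1, 2, 2),
                     ID2 = c("a", "a", "c"),
                     price = c(1.5, 2.0, 3.5))

# A row is kept only if both ID1 and ID2 match in the two dataframes
both_keys <- merge(orders, prices, by = c("ID1", "ID2"))
print(both_keys)
```

Only the pairs (1, "a") and (2, "a") occur in both dataframes, so the result has two rows. Merging on ID1 alone would instead pair every order with every price sharing that ID1 value.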

How do I join two Dataframes based on two columns?

In Python, two pandas DataFrames are merged on multiple columns with the pandas.merge() function, by passing a list of column names to its on parameter. The same operation is also available as the DataFrame.merge() method.

How do I combine two sets of data in R?

To join two dataframes (datasets) vertically, use the rbind() function. The two dataframes must have the same variables, but they do not have to be in the same order. If dataframe A has variables that dataframe B does not, then either delete the extra variables in dataframe A or add them to dataframe B, filled with NA, before calling rbind().
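A short sketch of stacking rows with rbind(), assuming two hypothetical dataframes (frame_a, frame_b) whose columns are in a different order and where frame_a carries one extra variable that is dropped first.

```r
# Hypothetical example data: same variables in a different order,
# plus one extra variable ("score") that exists only in frame_a
frame_a <- data.frame(id = 1:2, name = c("x", "y"), score = c(0.5, 0.9))
frame_b <- data.frame(name = "z", id = 3L)

# Drop the extra variable so both dataframes have the same columns
frame_a_trim <- frame_a[, c("id", "name")]

# rbind() matches dataframe columns by name, not by position
stacked <- rbind(frame_a_trim, frame_b)
print(stacked)
```

The result keeps the column order of the first dataframe passed to rbind().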