Joining of Dataframes in R Programming

In R Language, dataframes are generic data objects which are used to store the tabular data. Dataframes are considered to be the most popular data objects in R programming because it is more comfortable to analyze the data in the tabular form. Dataframes can also be taught as mattresses where each column of a matrix can be of the different data types. Dataframe is made up of three principal components, the data, rows, and columns.

In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens. merge() function works similarly like join in DBMS. Types of Merging Available in R are,

  1. Natural Join or Inner Join
  2. Left Outer Join
  3. Right Outer Join
  4. Full Outer Join
  5. Cross Join
  6. Semi Join
  7. Anti Join

Basic Syntax of merge() function in R:

Syntax:
merge(df1, df2, by.df1, by.df2, all.df1, all.df2, sort = TRUE)

Parameters:
df1: one dataframe
df2: another dataframe
by.df1, by.df2: The names of the columns that are common to both df1 and df2.
all, all.df1, all.df2: Logical values that actually specify the type of merging happens.



Now let’s try to understand all types of merging one by one. First of all, we will create two dataframes that will help us to understand each join easily.
# Data frame 1

filter_none

edit
close

play_arrow

link
brightness_4
code

df1 = data.frame(StudentId = c(101:106), 
                 Product = c("Hindi", "English"
                             "Maths", "Science"
                             "Political Science"
                             "Physics"))
df1 

chevron_right


Output:

    StudentId          Product
1       101             Hindi
2       102           English
3       103             Maths
4       104           Science
5       105 Political Science
6       106           Physics

# Data frame 2

filter_none

edit
close

play_arrow

link
brightness_4
code

df2 = data.frame(StudentId = c(102, 104, 106,
                               107, 108), 
                 State = c("Manglore", "Mysore",
                           "Pune", "Dehradun", "Delhi")) 
df2 

chevron_right


Output:

   StudentId   State
1       102 Manglore
2       104   Mysore
3       106     Pune
4       107 Dehradun
5       108    Delhi

Natural Join or Inner Join

Inner join is used to keep only those rows that are matched from the dataframes, in this, we actually specify the argument all = FALSE. If we try to understand this using set theory then we can say here we are actually performing the intersection operation. For example:

A = [1, 2, 3, 4, 5]
B = [2, 3, 5, 6]
Then the output of natural join will be (2, 3, 5)

It is the most simplest and common type of joins available in R. Now let us try to understand this using R program:

Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
df = merge(x = df1, y = df2, by = "StudentId")
df

chevron_right


Output:



   StudentId Product    State
1       102 English Manglore
2       104 Science   Mysore
3       106 Physics     Pune

Left Outer Join

Left Outer Join is basically to include all the rows of your dataframe x and only those from y that match, in this, we actually specify the argument x = TRUE. If we try to understand this using a basic set theory then we can say here we are actually displaying complete set x. Now let us try to understand this using R program:
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
df = merge(x = df1, y = df2, by = "StudentId",
                                 all.x = TRUE)
df

chevron_right


Output:

  StudentId           Product    State
1       101             Hindi     NA
2       102           English   Manglore
3       103             Maths     NA
4       104           Science   Mysore
5       105 Political Science     NA
6       106           Physics     Pune

Right Outer Join

Right, Outer Join is basically to include all the rows of your dataframe y and only those from x that match, in this, we actually specify the argument y = TRUE. If we try to understand this using a basic set theory then we can say here we are actually displaying a complete set y. Now let us try to understand this using R program:
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
df = merge(x = df1, y = df2, by = "StudentId",
                                 all.y = TRUE)
df

chevron_right


Output:

  StudentId Product    State
1       102 English Manglore
2       104 Science   Mysore
3       106 Physics     Pune
4       107    NA    Dehradun
5       108    NA    Delhi

Full Outer Join

Outer Join is basically used to keep all rows from both dataframes, in this, we actually specify the arguments all = TRUE. If we try to understand this using a basic set theory then we can say here we are actually performing the union option. Now let us try to understand this using R program:
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
df = merge(x = df1, y = df2, by = "StudentId",
                                   all = TRUE)
df

chevron_right


Output:

   StudentId           Product    State
1       101             Hindi     NA
2       102           English   Manglore
3       103             Maths     NA
4       104           Science   Mysore
5       105 Political Science     NA
6       106           Physics     Pune
7       107              NA     Dehradun
8       108              NA     Delhi

Cross Join

A Cross Join also known as cartesian join results in every row of one dataframe is being joined to every other row of another dataframe. In set theory, this type of joins is known as the cartesian product between two sets. Now let us try to understand this using R program:
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
df = merge(x = df1, y = df2, by = NULL)
df

chevron_right


Output:

StudentId.x           Product StudentId.y    State
1          101             Hindi         102 Manglore
2          102           English         102 Manglore
3          103             Maths         102 Manglore
4          104           Science         102 Manglore
5          105 Political Science         102 Manglore
6          106           Physics         102 Manglore
7          101             Hindi         104   Mysore
8          102           English         104   Mysore
9          103             Maths         104   Mysore
10         104           Science         104   Mysore
11         105 Political Science         104   Mysore
12         106           Physics         104   Mysore
13         101             Hindi         106     Pune
14         102           English         106     Pune
15         103             Maths         106     Pune
16         104           Science         106     Pune
17         105 Political Science         106     Pune
18         106           Physics         106     Pune
19         101             Hindi         107 Dehradun
20         102           English         107 Dehradun
21         103             Maths         107 Dehradun
22         104           Science         107 Dehradun
23         105 Political Science         107 Dehradun
24         106           Physics         107 Dehradun
25         101             Hindi         108    Delhi
26         102           English         108    Delhi
27         103             Maths         108    Delhi
28         104           Science         108    Delhi
29         105 Political Science         108    Delhi
30         106           Physics         108    Delhi

Semi Join

This join is somewhat like inner join, with only the left dataframe columns and values are selected. Now let us try to understand this using R program:
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
# Import required library
library(dplyr)
  
df = df1 %>% semi_join(df2, by = "StudentId")
df

chevron_right


Output:

  StudentId Product
1       102 English
2       104 Science
3       106 Physics

Anti Join

In terms of set theory, we can say anti-join as set difference operation, for example, A = (1, 2, 3, 4) B = (2, 3, 5) then the output of A-B will be set (1, 4). This join is somewhat like df1 – df2, as it basically selects all rows from df1 that are actually not present in df2. Now let us try to understand this using R program:
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate
# Joining of dataframes
  
# Import required library
library(dplyr)
  
df = df1 %>% anti_join(df2, by = "StudentId")
df

chevron_right


Output:

  StudentId           Product
1       101             Hindi
2       103             Maths
3       105 Political Science



My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :

Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.