
dplyr Full Join in R

Last Updated : 17 Apr, 2024

R’s dplyr package offers a suite of functions for data manipulation, including several types of joins. Among them, the full join merges two datasets while retaining all rows from both. This article explains how to use the full_join() function in R, with several examples.

Purpose of using Full Join in R

A full join, also known as a full outer join, combines datasets by including all rows from both datasets, matching rows where possible, and filling in missing values with NA where no match is found. This type of join is useful when you want to retain all information from both datasets, even if there are no corresponding matches.

full_join(x, y, by = NULL)
  • x, y: The data frames to be joined.
  • by: Variables to join by. If NULL (the default), full_join() joins on all columns that have the same name in x and y and prints a message listing the columns it used.
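
When the key columns are named differently in the two data frames, by can also be given as a named character vector. The following is a minimal sketch of that form; the data frames staff and pay and the column name EmpID are hypothetical and are not part of the examples in this article.

```r
library(dplyr)

# Hypothetical data frames whose key columns have different names
staff <- data.frame(EmployeeID = c(1, 2, 3),
                    Name = c("A", "B", "C"))
pay <- data.frame(EmpID = c(2, 3, 4),
                  Salary = c(60000, 52000, 55000))

# Named vector: the name (left side) is the column in x,
# the value (right side) is the column in y
result <- full_join(staff, pay, by = c("EmployeeID" = "EmpID"))
result
```

In the result, the key column keeps the name it has in x (here EmployeeID), and rows without a match on either side are padded with NA.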

Basic Full Join

Suppose we have two data frames: df1 holds employee names and departments, and df2 holds salaries. We want to combine them on EmployeeID while retaining every row from both.

R
library(dplyr)
# Sample data frames
df1 <- data.frame(EmployeeID = c(1, 2, 3),
                  Name = c("Aaditya", "Boby", "mayank"),
                  Department = c("HR", "IT", "Finance"))
df1
df2 <- data.frame(EmployeeID = c(1, 2, 4),
                  Salary = c(50000, 60000, 55000))
df2
# Perform full join
full_join(df1, df2, by = "EmployeeID")

Output:

  EmployeeID    Name Department
1          1 Aaditya         HR
2          2    Boby         IT
3          3  mayank    Finance

  EmployeeID Salary
1          1  50000
2          2  60000
3          4  55000

  EmployeeID    Name Department Salary
1          1 Aaditya         HR  50000
2          2    Boby         IT  60000
3          3  mayank    Finance     NA
4          4    <NA>       <NA>  55000
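
To see where the four rows of the result come from, the full join can be compared with an inner join and two anti joins. This is only an illustrative check, assuming each EmployeeID appears at most once per data frame; the variable names matched, only_df1, only_df2 and full are introduced here for the sketch.

```r
library(dplyr)

# Same sample data as in the basic example
df1 <- data.frame(EmployeeID = c(1, 2, 3),
                  Name = c("Aaditya", "Boby", "mayank"),
                  Department = c("HR", "IT", "Finance"))
df2 <- data.frame(EmployeeID = c(1, 2, 4),
                  Salary = c(50000, 60000, 55000))

matched  <- inner_join(df1, df2, by = "EmployeeID")  # IDs present in both (1, 2)
only_df1 <- anti_join(df1, df2, by = "EmployeeID")   # IDs only in df1 (3)
only_df2 <- anti_join(df2, df1, by = "EmployeeID")   # IDs only in df2 (4)
full     <- full_join(df1, df2, by = "EmployeeID")

# With unique keys, the full join has one row per matched or unmatched ID
nrow(full) == nrow(matched) + nrow(only_df1) + nrow(only_df2)
```

If a key value is duplicated in either data frame, the join returns one row per matching pair, so this simple count would need to be adjusted.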

Full Join with Multiple Variables

Consider a scenario where we have two datasets, sales and expenses, containing information about sales and expenses by month and year. We want to merge these datasets based on both month and year, retaining all information.

R
library(dplyr)
# Sample data frames
sales <- data.frame(Month = c("Jan", "Feb", "Mar"),
                    Year = c(2023, 2023, 2023),
                    Sales = c(10000, 12000, 15000))
sales
expenses <- data.frame(Month = c("Jan", "Feb", "Apr"),
                       Year = c(2023, 2023, 2023),
                       Expenses = c(5000, 6000, 5500))
expenses
# Perform full join
full_join(sales, expenses, by = c("Month", "Year"))

Output:

  Month Year Sales
1   Jan 2023 10000
2   Feb 2023 12000
3   Mar 2023 15000

  Month Year Expenses
1   Jan 2023     5000
2   Feb 2023     6000
3   Apr 2023     5500

  Month Year Sales Expenses
1   Jan 2023 10000     5000
2   Feb 2023 12000     6000
3   Mar 2023 15000       NA
4   Apr 2023    NA     5500
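
It can help to see what happens when only one of the two shared columns is used as the key. The sketch below reuses the sales and expenses data and joins on Month alone; the names by_both and by_month are introduced here for illustration.

```r
library(dplyr)

sales <- data.frame(Month = c("Jan", "Feb", "Mar"),
                    Year = c(2023, 2023, 2023),
                    Sales = c(10000, 12000, 15000))
expenses <- data.frame(Month = c("Jan", "Feb", "Apr"),
                       Year = c(2023, 2023, 2023),
                       Expenses = c(5000, 6000, 5500))

# Joining on both keys keeps a single Month and a single Year column
by_both <- full_join(sales, expenses, by = c("Month", "Year"))
names(by_both)

# Joining on Month only treats Year as an ordinary column, so both
# copies are kept and distinguished with the default suffixes .x and .y
by_month <- full_join(sales, expenses, by = "Month")
names(by_month)
```

With data spanning several years, joining on Month alone would also pair rows from different years, which is why both columns are listed in by in the example above.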

Handling Missing Values with Full Join

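The datasets being joined may already contain missing values, and a full join adds more: every row without a match gets NA in the columns that come from the other dataset. The example below joins two data frames and then replaces every NA with 0, whether it came from the source data (Value1 for ID 2) or from the join.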

R
library(dplyr)

# Example data frames
df1 <- data.frame(ID = c(1, 2, 3),
                  Value1 = c(10, NA, 30))
df2 <- data.frame(ID = c(2, 3, 4),
                  Value2 = c(20, 30, 40))

# Full join to retain all rows from both data frames
merged_df <- full_join(df1, df2, by = "ID")
merged_df
# Replace NA with 0 in every numeric column
merged_df_filled <- merged_df %>%
  mutate(across(where(is.numeric), ~ coalesce(.x, 0)))

print(merged_df_filled)

Output:

  ID Value1 Value2
1  1     10     NA
2  2     NA     20
3  3     30     30
4  4     NA     40

  ID Value1 Value2
1  1     10      0
2  2      0     20
3  3     30     30
4  4      0     40
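
After zero-filling, a value that was missing in the source data can no longer be told apart from one created by the join. One possible way to keep that distinction, shown here only as a sketch, is to add a marker column to each data frame before joining; the column names in_df1 and in_df2 and the variable flagged are hypothetical.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3),
                  Value1 = c(10, NA, 30))
df2 <- data.frame(ID = c(2, 3, 4),
                  Value2 = c(20, 30, 40))

# Tag each source with a marker column, then join.
# A marker that comes back as NA means the row had no match on that side.
flagged <- full_join(mutate(df1, in_df1 = TRUE),
                     mutate(df2, in_df2 = TRUE),
                     by = "ID") %>%
  mutate(in_df1 = coalesce(in_df1, FALSE),
         in_df2 = coalesce(in_df2, FALSE))

flagged
```

Here ID 2 has in_df1 equal to TRUE with Value1 still NA, so that NA was already in df1, while ID 4 has in_df1 equal to FALSE, so its missing Value1 was introduced by the join.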

Conclusion

The full_join() function in R’s dplyr package combines two datasets while retaining all rows from both, whether or not a matching key exists. Rows without a match are kept and padded with NA, so no observation is silently dropped when data from different sources is integrated. Understanding its syntax, joining on one or several keys, and deciding how to treat the resulting NA values covers most everyday uses of full join in R.


