Open In App

Merge two Pandas DataFrames on certain columns

Improve
Improve
Like Article
Like
Save
Share
Report

In data analysis, combining Pandas DataFrames is made easy with the merge function. You can streamline this process by pointing out which columns to use. Using a simple syntax, merging becomes a handy tool for efficiently working with data in various situations. This article walks you through the basic steps of merging Pandas DataFrames, providing a quick guide to boost your data processing skills.

Merge Two DataFrames on Certain Column Syntax

Syntax: DataFrame.merge(right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, copy=True, indicator=False, validate=None)

Parameters: right (DataFrame), how (str), on (label or list), left_on (label or list), right_on (label or list), left_index (bool), right_index (bool), sort (bool), copy (bool), indicator (bool or str), validate (str or None)

Return: DataFrame

Merge Two DataFrames based on a Common Column

There is various way to Merge two DataFrames based on a common column, here we are using some generally used methods for merging two DataFrames based on a common column those are following.

  1. Inner Join Merge
  2. Left Join Merge
  3. Right Join Merge
  4. Outer Join Merge
  5. Merge Dataframe Concatenation
  6. Column Subset Merge

Creating a Dataframe

In this example the code uses the pandas library to create two DataFrames (`df1` and `df2`) in Python. `df1` has columns ‘Name’ and ‘Marks’, while `df2` has columns ‘Name’, ‘Grade’, ‘Rank’, and ‘Gender’. The DataFrames are then displayed.

Python3




# importing modules
import pandas as pd
 
# creating a dataframe
df1 = pd.DataFrame({'Name':['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks':[80, 90, 75, 88, 59]})
 
# creating another dataframe with different data
df2 = pd.DataFrame({'Name':['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade':['A', 'A', 'B', 'A'],
                    'Rank':[3, 1, 4, 2 ],
                    'Gender':['Male', 'Female', 'Female', 'Female']})
# display df1
display(df1)
 
# display df2
display(df2)


Output:

Name    Marks
0 Raju 80
1 Rani 90
2 Geeta 75
3 Sita 88
4 Sohit 59

Name Grade Rank Gender
0 Raju A 3 Male
1 Divya A 1 Female
2 Geeta B 4 Female
3 Sita A 2 Female

Merge Two Dataframe using Inner Join Merge

The method `merge` is used to combine two DataFrames with an inner join, matching rows based on a specified column, creating a new DataFrame with shared values.

In this example code performs a merge operation on DataFrames `df1` and `df2` using the ‘Name’ column as the key, resulting in a new DataFrame containing columns ‘Name’, ‘Marks’, ‘Grade’, and ‘Rank’ with only the common values present in both DataFrames.

Python3




# applying merge
df_merged = df1.merge(df2[['Name', 'Grade', 'Rank']])
print(df_merged)


Output:

 Name  Marks Grade  Rank
0 Raju 80 A 3
1 Geeta 75 B 4
2 Sita 88 A 2

Merge Two Pandas DataFrames on a Specific Column using Left Join Merge

The method merges two pandas DataFrames using a left join, combining rows based on a common column and retaining all rows from the left DataFrame while matching rows from the right DataFrame.

In the resultant dataframe Grade column of df2 is merged with df1 based on key column Name with merge type left i.e. all the values of left dataframe (df1) will be displayed. 

Python3




# applying merge with more parameters
df_merged = df1.merge(df2[['Grade', 'Name']], on = 'Name', how = 'left')
print(df_merged)


Output:

  Name  Marks Grade
0 Raju 80 A
1 Rani 90 NaN
2 Geeta 75 B
3 Sita 88 A
4 Sohit 59 NaN

Merge Two Pandas DataFrames using Right Join Merge

That method includes all the rows from the right dataframe and the matched rows from the left dataframe. If there is no match, NaN values are filled in for columns from the left dataframe.

In this example code merges two DataFrames, `df1` and `df2`, based on the ‘Name’ column using a right join and assigns the result to `df_merged`.

Python3




df_merged = df1.merge(df2, on='Name', how='right')
print(df_merged)


Output :

 Name  Marks Grade  Rank  Gender
0 Raju 80.0 A 3 Male
1 Divya NaN A 1 Female
2 Geeta 75.0 B 4 Female
3 Sita 88.0 A 2 Female

Merge Two Pandas DataFrames using Outer Join Merge

That method includes all the rows from both dataframes. If there is no match for a particular row in one of the dataframes, NaN values are filled in for the columns from the dataframe without a match.

In this example code performs an outer join on DataFrames `df1` and `df2` based on the ‘Name’ column, creating a new DataFrame called `df_merged` with combined data, including all rows from both DataFrames.

Python3




df_merged = df1.merge(df2, on='Name', how='outer')
print(df_merged)


Output :

 Name  Marks Grade  Rank  Gender
0 Raju 80.0 A 3.0 Male
1 Rani 90.0 NaN NaN NaN
2 Geeta 75.0 B 4.0 Female
3 Sita 88.0 A 2.0 Female
4 Sohit 59.0 NaN NaN NaN
5 Divya NaN A 1.0 Female

Python Pandas Merge only Certain Columns using Concatenation

Concatenation is a method for combining two dataframes along a particular axis (either rows or columns). It doesn’t require a common column for merging; rather, it stacks the dataframes on top of each other or side by side.

In this example code concatenates two pandas DataFrames (`df1` and `df2`) horizontally (side by side) along the columns, creating a new DataFrame named `df_concatenated`.

Python3




df_concatenated = pd.concat([df1, df2], axis=1)
print(df_concatenated)


Output :

Name  Marks   Name Grade  Rank  Gender
0 Raju 80 Raju A 3.0 Male
1 Rani 90 Divya A 1.0 Female
2 Geeta 75 Geeta B 4.0 Female
3 Sita 88 Sita A 2.0 Female
4 Sohit 59 NaN NaN NaN NaN

Column Subset Merge in Pandas DataFrames

This method merges two dataframes in pandas by selecting a specific subset of columns from one dataframe and combining it with another based on a common column, resulting in a merged dataframe with the chosen columns from both datasets.

In this example, we have merged df1 with df2. The Marks column of df1 is merged with df2 and only the common values based on key column Name in both the dataframes are displayed here.

Python3




# applying merge with more parameters
df_merged = df2.merge(df1[['Marks', 'Name']])
print(df_merged)


Output:

Name Grade  Rank  Gender  Marks
0 Raju A 3 Male 80
1 Geeta B 4 Female 75
2 Sita A 2 Female 88


Last Updated : 18 Dec, 2023
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads