Open In App

Python | Merge, Join and Concatenate DataFrames using Pandas

Improve
Improve
Like Article
Like
Save
Share
Report

A dataframe is a two-dimensional data structure having multiple rows and columns. In a Pandas DataFrame, the data is aligned in the form of rows and columns only. A dataframe can perform arithmetic as well as conditional operations. It has a mutable size. This article will show how to join, concatenate, and merge in Pandas.

Python Merge, Join, and Concatenate DataFrames Using Pandas

Below are the different ways and approaches by which we can merge, join, and concatenate in Pandas in Python:

DataFrames Concatenation in Python Pandas

concat() function does all of the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes. In this example, three DataFrames (df1, df2, and df3) with identical columns but different indices are created. Using pd.concat(), these Pandas DataFrames are vertically stacked, resulting in a combined DataFrame where rows from each original DataFrame follow one another.

Python3




# Creating first dataframe
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
 
# Creating second dataframe
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])
 
# Creating third dataframe
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])
 
# Concatenating the dataframes
pd.concat([df1, df2, df3])


Output:

      A    B    C    D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11

Python Merge in Pandas in Python

DataFrames Merge Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. In this example, two DataFrames (left and right) are created with a common key column ‘Key’. The pd.merge() function merges these DataFrames based on their common ‘Key’ column using an inner join, resulting in a combined DataFrame containing only the rows where the key values match in both DataFrames.

Python3




# Dataframe created
left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
 
right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
 
# Merging the dataframes
pd.merge(left, right, how='inner', on='Key')


Output:

  Key    A    B   C   D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

Left Merge

The pd.merge() function with how='left' performs a left join, merging the left DataFrame with the right DataFrame on the ‘Key’ column. All rows from the left DataFrame are retained, and matching rows from the right DataFrame are appended with their respective values.

Python3




left_merged = pd.merge(left, right, how='left', on='Key')
print(left_merged)


Output:

  Key    A    B   C   D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

Right Merge

The pd.merge() function with how='right' performs a right join, merging the left DataFrame with the right DataFrame on the ‘Key’ column. All rows from the right DataFrame are retained, and matching rows from the left DataFrame are appended with their respective values.

Python3




right_merged = pd.merge(left, right, how='right', on='Key')
print(right_merged)


Output:

  Key    A    B   C   D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

Outer Merge

The pd.merge() function with how='outer' performs a full outer join, merging the left DataFrame with the right DataFrame on the ‘Key’ column. This results in a DataFrame that contains all rows from both the left and right DataFrames, filling in missing values with NaN where there isn’t a match.

Python3




outer_merged = pd.merge(left, right, how='outer', on='Key')
print(outer_merged)


Output:

  Key    A    B   C   D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

DataFrames Join in Python Pandas

In this example, both left and right DataFrames are indexed with keys ‘K0’, ‘K1’, ‘K2’, and ‘K3’. The left.join(right) operation merges the DataFrames on their indices, combining columns ‘A’ and ‘B’ from left with columns ‘C’ and ‘D’ from right based on the shared index.

Python3




left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']},
                    index=['K0', 'K1', 'K2', 'K3'])
 
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']},
                     index=['K0', 'K1', 'K2', 'K3'])
 
# Joining the dataframes
left.join(right)


Output

     A   B   C   D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
K2 A2 B2 C2 D2
K3 A3 B3 C3 D3


Last Updated : 22 Dec, 2023
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads