Concatenate two PySpark dataframes
In this article, we will see how to concatenate two PySpark DataFrames using Python.
Creating a DataFrame for demonstration:
Python3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('pyspark - example join').getOrCreate()

data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]

columns = ["Name", "DOB", "Gender", "salary"]

df1 = spark.createDataFrame(data=data, schema=columns)
df1.show()
Output:
+------+----------+------+------+
| Name| DOB|Gender|salary|
+------+----------+------+------+
| Ram|1991-04-01| M| 3000|
| Mike|2000-05-19| M| 4000|
|Rohini|1978-09-05| M| 4000|
| Maria|1967-12-01| F| 4000|
| Jenis|1980-02-17| F| 1200|
+------+----------+------+------+
Creating a second DataFrame for demonstration:
Python3
data2 = [('Mohi', '1991-04-01', 'M', 3000),
         ('Ani', '2000-05-19', 'F', 4300),
         ('Shipta', '1978-09-05', 'F', 4200),
         ('Jessy', '1967-12-01', 'F', 4010),
         ('kanne', '1980-02-17', 'F', 1200)]

columns = ["Name", "DOB", "Gender", "salary"]

df2 = spark.createDataFrame(data=data2, schema=columns)
df2.show()
Output:
+------+----------+------+------+
| Name| DOB|Gender|salary|
+------+----------+------+------+
| Mohi|1991-04-01| M| 3000|
| Ani|2000-05-19| F| 4300|
|Shipta|1978-09-05| F| 4200|
| Jessy|1967-12-01| F| 4010|
| kanne|1980-02-17| F| 1200|
+------+----------+------+------+
Method 1: Using union()
The union() method of a DataFrame is used to combine two DataFrames with the same structure/schema, appending the rows of the second DataFrame to the first.
Syntax: dataframe_1.union(dataframe_2)
where,
- dataframe_1 is the first dataframe
- dataframe_2 is the second dataframe
Example:
Python3
result = df1.union(df2)
result.show()
Output:
+------+----------+------+------+
| Name| DOB|Gender|salary|
+------+----------+------+------+
| Ram|1991-04-01| M| 3000|
| Mike|2000-05-19| M| 4000|
|Rohini|1978-09-05| M| 4000|
| Maria|1967-12-01| F| 4000|
| Jenis|1980-02-17| F| 1200|
| Mohi|1991-04-01| M| 3000|
| Ani|2000-05-19| F| 4300|
|Shipta|1978-09-05| F| 4200|
| Jessy|1967-12-01| F| 4010|
| kanne|1980-02-17| F| 1200|
+------+----------+------+------+
Method 2: Using unionByName()
You can also concatenate DataFrames with unionByName(), which resolves columns by name rather than by position; Spark 3.1 adds an allowMissingColumns option for DataFrames whose columns differ.
Syntax: dataframe_1.unionByName(dataframe_2)
where,
- dataframe_1 is the first dataframe
- dataframe_2 is the second dataframe
Example:
Python3
result1 = df1.unionByName(df2)
result1.show()
Output:
+------+----------+------+------+
| Name| DOB|Gender|salary|
+------+----------+------+------+
| Ram|1991-04-01| M| 3000|
| Mike|2000-05-19| M| 4000|
|Rohini|1978-09-05| M| 4000|
| Maria|1967-12-01| F| 4000|
| Jenis|1980-02-17| F| 1200|
| Mohi|1991-04-01| M| 3000|
| Ani|2000-05-19| F| 4300|
|Shipta|1978-09-05| F| 4200|
| Jessy|1967-12-01| F| 4010|
| kanne|1980-02-17| F| 1200|
+------+----------+------+------+
Method 3: Using functools.reduce()
The functools module provides functions that act on other functions and callable objects, letting you use or extend them without completely rewriting them. Here, functools.reduce() folds union() across a list of DataFrames.
Syntax:
functools.reduce(lambda df1, df2: df1.union(df2.select(df1.columns)), dfs)
where,
- df1 is the first dataframe
- df2 is the second dataframe
We define a helper unionAll() that uses functools.reduce() to union a list of DataFrames pairwise; df2.select(df1.columns) reorders each subsequent DataFrame's columns to match the first before the union. We then pass our two DataFrames to unionAll() and show the resultant DataFrame.
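functools.reduce() itself is plain Python: it folds a two-argument function across a sequence from left to right. A quick list-based illustration of the same folding pattern, using list concatenation in place of DataFrame union:

```python
import functools

# reduce applies the lambda pairwise: ([1] + [2]) + [3] -> [1, 2, 3]
parts = [[1], [2], [3]]
combined = functools.reduce(lambda a, b: a + b, parts)
print(combined)  # [1, 2, 3]
```

The DataFrame version below works the same way, except the combining function is df1.union(...) instead of list addition.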
Example:
Python3
import functools

def unionAll(dfs):
    return functools.reduce(lambda df1, df2: df1.union(
        df2.select(df1.columns)), dfs)

result3 = unionAll([df1, df2])
result3.show()
Output:
+------+----------+------+------+
| Name| DOB|Gender|salary|
+------+----------+------+------+
| Ram|1991-04-01| M| 3000|
| Mike|2000-05-19| M| 4000|
|Rohini|1978-09-05| M| 4000|
| Maria|1967-12-01| F| 4000|
| Jenis|1980-02-17| F| 1200|
| Mohi|1991-04-01| M| 3000|
| Ani|2000-05-19| F| 4300|
|Shipta|1978-09-05| F| 4200|
| Jessy|1967-12-01| F| 4010|
| kanne|1980-02-17| F| 1200|
+------+----------+------+------+
Last Updated: 04 Jan, 2022