
How to join on multiple columns in Pyspark?


In this article, we will discuss how to join two PySpark DataFrames on multiple columns using Python.

Let’s create the first dataframe:

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of employee data
data = [(1, "sravan"), (2, "ojsawi"), (3, "bobby")]
  
# specify column names
columns = ['ID1', 'NAME1']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
dataframe.show()


Output:

Let’s create the second dataframe:

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of employee data
data = [(1, "sravan"), (2, "ojsawi"),
        (3, "bobby"),
        (4, "rohith"), (5, "gnanesh")]
  
# specify column names
columns = ['ID2', 'NAME2']
  
# creating a dataframe from the lists of data
dataframe1 = spark.createDataFrame(data, columns)
  
dataframe1.show()


Output:

We can join on multiple columns by passing a compound condition, built with the conditional operators & (and) or | (or), to the join() function.

Syntax: dataframe.join(dataframe1, (dataframe.column1 == dataframe1.column1) & (dataframe.column2 == dataframe1.column2))

where, 

  • dataframe is the first dataframe
  • dataframe1 is the second dataframe
  • column1 is the first matching column in both the dataframes
  • column2 is the second matching column in both the dataframes

Example 1: PySpark code to join the two DataFrames on multiple columns (ID and name) with the & (and) operator

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of employee data
data = [(1, "sravan"), (2, "ojsawi"), (3, "bobby")]
  
# specify column names
columns = ['ID1', 'NAME1']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# list  of employee data
data = [(1, "sravan"), (2, "ojsawi"), (3, "bobby"),
        (4, "rohith"), (5, "gnanesh")]
  
# specify column names
columns = ['ID2', 'NAME2']
  
# creating a dataframe from the lists of data
dataframe1 = spark.createDataFrame(data, columns)
  
# join based on ID and name column
dataframe.join(dataframe1, (dataframe.ID1 == dataframe1.ID2)
               & (dataframe.NAME1 == dataframe1.NAME2)).show()


Output:

Example 2: Join with the | (or) operator

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of employee data
data = [(1, "sravan"), (2, "ojsawi"), (3, "bobby")]
  
# specify column names
columns = ['ID1', 'NAME1']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# list  of employee data
data = [(1, "sravan"), (2, "ojsawi"), (3, "bobby"),
        (4, "rohith"), (5, "gnanesh")]
  
# specify column names
columns = ['ID2', 'NAME2']
  
# creating a dataframe from the lists of data
dataframe1 = spark.createDataFrame(data, columns)
  
# join based on ID and name column
dataframe.join(dataframe1, (dataframe.ID1 == dataframe1.ID2)
               | (dataframe.NAME1 == dataframe1.NAME2)).show()


Output:



Last Updated : 19 Dec, 2021