How to Add Multiple Columns in PySpark DataFrames?

Last Updated : 30 Jun, 2021

In this article, we will look at different ways of adding multiple columns to a PySpark DataFrame.

Let’s create a sample DataFrame for demonstration by reading a CSV file:

Python3

# importing SparkSession from the pyspark.sql module 
from pyspark.sql import SparkSession 
  
# creating a SparkSession and giving it an app name 
spark = SparkSession.builder.appName('sparkdf').getOrCreate() 
  
# create the DataFrame from a CSV file, using the first row as the header 
# (without the inferSchema option, every column is read as a string) 
df = spark.read.option("header", True).csv("Cricket_data_set_odi.csv") 
  
# Display Schema 
df.printSchema() 
  
# Show Dataframe 
df.show()


Method 1: Using withColumn()

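withColumn() adds a new column to a DataFrame, or replaces an existing column that has the same name. Each call handles one column, so calls are chained to add several.

Syntax: df.withColumn(colName, col)

Here colName is the name of the column as a string and col is a Column expression that computes its values.

Returns: A new DataFrame with the column added, or with the existing column of the same name replaced.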

Code:

Python3

# chain withColumn() calls, one call per new column 
(df.withColumn('Avg_runs', df.Runs / df.Matches) 
   .withColumn('wkt+10', df.Wickets + 10) 
   .show()) 


Method 2: Using select()

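select() can also add multiple columns in a single call: pass '*' to keep every existing column, followed by one aliased expression for each new column.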

Syntax: df.select(*cols)

Code:

Python3

# Using select() to add multiple columns 
df.select('*', (df.Runs / df.Matches).alias('Avg_runs'), 
          (df.Wickets+10).alias('wkt+10')).show() 


Method 3: Adding Multiple Constant Columns to a DataFrame Using withColumn() and select()

Let’s create new columns with constant values using the lit() SQL function, as in the code below. lit() wraps a constant or literal value in a Column, so it can be passed to select() or withColumn() to add a column that holds the same value in every row.

Python3

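from pyspark.sql.functions import lit 
  
# add "Sport" with select() and "Fitness" with withColumn(); 
# the outer parentheses let the method chain span several lines 
(df.select('*', lit("Cricket").alias("Sport")) 
   .withColumn("Fitness", lit("Good")) 
   .show())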

