How to Rename Multiple PySpark DataFrame Columns
In this article, we will discuss how to rename multiple columns in a PySpark DataFrame. For this, we will use the withColumnRenamed() and toDF() functions.
Creating a DataFrame for demonstration:
Python3
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.appName("sparkdf").getOrCreate()

# Sample rows; None becomes a null value in the DataFrame
data = [[None, "sravan", "vignan"],
        ["2", None, "vvit"],
        ["3", "rohith", None],
        ["4", "sridevi", "vignan"],
        ["1", None, None],
        ["5", "gnanesh", "iit"]]
columns = ["ID", "NAME", "college"]

dataframe = spark.createDataFrame(data, columns)
print(dataframe.columns)
dataframe.show()
Method 1: Using withColumnRenamed()
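This method renames a single column and returns a new DataFrame. If old_column_name is not a column of the DataFrame, the call does nothing and the DataFrame is returned unchanged.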
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name")
where
- dataframe is the PySpark DataFrame
- old_column_name is the existing column name
- new_column_name is the new column name
To rename multiple columns, chain n calls to withColumnRenamed() with the "." operator, one call per column:
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name").withColumnRenamed("old_column_name", "new_column_name")
Example 1: Python program to rename two columns
Python3
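print("Actual columns: ", dataframe.columns)

# withColumnRenamed() returns a new DataFrame, so the calls can be chained;
# the result goes into a new variable so `dataframe` keeps its original names
renamed_df = (dataframe.withColumnRenamed("college", "university")
              .withColumnRenamed("ID", "student_id"))
print("modified columns: ", renamed_df.columns)
renamed_df.show()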
Example 2: Rename all columns
Python3
print("Actual columns: ", dataframe.columns)

# Rename every column by chaining one withColumnRenamed() call per column
renamed_df = (dataframe.withColumnRenamed("college", "university")
              .withColumnRenamed("ID", "student_id")
              .withColumnRenamed("NAME", "student_name"))
print("modified columns: ", renamed_df.columns)
renamed_df.show()
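Writing the chain by hand grows with every column. As a minimal sketch, the same chain can be built from a dict of old-to-new names with functools.reduce, calling withColumnRenamed() once per entry. The helper name rename_columns is hypothetical and not part of PySpark. Because withColumnRenamed() silently ignores names that are not in the DataFrame, the sketch checks the names first and raises an error instead.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Hypothetical helper (not a PySpark API): rename several columns at once.

    `mapping` is a dict of {old_name: new_name}. Renames are applied in the
    dict's insertion order, one withColumnRenamed() call per entry.
    """
    # withColumnRenamed() is a no-op for unknown names, so fail loudly here instead
    missing = [old for old in mapping if old not in df.columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    # Equivalent to df.withColumnRenamed(a, b).withColumnRenamed(c, d)...
    return reduce(lambda acc, pair: acc.withColumnRenamed(*pair), mapping.items(), df)
```

With the DataFrame created at the start, rename_columns(dataframe, {"college": "university", "ID": "student_id", "NAME": "student_name"}) performs the same renames as Example 2. On Spark 3.4 and later, DataFrame.withColumnsRenamed() accepts such a dict directly.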
Method 2: Using toDF()
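This method renames all the columns of the DataFrame at once and returns a new DataFrame. The number of new names must equal the number of existing columns; otherwise Spark raises an error.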
Syntax: dataframe.toDF(*("column_1", "column_2", ..., "column_n"))
where "column_1" to "column_n" are the new column names, given in the same order as dataframe.columns
Example: Python program to change the column names
Python3
print("Actual columns: ", dataframe.columns)

# Supply one new name per column, in order; * unpacks the tuple into arguments
dataframe = dataframe.toDF(*("A", "B", "C"))
print("New columns: ", dataframe.columns)
dataframe.show()
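toDF() always needs the complete list of names. When only a few columns should change, one option, sketched here under the assumption that the renames are given as a dict, is to build that list from dataframe.columns and keep every unmapped name as it is. The helper name to_df_names is hypothetical.

```python
def to_df_names(columns, mapping):
    """Hypothetical helper: build the full name list to pass to DataFrame.toDF().

    toDF() needs exactly one name per existing column, in order, so columns
    that are not keys of `mapping` keep their current name.
    """
    # Catch typos in old names, since toDF() itself cannot detect them
    unknown = set(mapping) - set(columns)
    if unknown:
        raise ValueError(f"columns not found: {sorted(unknown)}")
    return [mapping.get(name, name) for name in columns]
```

For example, on the original DataFrame created at the start, dataframe.toDF(*to_df_names(dataframe.columns, {"NAME": "student_name"})) renames only the NAME column and leaves ID and college unchanged.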
Last Updated: 29 Jun, 2021