In this article, we will discuss how to rename multiple columns in a PySpark DataFrame. For this we will use the withColumnRenamed() and toDF() methods.
Creating a DataFrame for demonstration:
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data with null values
# we can define null values with None
data = [[None, "sravan", "vignan"],
        ["2", None, "vvit"],
        ["3", "rohith", None],
        ["4", "sridevi", "vignan"],
        ["1", None, None],
        ["5", "gnanesh", "iit"]]
# specify column names
columns = ['ID', 'NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
# show columns
print(dataframe.columns)
# display dataframe
dataframe.show()
Output:
Method 1: Using withColumnRenamed()
This method renames a single column and returns a new DataFrame.
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name")
where
- dataframe is the pyspark dataframe
- old_column_name is the existing column name
- new_column_name is the new column name
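One detail worth knowing: withColumnRenamed() does nothing when the old column name is not present, so a misspelled name passes silently. The sketch below adds a guard for that case. The name rename_strict is a hypothetical helper (not part of PySpark), and df is assumed to be a PySpark DataFrame such as the one created above.

```python
def rename_strict(df, old_name, new_name):
    """Rename one column, raising an error if old_name is missing.

    Hypothetical helper, not part of PySpark. `df` is assumed to be a
    PySpark DataFrame (any object with .columns and .withColumnRenamed).
    The check is an exact, case-sensitive match against df.columns.
    """
    if old_name not in df.columns:
        raise KeyError(f"column {old_name!r} not found in {df.columns}")
    return df.withColumnRenamed(old_name, new_name)


# Usage, assuming `dataframe` from the demonstration code is in scope:
# dataframe = rename_strict(dataframe, "college", "university")
```

Whether a strict check is wanted depends on the pipeline; the silent behaviour can be convenient when a column is optional.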
To rename multiple columns, chain the method n times with the "." operator.
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name").withColumnRenamed("old_column_name", "new_column_name")
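When there are many columns to rename, writing the chain by hand gets long. A minimal sketch of the same chaining driven by a dictionary follows. Here rename_columns is a hypothetical helper name, and mapping is an assumed dict of old name to new name; neither comes from PySpark itself.

```python
def rename_columns(df, mapping):
    """Apply withColumnRenamed once for each old -> new pair in mapping.

    Hypothetical helper, not part of PySpark. `df` is assumed to be a
    PySpark DataFrame; each call returns a new DataFrame, so the result
    is reassigned on every iteration.
    """
    for old_name, new_name in mapping.items():
        df = df.withColumnRenamed(old_name, new_name)
    return df


# Usage, assuming `dataframe` from the demonstration code is in scope:
# dataframe = rename_columns(
#     dataframe, {"college": "university", "ID": "student_id"})
```

If your PySpark version is 3.4 or newer, DataFrame.withColumnsRenamed() accepts such a dictionary directly, which makes the loop unnecessary.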
Example 1: Python program to rename two columns
# display actual columns
print("Actual columns: ", dataframe.columns)
# change the college column name to university
# and ID to student_id
dataframe = dataframe.withColumnRenamed(
    "college", "university").withColumnRenamed("ID", "student_id")
# display modified columns
print("modified columns: ", dataframe.columns)
# final dataframe
dataframe.show()
Output:
Example 2: Rename all columns
# display actual columns
print("Actual columns: ", dataframe.columns)
# rename college to university, ID to student_id
# and NAME to student_name
dataframe = dataframe.withColumnRenamed(
    "college", "university").withColumnRenamed(
    "ID", "student_id").withColumnRenamed("NAME", "student_name")
# display modified columns
print("modified columns: ", dataframe.columns)
# final dataframe
dataframe.show()
Output:
Method 2: Using toDF()
This method renames all the columns of the DataFrame at once, so it needs one new name for each existing column.
Syntax: dataframe.toDF(*("column 1", "column 2", "column n"))
where the arguments are the new column names, given in the same order as the existing columns
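Because toDF() takes the complete list of names, it pairs well with a list built from the existing columns, for example to lowercase every name. The sketch below illustrates that idea. rename_all and transform are hypothetical names introduced here, not PySpark APIs, and df is assumed to be a PySpark DataFrame.

```python
def rename_all(df, transform):
    """Rename every column by applying `transform` to its current name.

    Hypothetical helper, not part of PySpark. `transform` is any
    function taking the old name (a str) and returning the new name.
    """
    new_names = [transform(name) for name in df.columns]
    # Guard against a transform that maps two columns to the same name.
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"transform produced duplicate names: {new_names}")
    return df.toDF(*new_names)


# Usage, assuming `dataframe` from the demonstration code is in scope:
# dataframe = rename_all(dataframe, str.lower)
```

Building the list from df.columns keeps the number and order of names aligned with the DataFrame.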
Example: Python program to change the column names
# display actual columns
print("Actual columns: ", dataframe.columns)
# change column names to A, B, C
dataframe = dataframe.toDF(*("A", "B", "C"))
# display new columns
print("New columns: ", dataframe.columns)
# display dataframe
dataframe.show()
Output: