How to delete columns in a PySpark dataframe?
Last Updated: 17 Jun, 2021
In this article, we are going to delete columns from a PySpark dataframe using the drop() function. drop() removes the named columns and returns a new dataframe; the original dataframe is not modified, so the result has to be assigned to a variable.
Syntax: dataframe.drop('column name')
Python code to create a student dataframe with three columns:
Python3
from pyspark.sql import SparkSession

# create (or reuse) a Spark session
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# student records: ID, name and college
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# column names for the dataframe
columns = ['student ID', 'student NAME', 'college']

dataframe = spark.createDataFrame(data, columns)
print("Actual data in dataframe")
dataframe.show()
Output:
Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
Example 1: Python program to delete a single column.
Here we delete the 'student ID' column from the dataframe by passing its name to drop().
Python3
# drop() returns a new dataframe, so reassign the result
dataframe = dataframe.drop('student ID')
dataframe.show()
Output:
+------------+-------+
|student NAME|college|
+------------+-------+
|      sravan| vignan|
|      ojaswi|   vvit|
|      rohith|   vvit|
|     sridevi| vignan|
|      sravan| vignan|
|     gnanesh|    iit|
+------------+-------+
Example 2: Delete multiple columns
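Here we delete multiple columns by passing several column names to drop(). Because 'student ID' was already removed in Example 1, naming it again has no effect: drop() ignores column names that are not present in the dataframe.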
Python3
dataframe = dataframe.drop(*('student NAME', 'student ID'))
dataframe.show()
Output:
+-------+
|college|
+-------+
| vignan|
|   vvit|
|   vvit|
| vignan|
| vignan|
|    iit|
+-------+
Example 3: Delete all columns
Here we delete all the columns in the dataframe. The result has no columns, but show() still prints one empty row for each original row.
Python3
dataframe = dataframe.drop(*('student NAME', 'student ID', 'college'))
dataframe.show()
Output:
++
||
++
||
||
||
||
||
||
++