How to delete columns in PySpark dataframe?

  • Last Updated : 17 Jun, 2021

In this article, we are going to delete columns in a PySpark dataframe. To do this we will use the drop() function, which removes one or more columns (not individual values) and returns the result as a new dataframe.

Syntax: dataframe.drop('column name')


Python code to create student dataframe with three columns:



Python3




# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student data (the first row is repeated on purpose)
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the list of rows and the column names
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show dataframe
dataframe.show()

Output:

Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+

Example 1: Python program to delete a single column.

Here we delete the 'student ID' column from the dataframe using drop().

Python3




# delete a single column; drop() returns a new dataframe,
# so the result is assigned back to the same name
dataframe = dataframe.drop('student ID')
dataframe.show()

Output:

+------------+-------+
|student NAME|college|
+------------+-------+
|      sravan| vignan|
|      ojaswi|   vvit|
|      rohith|   vvit|
|     sridevi| vignan|
|      sravan| vignan|
|     gnanesh|    iit|
+------------+-------+

Example 2: Delete multiple columns

Here we delete multiple columns from a dataframe by passing several column names to the drop() function.

Python3




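# delete two columns; the * unpacks the tuple into separate arguments
# (drop() ignores any name that is not present in the dataframe)
dataframe = dataframe.drop(*('student NAME',
                             'student ID'))
dataframe.show()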

Output:

+-------+
|college|
+-------+
| vignan|
|   vvit|
|   vvit|
| vignan|
| vignan|
|    iit|
+-------+

Example 3: Delete all columns

Here we delete all the columns in the dataframe. The result has no columns left, so show() prints only empty rows.

Python3




# delete all three columns
dataframe = dataframe.drop(*('student NAME',
                             'student ID',
                             'college'))
dataframe.show()

Output:

++
||
++
||
||
||
||
||
||
++


