How to drop multiple columns given in a list from a PySpark DataFrame?

  • Last Updated : 17 Jun, 2021

In this article, we are going to drop multiple columns, whose names are given in a list, from a PySpark DataFrame in Python.

For this, we will use the drop() function. This function removes the specified columns and returns a new DataFrame; the original DataFrame is not modified.

Syntax: dataframe.drop(*['column 1', 'column 2', 'column n'])

Where,

  • dataframe is the input dataframe
  • 'column 1', 'column 2', 'column n' are the names of the columns to drop, given as a list and unpacked with * so that each name is passed as a separate argument.
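The * in the syntax is ordinary Python argument unpacking. A minimal sketch in plain Python shows what it does; show_args is a hypothetical stand-in that only accepts a variable number of names, it is not a PySpark function:

```python
# Hypothetical stand-in that takes a variable number of names.
# It is NOT part of PySpark; it only returns what it received.
def show_args(*cols):
    return cols

cols_to_drop = ['student NAME', 'college']

# Without *, the whole list arrives as a single argument.
print(show_args(cols_to_drop))   # (['student NAME', 'college'],)

# With *, each element of the list arrives as its own argument.
print(show_args(*cols_to_drop))  # ('student NAME', 'college')
```

The second form is the one used with drop() in the examples that follow.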

Python code to create a student DataFrame with three columns:



Python3




# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student data (one duplicate row is included)
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the list of data
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show dataframe
dataframe.show()

Output:

Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+

Example 1: Drop multiple columns given as a list.

Python3




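# names of the columns to drop (avoid the name "list", which shadows the built-in)
cols_to_drop = ['student NAME', 'college']

# drop two columns; drop() returns a new DataFrame, so the original
# dataframe is kept unchanged for the following examples
result = dataframe.drop(*cols_to_drop)
result.show()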

Output:

+----------+
|student ID|
+----------+
|         1|
|         2|
|         3|
|         4|
|         1|
|         5|
+----------+

Example 2: Drop a single column given as a list.

Python3




# name of the single column to drop, still given as a list
cols_to_drop = ['college']

# drop one column from the original dataframe
result = dataframe.drop(*cols_to_drop)
result.show()

Output:

+----------+------------+
|student ID|student NAME|
+----------+------------+
|         1|      sravan|
|         2|      ojaswi|
|         3|      rohith|
|         4|     sridevi|
|         1|      sravan|
|         5|     gnanesh|
+----------+------------+

Example 3: Drop all columns given as a list.

Python3




# names of all columns in the dataframe
cols_to_drop = ['student ID', 'student NAME', 'college']

# drop all columns; the result keeps its rows but has no columns
result = dataframe.drop(*cols_to_drop)
result.show()

Output:

++
||
++
||
||
||
||
||
||
++
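Note that drop() is a no-op for names that are not present in the schema, so a misspelled name in the list goes unnoticed. One possible safeguard is to compare the list against dataframe.columns before dropping. The sketch below works on plain lists so it runs without Spark; split_known_unknown is a hypothetical helper, not a PySpark function:

```python
def split_known_unknown(existing, requested):
    """Split requested names into those present in existing and those absent."""
    existing_set = set(existing)
    known = [c for c in requested if c in existing_set]
    unknown = [c for c in requested if c not in existing_set]
    return known, unknown

# Column names as they would come from dataframe.columns
existing = ['student ID', 'student NAME', 'college']

# The second name is a deliberate typo
requested = ['college', 'colege']

known, unknown = split_known_unknown(existing, requested)
print(known)    # ['college']
print(unknown)  # ['colege']
```

With a real DataFrame, pass dataframe.columns as the first argument, review the unknown names, and then call dataframe.drop(*known).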
