Python PySpark – Drop columns based on column names or String condition


In this article, we will look at a step-wise approach to dropping columns based on column names or string conditions in PySpark.

Stepwise Implementation

Step 1: Create a CSV

In this step, we simply create a CSV file with a few rows and columns.

CSV Used:


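The original screenshot of the CSV is not available here. As an illustration, a small file like the following could be created with Python's csv module; the column names and values below are hypothetical, chosen only so that a Gender column exists for the drop in Step 5:

```python
import csv

# Hypothetical sample data; the article's original CSV is not shown.
rows = [
    ["Name", "Age", "Gender"],   # header row
    ["Ashish", 23, "Male"],
    ["Riya", 21, "Female"],
    ["Vikram", 25, "Male"],
]

# Write the rows to book1.csv, the filename read back in Step 4.
with open("book1.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```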
Step 2: Import PySpark Library

In this step, we import the PySpark package so that its functionality is available:

import pyspark

Step 3: Start a SparkSession

In this step, we start a Spark session using SparkSession.builder.appName():


from pyspark.sql import SparkSession
spark = SparkSession.builder.appName(
    'GeeksForGeeks').getOrCreate()  # You can use any appName



Step 4: Read the CSV

To read the CSV we use spark.read.csv(). Two parameters are worth passing here:

  • header = True [uses the first row of the CSV as the column names]
  • inferSchema = True [infers the right datatype for each column]


df = spark.read.csv('book1.csv', header=True, inferSchema=True)



Step 5: Drop Column based on Column Name

Finally, we can see how simple it is to drop a column based on its name.

To drop a column we use DataFrame.drop(). After assigning the result back to df, the Gender column is no longer part of the DataFrame:


df = df.drop("Gender")


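The title also mentions dropping columns based on a string condition, which the steps above do not show. One common approach is to filter the DataFrame's column names with an ordinary Python condition and pass the matches to drop(). A minimal sketch, using a hypothetical list of column names in place of a live df.columns:

```python
# Hypothetical column names; with a real DataFrame you would use df.columns.
columns = ["Name", "Age", "Gender", "Fee_Amount", "Fee_Date"]

# Collect every column whose name starts with the prefix "Fee_".
to_drop = [c for c in columns if c.startswith("Fee_")]
print(to_drop)  # → ['Fee_Amount', 'Fee_Date']

# With a real DataFrame, the matching columns are dropped in one call:
# df = df.drop(*to_drop)
```

Any string condition works in the comprehension, for example "Fee" in c or c.endswith("_Date"), since drop() accepts multiple column names.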
Last Updated : 27 Mar, 2023