Python PySpark – Drop columns based on column names or String condition


In this article, we will look at a step-by-step approach to dropping columns based on column names or a string condition in PySpark.

Stepwise Implementation

Step 1: Create a CSV

In this step, we simply create a CSV file with three rows and columns.

CSV Used: book1.csv (the screenshot of the sample file is not reproduced here).

Step 2: Import PySpark Library

In this step, we import the PySpark package so that we can use its functionality, using the syntax below:

import pyspark
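If PySpark is not already available in your environment, it can usually be installed with pip install pyspark (an assumption about your setup; install it however your environment expects). A quick sanity check after importing:

Python3

import pyspark

# Print the installed PySpark version to confirm that the import works.
print(pyspark.__version__)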

Step 3: Start a SparkSession

In this step, we simply start our Spark session using SparkSession.builder.appName().getOrCreate().

Python3

from pyspark.sql import SparkSession
 
spark = SparkSession.builder.appName(
    'GeeksForGeeks').getOrCreate()  # You can use any appName
print(spark)

Output: printing the session shows the SparkSession object, confirming that the session has started.

Step 4: Read our CSV

To read our CSV, we use spark.read.csv(). We pass it two options:

  • header = True [treats the first row of the CSV as the column names]
  • inferSchema = True [infers the correct data types for the column values]

Python3

df = spark.read.csv('book1.csv', header=True, inferSchema=True)
df.show()

Output: df.show() displays the contents of book1.csv as a DataFrame.
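To verify that inferSchema picked sensible data types, you can also print the schema. The exact column names and types depend on your CSV, so treat the result as illustrative.

Python3

# Show each column together with the data type that inferSchema assigned to it.
df.printSchema()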

Step 5: Drop Column based on Column Name

Finally, we can see how simple it is to drop a column based on its name.

To drop a column, we use DataFrame.drop(). Looking at the result, we will see that the Gender column is no longer part of the DataFrame.

Python3

df = df.drop("Gender")
df.show()

Output: df.show() now displays the DataFrame without the Gender column.
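The title also mentions dropping columns based on a string condition, which the steps above do not demonstrate. One common interpretation is to drop every column whose name contains a given substring; the sketch below illustrates that idea. The substring "Na" is purely an assumption for illustration, not something specified by the article.

Python3

# Minimal sketch: drop every column whose name contains a given substring.
# The substring "Na" is illustrative; replace it with your own condition.
cols_to_drop = [c for c in df.columns if "Na" in c]
df = df.drop(*cols_to_drop)
df.show()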


Last Updated : 27 Mar, 2023