Python PySpark – Drop columns based on column names or String condition
In this article, we will be looking at the step-wise approach to dropping columns based on column names or String conditions in PySpark.
Stepwise Implementation
Step 1: Create CSV
In this step, we simply create a CSV file with three rows and three columns.
CSV Used:

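The CSV image is not reproduced here, so as a stand-in, here is a minimal sketch that writes an equivalent book1.csv. Only the Gender column name and the book1.csv filename are confirmed by the article; the Name and Age columns and all row values are assumptions for illustration.

```python
import csv

# Hypothetical contents for book1.csv: three data rows, three columns.
# Only "Gender" is confirmed by the article; the rest is illustrative.
rows = [
    ["Name", "Age", "Gender"],
    ["Alice", 23, "F"],
    ["Bob", 31, "M"],
    ["Carol", 27, "F"],
]

with open("book1.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```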
Step 2: Import PySpark Library
In this step, we import the PySpark package to make its functionality available, using the line below:
import pyspark
Step 3: Start a SparkSession
In this step, we start our Spark session using the SparkSession.builder.appName() function.
Python3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('GeeksForGeeks').getOrCreate()  # You can use any appName
print(spark)
Output:

Step 4: Read our CSV
To read our CSV we use spark.read.csv(), passing it two arguments:
- header=True [uses the first row of the CSV as the column names]
- inferSchema=True [infers the correct datatype for each column]
Python3
df = spark.read.csv('book1.csv', header=True, inferSchema=True)
df.show()
Output:

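Conceptually, inferSchema scans a column's values and picks the narrowest type they all fit: integer first, then floating point, otherwise string. A rough Python sketch of that idea (not Spark's actual implementation, which is more elaborate):

```python
def infer_column_type(values):
    """Rough sketch of schema inference: try int, then float,
    otherwise fall back to string. Not Spark's real algorithm."""
    for caster, type_name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                caster(v)  # every value must parse for this type to win
            return type_name
        except ValueError:
            continue  # at least one value failed; try the next type
    return "string"
```

For example, a column of ["1", "2", "3"] would be inferred as int, ["1.5", "2"] as double, and anything non-numeric as string.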
Step 5: Drop Column based on Column Name
Finally, we can see how simple it is to drop a column based on its name.
To drop a column, we use DataFrame.drop(). Looking at the result, we can see that the Gender column is no longer part of the DataFrame.
Python3
df = df.drop("Gender")
df.show()

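The title also mentions dropping columns based on a string condition. One way to do this is to filter df.columns with the condition and pass the matches to drop(), which accepts multiple names. A minimal sketch, where the startswith("G") condition, and every column name other than Gender, are assumptions for illustration:

```python
# Columns of the example DataFrame (only "Gender" is confirmed by
# the article; "Name" and "Age" are assumed for illustration).
columns = ["Name", "Age", "Gender"]

# Select every column whose name satisfies a string condition --
# here, names starting with "G" (an arbitrary example condition).
to_drop = [c for c in columns if c.startswith("G")]

# DataFrame.drop() accepts several names at once, so on a real
# DataFrame the matches can all be dropped in one call:
#   df = df.drop(*to_drop)
```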