PySpark DataFrame – Select all except one or a set of columns

In this article, we will select all columns of a PySpark DataFrame except one column or a set of columns. For this, we will use the drop() and select() functions.

But first, let's create a DataFrame for demonstration.

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print('Actual data in dataframe')
dataframe.show()

Output:

Method 1: Using drop() function

drop() returns a new DataFrame without the specified columns. The original DataFrame is not modified.

Syntax: dataframe.drop('column_names')

Where dataframe is the input DataFrame and column_names are the names of the columns to be dropped, passed as one or more separate arguments.

Example 1: Python program to select data by dropping one column

Python3

# drop student id
dataframe.drop('student ID').show()

Output:

Example 2: Python program to drop more than one column (a set of columns)

Python3

# drop student id and college
dataframe.drop('student ID', 'college').show()

Output:
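
When the columns to leave out are already held in a Python list, the list can be unpacked into drop() with the * operator. The following is a minimal sketch of that idea. The helper name drop_columns is hypothetical and not part of PySpark; it relies only on the drop() method shown above.

```python
def drop_columns(df, columns_to_drop):
    """Return a new DataFrame without the columns named in columns_to_drop.

    drop_columns is a hypothetical helper name used for illustration only.
    """
    # drop() takes column names as separate arguments, so unpack the list
    return df.drop(*columns_to_drop)
```

With the DataFrame created earlier, drop_columns(dataframe, ['student ID', 'college']).show() should display only the student NAME column. Note that drop() silently ignores names that are not in the schema, so a misspelled column name will not raise an error.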

Method 2: Using select() function

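select() returns a new DataFrame containing only the columns you name. To get all columns except some, name every column you want to keep and leave out the rest.

Syntax: dataframe.select(columns)

Where dataframe is the input DataFrame and columns are the names of the columns to keep

Example 1: Python program to select one column from the DataFrame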

Python3

# select student id
dataframe.select('student ID').show()

Output:

Example 2: Python program to select two columns, student ID and student NAME, which leaves out the college column

Python3

# select student id and student name
dataframe.select('student ID', 'student NAME').show()

Output:
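
Typing out every column to keep becomes tedious for wide DataFrames. One possible approach is to build the list of kept columns from dataframe.columns and pass it to select(). The following is a minimal sketch, assuming the excluded columns are given as a list of names. The helper name select_except is hypothetical and not part of PySpark.

```python
def select_except(df, excluded):
    """Return a new DataFrame with every column except those in excluded.

    select_except is a hypothetical helper name used for illustration only.
    """
    excluded = set(excluded)
    # df.columns lists the column names in their original order
    keep = [name for name in df.columns if name not in excluded]
    # select() takes column names as separate arguments, so unpack the list
    return df.select(*keep)
```

With the DataFrame created earlier, select_except(dataframe, ['college']).show() should display the student ID and student NAME columns, the same columns as Example 2 above.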
