PySpark DataFrame – Select all except one or a set of columns

Last Updated: 17 Jun, 2021

In this article, we are going to select all columns except one column or a set of columns from a PySpark DataFrame. For this, we will use the drop() and select() functions.

But first, let's create a DataFrame for demonstration.


# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list  of students  data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]
# specify column names
columns = ['student ID', 'student NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
print('Actual data in dataframe')
# display the dataframe
dataframe.show()


Method 1: Using drop() function

drop() returns a new DataFrame with the specified columns removed; the original DataFrame is not modified.

Syntax: dataframe.drop('column_name1', 'column_name2', ...)

Where dataframe is the input DataFrame and the column names are the columns to be dropped.

Example 1: Python program to select all columns except one by dropping it


# drop student id
dataframe.drop('student ID').show()


Example 2: Python program to drop more than one column (a set of columns)


# drop student id and college
dataframe.drop('student ID','college').show()
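drop() takes each column name as a separate argument, so when the names to exclude are already stored in a Python list they can be unpacked with *, as in dataframe.drop(*names). Note that drop() is a no-op for names that are not in the schema, so a misspelled name is silently kept rather than raising an error. The sketch below shows one way to guard against that before calling drop(); the helper name checked_drop_names is hypothetical, and it runs in plain Python on the list that dataframe.columns returns.

```python
def checked_drop_names(all_columns, names):
    """Hypothetical helper: return names as a list if every one of them is in
    all_columns, otherwise raise ValueError listing the missing ones.

    Matching here is exact and case-sensitive, which is stricter than Spark's
    default (case-insensitive) column name resolution.
    """
    missing = [name for name in names if name not in all_columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    return list(names)


# Column names of the example DataFrame (what dataframe.columns returns)
columns = ['student ID', 'student NAME', 'college']

to_drop = checked_drop_names(columns, ['student ID', 'college'])
print(to_drop)  # ['student ID', 'college']

# With PySpark, the checked list is unpacked into drop():
# dataframe.drop(*to_drop).show()

# A typo is reported instead of being ignored by drop()
try:
    checked_drop_names(columns, ['student ID', 'colege'])
except ValueError as err:
    print(err)  # columns not found: ['colege']
```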


Method 2: Using select() function

select() returns a new DataFrame containing only the specified columns.

Syntax: dataframe.select('column_name1', 'column_name2', ...)

Where dataframe is the input DataFrame and the column names are the columns to keep.

Example 1: Select one column from the dataframe.


# select student id
dataframe.select('student ID').show()


Example 2: Python program to select two columns, student ID and student NAME


# select student id and student name
dataframe.select('student ID', 'student NAME').show()
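The select() examples list the columns to keep by hand. To select all columns except some with select(), one option is to build the list of kept names from dataframe.columns (a plain Python list of column names) and filter out the excluded ones. A minimal sketch, assuming a hypothetical helper named columns_except; the filtering itself is plain Python, and the PySpark call is shown as a comment:

```python
def columns_except(all_columns, excluded):
    """Hypothetical helper: return the names in all_columns that are not in
    excluded, keeping the original column order."""
    excluded = set(excluded)
    return [name for name in all_columns if name not in excluded]


# Column names of the example DataFrame (what dataframe.columns returns)
columns = ['student ID', 'student NAME', 'college']

print(columns_except(columns, ['student ID']))
# ['student NAME', 'college']
print(columns_except(columns, ['student ID', 'college']))
# ['student NAME']

# With PySpark, select() accepts the filtered list directly; this keeps the
# same columns as dataframe.drop('student ID', 'college'):
# dataframe.select(columns_except(dataframe.columns, ['student ID', 'college'])).show()
```

Building the list explicitly is mainly useful when the kept column names are needed again later, for example to reorder them; for simply excluding columns, drop() is the shorter option.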

