
Filtering a row in PySpark DataFrame based on matching values from a list


In this article, we filter the rows of a PySpark dataframe, keeping those whose column value matches one of the values in a list, by using isin().

isin(): This checks whether the value of a column is contained in a given list of elements. It takes the elements and returns a boolean expression that is true for each row whose value matches one of them.

Syntax: column.isin([element1, element2, ..., element_n])

Create Dataframe for demonstration:

Python3




# importing module
import pyspark
  
# importing sparksession
from pyspark.sql import SparkSession
  
# creating sparksession
# and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of students data
# each inner list is one row: ID, NAME, college
data = [[1, "sravan", "vignan"],
        [2, "ramya", "vvit"],
        [3, "rohith", "klu"],
        [4, "sridevi", "vignan"],
        [5, "gnanesh", "iit"]]
  
# specify column names
columns = ['ID', 'NAME', 'college']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
dataframe.show()


Output:

Method 1: Using filter() method

filter() checks a condition and returns the rows that satisfy it. filter() and where() behave the same way; where() is an alias for filter().

Syntax: dataframe.filter(condition)

where condition is a boolean column expression that is evaluated for each row.

To match against a list, pass an isin() expression as the condition.

Syntax: dataframe.filter((dataframe.column_name).isin([list_of_elements])).show()

where,

  • column_name is the column to test
  • list_of_elements is the list of values to match against that column
  • show() is used to display the resulting dataframe

Example 1: Get the rows with particular IDs using filter().

Python3




# get the rows whose ID is 1, 2 or 3
dataframe.filter(dataframe.ID.isin([1, 2, 3])).show()


Output:

Example 2: Get the rows whose ID is neither 1 nor 3.

Python3




# get the rows whose ID is not 1 or 3
# (~ negates the isin() condition)
dataframe.filter(~dataframe.ID.isin([1, 3])).show()


Output:

Example 3: Get the rows with a particular name.

Python3




# get the rows whose NAME is sravan
dataframe.filter(dataframe.NAME.isin(['sravan'])).show()


Output:

Method 2: Using where() method

where() is an alias for filter(): it checks a condition and returns the rows that satisfy it.

Syntax: dataframe.where(condition)

where condition is a boolean column expression that is evaluated for each row.

Overall Syntax with where clause:

dataframe.where((dataframe.column_name).isin([elements])).show()

where,

  • column_name is the column to test
  • elements is the list of values to match against that column
  • show() is used to display the resulting dataframe

Example: Get the rows for a particular college using where().

Python3




# get the rows whose college is vignan
dataframe.where(dataframe.college.isin(['vignan'])).show()


Output:



Last Updated : 28 Jul, 2021