Get value of a particular cell in PySpark Dataframe

Last Updated : 30 Jun, 2021

In this article, we will get the value of a particular cell in a PySpark DataFrame.

For this, we will use the collect() function, which returns all rows of the DataFrame as a list. We can then index into that list by row and column position to reach a single cell.

Creating dataframe for demonstration:

Python3

# importing module 
import pyspark 
  
# importing sparksession from pyspark.sql module 
from pyspark.sql import SparkSession 
  
# creating sparksession and giving an app name 
spark = SparkSession.builder.appName('sparkdf').getOrCreate() 
  
# list  of employee data with 5 row values 
data =[["1","sravan","company 1"], 
       ["2","ojaswi","company 2"], 
       ["3","bobby","company 3"], 
       ["4","rohith","company 2"], 
       ["5","gnanesh","company 1"]] 
  
# specify column names 
columns=['Employee ID','Employee NAME', 
         'Company Name'] 
  
# creating a dataframe from the lists of data 
dataframe = spark.createDataFrame(data,columns) 
  
# display dataframe 
dataframe.show()

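Output:

+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+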

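collect(): This returns all rows of the DataFrame to the driver program as a Python list of Row objects. Because every row is copied to the driver, it is best suited to small DataFrames.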

Syntax: dataframe.collect()

Example 1: Python program that demonstrates the collect() function

Python3

# get all rows as a list of Row objects and print them
print(dataframe.collect())

Output:

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),
 Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'),
 Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),
 Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),
 Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]

Example 2: Get a particular row

To get a particular row, we can index into the list returned by collect(). As with any Python list, indexing starts from 0.

Syntax: dataframe.collect()[index_number]

Python3

# access individual rows by their index in the collected list
print("First row :", dataframe.collect()[0])

print("Third row :", dataframe.collect()[2])

Output:

First row : Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')

Third row : Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')

Example 3: Get a particular cell

To get a single cell, we index the result of collect() first by row and then by column.

Syntax: dataframe.collect()[row_index][column_index]

where row_index is the zero-based position of the row and column_index is the zero-based position of the column.

Here we access values from cells in the dataframe.

Python3

# first row - second column
print("First row - second column :",
      dataframe.collect()[0][1])

# third row - second column
print("Third row - second column :",
      dataframe.collect()[2][1])

# third row - third column
print("Third row - third column :",
      dataframe.collect()[2][2])

Output:

First row - second column : sravan
Third row - second column : bobby
Third row - third column : company 3
