In this article, we will get the value of a particular cell in a PySpark dataframe.
For this, we use the collect() function, which returns all rows of the dataframe as a list. Indexing into that list selects a row, and indexing into the row selects a cell.
Creating dataframe for demonstration:
Python3
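# SparkSession is the entry point for working with dataframes
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session named 'sparkdf'
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# Sample employee records; each inner list is one row
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 2"],
        ["3", "bobby", "company 3"],
        ["4", "rohith", "company 2"],
        ["5", "gnanesh", "company 1"]]

# Column names, in the same order as the values in each row
columns = ['Employee ID', 'Employee NAME', 'Company Name']

# Build the dataframe and display it
dataframe = spark.createDataFrame(data, columns)
dataframe.show()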
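Output:
+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+
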
collect(): Returns all rows of the dataframe to the driver program as a Python list of Row objects.
Syntax: dataframe.collect()
Example 1: Python program that demonstrates the collect() function
Output:
[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),
Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'),
Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),
Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),
Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]
Example 2: Get a particular row
To get a particular row, index the list returned by collect(). Because the result is an ordinary Python list, indexing starts from 0.
Syntax: dataframe.collect()[index_number]
Python3
# collect() returns a list of Row objects; index it to pick one row
print("First row :", dataframe.collect()[0])
print("Third row :", dataframe.collect()[2])
Output:
First row : Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')
Third row : Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')
Example 3: Get a particular cell
To get a single cell, index the result of collect() first by row and then by column.
Syntax: dataframe.collect()[row_index][column_index]
where row_index is the zero-based position of the row and column_index is the zero-based position of the column.
The following example reads individual cells from the dataframe.
Python3
print("First row - Second column :",
      dataframe.collect()[0][1])
print("Third row - Second column :",
      dataframe.collect()[2][1])
print("Third row - Third column :",
      dataframe.collect()[2][2])
Output:
First row - Second column : sravan
Third row - Second column : bobby
Third row - Third column : company 3