How to get a value from the Row object in PySpark Dataframe?
Last Updated: 04 Jan, 2022
In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.
Method 1: Using the __getitem__() magic method
We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). The __getitem__() magic method then fetches the value for a given column name. Given below is the syntax.
Syntax: Row.__getitem__('Column_Name')
Returns: the value corresponding to the column name in the Row object
Python
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# create a SparkSession
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session').getOrCreate()

# rows of tournament data
rows = [['All England Open', 'March', 'Super 1000'],
        ['Malaysia Open', 'January', 'Super 750'],
        ['Korea Open', 'April', 'Super 500'],
        ['Hylo Open', 'November', 'Super 100'],
        ['Spain Masters', 'March', 'Super 300']]

columns = ['Tournament', 'Month', 'Level']

# create a DataFrame and display it
dataframe = random_value_session.createDataFrame(rows, columns)
dataframe.show()

# collect the rows into a list of Row objects
row_list = dataframe.collect()

# print the first Row object
print(row_list[0])

# fetch values by column name with __getitem__()
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Tournament'))
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Month'))
Output:
+----------------+--------+----------+
| Tournament| Month| Level|
+----------------+--------+----------+
|All England Open| March|Super 1000|
| Malaysia Open| January| Super 750|
| Korea Open| April| Super 500|
| Hylo Open|November| Super 100|
| Spain Masters| March| Super 300|
+----------------+--------+----------+
Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
Super 1000
March
Method 2: Using the asDict() method
We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). The asDict() method then gives a dictionary whose keys are column names and whose values are the row's values. Given below is the syntax:
Syntax: Row.asDict(recursive=False)
Parameters:
recursive (bool): if True, nested Rows are also converted to dictionaries. The default value is False.
We can then easily get the value from the dictionary using DictionaryName['key_name'].
Python
import pyspark
from pyspark.sql import SparkSession

# create a SparkSession
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session').getOrCreate()

# rows of tournament data
rows = [['French Open', 'October', 'Super 750'],
        ['Macau Open', 'November', 'Super 300'],
        ['India Open', 'January', 'Super 500'],
        ['Odisha Open', 'January', 'Super 100'],
        ['China Open', 'November', 'Super 1000']]

columns = ['Tournament', 'Month', 'Level']

# create a DataFrame and display it
dataframe = random_value_session.createDataFrame(rows, columns)
dataframe.show()

# collect the rows into a list of Row objects
row_list = dataframe.collect()

# print the second Row object
print(row_list[1])
print()

# convert the Row to a dictionary
print(row_list[1].asDict())
print()

# look up values by column name in the dictionary
print(row_list[1].asDict()['Tournament'])
print(row_list[1].asDict()['Month'])
print(row_list[1].asDict()['Level'])
Output:
+-----------+--------+----------+
| Tournament| Month| Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+
Row(Tournament='Macau Open', Month='November', Level='Super 300')
{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}
Macau Open
November
Super 300
Method 3: Indexing the Row object like a list
Here we treat a Row object like a Python list and access its values by position. We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). Since a Row behaves like a tuple, we can simply index it:
Syntax: RowObject[index]
Returns: the value at that position in the Row object.
Python
import pyspark
from pyspark.sql import SparkSession

# create a SparkSession
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session').getOrCreate()

# rows of tournament data
rows = [['Denmark Open', 'October', 'Super 1000'],
        ['Indonesia Open', 'June', 'Super 1000'],
        ['Korea Open', 'April', 'Super 500'],
        ['Japan Open', 'August', 'Super 750'],
        ['Akita Masters', 'July', 'Super 100']]

columns = ['Tournament', 'Month', 'Level']

# create a DataFrame and display it
dataframe = random_value_session.createDataFrame(rows, columns)
dataframe.show()

# collect the rows into a list of Row objects
row_list = dataframe.collect()

# index a Row by position, just like a list
row_object = row_list[2]
print(row_object[0])      # 'Tournament' of the third row
print(row_list[4][0])     # 'Tournament' of the fifth row
print(row_list[3][1])     # 'Month' of the fourth row
print(row_list[4][2])     # 'Level' of the fifth row
Output:
+--------------+-------+----------+
| Tournament| Month| Level|
+--------------+-------+----------+
| Denmark Open|October|Super 1000|
|Indonesia Open| June|Super 1000|
| Korea Open| April| Super 500|
| Japan Open| August| Super 750|
| Akita Masters| July| Super 100|
+--------------+-------+----------+
Korea Open
Akita Masters
August
Super 100