How to get a value from the Row object in PySpark Dataframe?

Last Updated : 04 Jan, 2022

In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.

Method 1 : Using __getitem__() magic method

We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(), and call the __getitem__() magic method on that Row to get the value stored under a particular column name. The syntax is given below.

Syntax : RowObject.__getitem__('Column_Name')

Returns : value corresponding to the column name in the Row object

Python

# library import 
import pyspark 
from pyspark.sql import SparkSession 
from pyspark.sql import Row 
  
# Session Creation 
random_value_session = SparkSession.builder.appName( 
    'Random_Value_Session'
).getOrCreate() 
  
# Data filled in our DataFrame 
# 5 rows below 
rows = [['All England Open', 'March', 'Super 1000'], 
        ['Malaysia Open', 'January', 'Super 750'], 
        ['Korea Open', 'April', 'Super 500'], 
        ['Hylo Open', 'November', 'Super 100'], 
        ['Spain Masters', 'March', 'Super 300']] 
  
# Columns of our DataFrame 
columns = ['Tournament', 'Month', 'Level'] 
  
# DataFrame is created 
dataframe = random_value_session.createDataFrame(rows, 
                                                 columns) 
  
# Showing the DataFrame 
dataframe.show() 
  
# getting list of rows using collect() 
row_list = dataframe.collect() 
  
# Printing the first Row object 
# from which data is extracted 
print(row_list[0]) 
  
# Using __getitem__() magic method 
# To get value corresponding to a particular 
# column 
print(row_list[0].__getitem__('Level')) 
print(row_list[0].__getitem__('Tournament')) 
print(row_list[0].__getitem__('Level')) 
print(row_list[0].__getitem__('Month')) 

Output:

+----------------+--------+----------+
|      Tournament|   Month|     Level|
+----------------+--------+----------+
|All England Open|   March|Super 1000|
|   Malaysia Open| January| Super 750|
|      Korea Open|   April| Super 500|
|       Hylo Open|November| Super 100|
|   Spain Masters|   March| Super 300|
+----------------+--------+----------+

Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
Super 1000
March

Method 2 : Using asDict() method

We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(), and use the asDict() method to get a dictionary in which the column names are the keys and the row's values are the dictionary values. The syntax is given below:

Syntax : RowObject.asDict(recursive)

Parameters :

recursive: bool : if True, nested Row objects are also converted to dictionaries. The default value is False.

We can then easily get a value from the dictionary using DictionaryName['key_name'].

Python

# library imports are done here 
import pyspark 
from pyspark.sql import SparkSession 
  
# Session Creation 
random_value_session = SparkSession.builder.appName( 
    'Random_Value_Session'
).getOrCreate() 
  
# Data filled in our DataFrame 
# Rows below will be filled 
rows = [['French Open', 'October', 'Super 750'], 
        ['Macau Open', 'November', 'Super 300'], 
        ['India Open', 'January', 'Super 500'], 
        ['Odisha Open', 'January', 'Super 100'], 
        ['China Open', 'November', 'Super 1000']] 
  
# DataFrame Columns 
columns = ['Tournament', 'Month', 'Level'] 
  
# DataFrame creation 
dataframe = random_value_session.createDataFrame(rows, 
                                                 columns) 
  
# DataFrame print 
dataframe.show() 
  
# list of rows using collect() 
row_list = dataframe.collect() 
  
# Printing the second Row object 
# from which we will read data 
print(row_list[1]) 
print() 
  
# Printing dictionary to make 
# things more clear 
print(row_list[1].asDict()) 
print() 
  
# Using asDict() method to convert row object 
# into a dictionary where the column names are keys 
# Using column names as keys to get respective values 
print(row_list[1].asDict()['Tournament']) 
print(row_list[1].asDict()['Month']) 
print(row_list[1].asDict()['Level']) 

Output :

+-----------+--------+----------+
| Tournament|   Month|     Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+

Row(Tournament='Macau Open', Month='November', Level='Super 300')

{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}

Macau Open
November
Super 300

Method 3: Imagining Row object just like a list

Here we will treat a Row object like a Python list and read its values by position. We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(). Since a Row is a subclass of tuple, it supports integer indexing, so we just use :

Syntax : RowObject[index]

Returns : Value at the given column position in the Row object.

Python

# library imports are done here 
import pyspark 
from pyspark.sql import SparkSession 
  
# Session Creation 
random_value_session = SparkSession.builder.appName( 
    'Random_Value_Session'
).getOrCreate() 
  
# Data filled in our DataFrame 
# Rows below will be filled 
rows = [['Denmark Open', 'October', 'Super 1000'], 
        ['Indonesia Open', 'June', 'Super 1000'], 
        ['Korea Open', 'April', 'Super 500'], 
        ['Japan Open', 'August', 'Super 750'], 
        ['Akita Masters', 'July', 'Super 100']] 
  
# DataFrame Columns 
columns = ['Tournament', 'Month', 'Level'] 
  
# DataFrame creation 
dataframe = random_value_session.createDataFrame(rows, 
                                                 columns) 
  
# DataFrame print 
dataframe.show() 
  
# list of rows using collect() 
row_list = dataframe.collect() 
  
# Let's take the third Row object 
row_object = row_list[2] 
  
# If we imagine it as a Python List, 
# We can get the first value of the list, 
# index 0, let's try it 
print(row_object[0]) 
  
# This prints the value of the column at index 0, 
# which is the 'Tournament' column 
  
# A few more examples 
print(row_list[4][0]) 
print(row_list[3][1]) 
print(row_list[4][2]) 

Output:

+--------------+-------+----------+
|    Tournament|  Month|     Level|
+--------------+-------+----------+
|  Denmark Open|October|Super 1000|
|Indonesia Open|   June|Super 1000|
|    Korea Open|  April| Super 500|
|    Japan Open| August| Super 750|
| Akita Masters|   July| Super 100|
+--------------+-------+----------+

Korea Open
Akita Masters
August
Super 100
