How to get a value from the Row object in PySpark Dataframe?

In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.

Method 1 : Using the __getitem__() magic method

We will create a Spark DataFrame with at least one row using createDataFrame(). We then get a Row object from the list of Row objects returned by DataFrame.collect(). Finally, we use the __getitem__() magic method to get the value of a particular column by its name. The syntax is given below.

Syntax : Row.__getitem__('Column_Name')

Returns : value corresponding to the column name in the Row object

Python




# library import
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row
  
# Session Creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()
  
# Data filled in our DataFrame
# 5 rows below
rows = [['All England Open', 'March', 'Super 1000'],
        ['Malaysia Open', 'January', 'Super 750'],
        ['Korea Open', 'April', 'Super 500'],
        ['Hylo Open', 'November', 'Super 100'],
        ['Spain Masters', 'March', 'Super 300']]
  
# Columns of our DataFrame
columns = ['Tournament', 'Month', 'Level']
  
#DataFrame is created
dataframe = random_value_session.createDataFrame(rows,
                                                 columns)
  
# Showing the DataFrame
dataframe.show()
  
# getting list of rows using collect()
row_list = dataframe.collect()
  
# Printing the first Row object
# from which data is extracted
print(row_list[0])
  
# Using __getitem__() magic method
# To get value corresponding to a particular
# column
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Tournament'))
print(row_list[0].__getitem__('Month'))

Output: 

+----------------+--------+----------+
|      Tournament|   Month|     Level|
+----------------+--------+----------+
|All England Open|   March|Super 1000|
|   Malaysia Open| January| Super 750|
|      Korea Open|   April| Super 500|
|       Hylo Open|November| Super 100|
|   Spain Masters|   March| Super 300|
+----------------+--------+----------+

Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
March

Method 2 : Using the asDict() method

We will create a Spark DataFrame with at least one row using createDataFrame(). We then get a Row object from the list of Row objects returned by DataFrame.collect(). Next, we use the asDict() method to get a dictionary in which the column names are the keys and the row's values are the dictionary values. The syntax is given below:

Syntax : Row.asDict(recursive=False)

Parameters

recursive : bool : if True, nested Row objects are also converted to dictionaries. The default value is False.

We can then easily get a value from the dictionary using dictionary_name['key_name'].

Python




# library imports are done here
import pyspark
from pyspark.sql import SparkSession
  
# Session Creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()
  
# Data filled in our DataFrame
# Rows below will be filled
rows = [['French Open', 'October', 'Super 750'],
        ['Macau Open', 'November', 'Super 300'],
        ['India Open', 'January', 'Super 500'],
        ['Odisha Open', 'January', 'Super 100'],
        ['China Open', 'November', 'Super 1000']]
  
# DataFrame Columns
columns = ['Tournament', 'Month', 'Level']
  
# DataFrame creation
dataframe = random_value_session.createDataFrame(rows,
                                                 columns)
  
# DataFrame print
dataframe.show()
  
# list of rows using collect()
row_list = dataframe.collect()
  
# Printing the second Row object
# from which we will read data
print(row_list[1])
print()
  
# Printing dictionary to make
# things more clear
print(row_list[1].asDict())
print()
  
# Using asDict() method to convert row object
# into a dictionary where the column names are keys
# Using column names as keys to get respective values
print(row_list[1].asDict()['Tournament'])
print(row_list[1].asDict()['Month'])
print(row_list[1].asDict()['Level'])

Output : 

+-----------+--------+----------+
| Tournament|   Month|     Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+

Row(Tournament='Macau Open', Month='November', Level='Super 300')

{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}

Macau Open
November
Super 300

Method 3 : Treating the Row object like a list

Here we treat a Row object like a Python list and access its values by position. We will create a Spark DataFrame with at least one row using createDataFrame(). We then get a Row object from the list of Row objects returned by DataFrame.collect(). Since we are treating the Row object like a list, we simply index it with an integer position:

Syntax : RowObject[index]

Returns : Value of the column at that position in the Row object. Positions start at 0 and follow the column order of the DataFrame.

Python




# library imports are done here
import pyspark
from pyspark.sql import SparkSession
  
# Session Creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()
  
# Data filled in our DataFrame
# Rows below will be filled
rows = [['Denmark Open', 'October', 'Super 1000'],
        ['Indonesia Open', 'June', 'Super 1000'],
        ['Korea Open', 'April', 'Super 500'],
        ['Japan Open', 'August', 'Super 750'],
        ['Akita Masters', 'July', 'Super 100']]
  
# DataFrame Columns
columns = ['Tournament', 'Month', 'Level']
  
# DataFrame creation
dataframe = random_value_session.createDataFrame(rows,
                                                 columns)
  
# DataFrame print
dataframe.show()
  
# list of rows using collect()
row_list = dataframe.collect()
  
# Let's take the third Row object
row_object = row_list[2]
  
# If we imagine it as a Python List,
# We can get the first value of the list,
# index 0, let's try it
print(row_object[0])
  
# We got the value of column at index 0
# which is - 'Tournament'
  
# A few more examples
print(row_list[4][0])
print(row_list[3][1])
print(row_list[4][2])

Output: 

+--------------+-------+----------+
|    Tournament|  Month|     Level|
+--------------+-------+----------+
|  Denmark Open|October|Super 1000|
|Indonesia Open|   June|Super 1000|
|    Korea Open|  April| Super 500|
|    Japan Open| August| Super 750|
| Akita Masters|   July| Super 100|
+--------------+-------+----------+

Korea Open
Akita Masters
August
Super 100

Last Updated : 04 Jan, 2022