How to get a value from the Row object in PySpark Dataframe?
Last Updated: 04 Jan, 2022
In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.
Method 1: Using the __getitem__() magic method
We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). The __getitem__() magic method then fetches the value for a given column name. Given below is the syntax.
Syntax: Row.__getitem__('Column_Name')
Returns: the value corresponding to the column name in the Row object
Python
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# create a SparkSession
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session').getOrCreate()

# rows of tournament data
rows = [['All England Open', 'March', 'Super 1000'],
        ['Malaysia Open', 'January', 'Super 750'],
        ['Korea Open', 'April', 'Super 500'],
        ['Hylo Open', 'November', 'Super 100'],
        ['Spain Masters', 'March', 'Super 300']]

columns = ['Tournament', 'Month', 'Level']

# create a DataFrame and display it
dataframe = random_value_session.createDataFrame(rows, columns)
dataframe.show()

# collect the rows into a list of Row objects
row_list = dataframe.collect()

# print the first Row object
print(row_list[0])

# fetch values by column name with __getitem__()
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Tournament'))
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Month'))
Output:
+----------------+--------+----------+
| Tournament| Month| Level|
+----------------+--------+----------+
|All England Open| March|Super 1000|
| Malaysia Open| January| Super 750|
| Korea Open| April| Super 500|
| Hylo Open|November| Super 100|
| Spain Masters| March| Super 300|
+----------------+--------+----------+
Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
Super 1000
March
Method 2: Using the asDict() method
We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). The asDict() method then gives a dictionary whose keys are column names and whose values are the row's values. Given below is the syntax:
Syntax: Row.asDict(recursive=False)
Parameters:
recursive (bool): if True, nested Rows are also converted to dictionaries. The default value is False.
We can then easily get the value from the dictionary using DictionaryName['key_name'].
Python
import pyspark
from pyspark.sql import SparkSession

# create a SparkSession
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session').getOrCreate()

# rows of tournament data
rows = [['French Open', 'October', 'Super 750'],
        ['Macau Open', 'November', 'Super 300'],
        ['India Open', 'January', 'Super 500'],
        ['Odisha Open', 'January', 'Super 100'],
        ['China Open', 'November', 'Super 1000']]

columns = ['Tournament', 'Month', 'Level']

# create a DataFrame and display it
dataframe = random_value_session.createDataFrame(rows, columns)
dataframe.show()

# collect the rows into a list of Row objects
row_list = dataframe.collect()

# print the second Row object
print(row_list[1])
print()

# convert the Row to a dictionary
print(row_list[1].asDict())
print()

# look up values by column name in the dictionary
print(row_list[1].asDict()['Tournament'])
print(row_list[1].asDict()['Month'])
print(row_list[1].asDict()['Level'])
Output:
+-----------+--------+----------+
| Tournament| Month| Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+
Row(Tournament='Macau Open', Month='November', Level='Super 300')
{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}
Macau Open
November
Super 300
Method 3: Indexing the Row object like a list
Here we treat a Row object like a Python list and access its values by position. We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). Since a Row behaves like a tuple, we can simply index it:
Syntax: RowObject[index]
Returns: the value at that position in the Row object.
Python
import pyspark
from pyspark.sql import SparkSession

# create a SparkSession
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session').getOrCreate()

# rows of tournament data
rows = [['Denmark Open', 'October', 'Super 1000'],
        ['Indonesia Open', 'June', 'Super 1000'],
        ['Korea Open', 'April', 'Super 500'],
        ['Japan Open', 'August', 'Super 750'],
        ['Akita Masters', 'July', 'Super 100']]

columns = ['Tournament', 'Month', 'Level']

# create a DataFrame and display it
dataframe = random_value_session.createDataFrame(rows, columns)
dataframe.show()

# collect the rows into a list of Row objects
row_list = dataframe.collect()

# index a Row by position, just like a list
row_object = row_list[2]
print(row_object[0])      # 'Tournament' of the third row
print(row_list[4][0])     # 'Tournament' of the fifth row
print(row_list[3][1])     # 'Month' of the fourth row
print(row_list[4][2])     # 'Level' of the fifth row
Output:
+--------------+-------+----------+
| Tournament| Month| Level|
+--------------+-------+----------+
| Denmark Open|October|Super 1000|
|Indonesia Open| June|Super 1000|
| Korea Open| April| Super 500|
| Japan Open| August| Super 750|
| Akita Masters| July| Super 100|
+--------------+-------+----------+
Korea Open
Akita Masters
August
Super 100