How to get a value from the Row object in PySpark DataFrame?
In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.
Method 1 : Using __getitem__() magic method
We will create a Spark DataFrame with at least one row using createDataFrame(). We then get a Row object from the list of Row objects returned by DataFrame.collect(). Finally, we call the __getitem__() magic method on that Row to get the value stored under a particular column name. The syntax is given below.
Syntax : Row.__getitem__('Column_Name')
Returns : value corresponding to the column name in the Row object
Python
# library import
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# Session Creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()

# Data filled in our DataFrame
# 5 rows below
rows = [['All England Open', 'March', 'Super 1000'],
        ['Malaysia Open', 'January', 'Super 750'],
        ['Korea Open', 'April', 'Super 500'],
        ['Hylo Open', 'November', 'Super 100'],
        ['Spain Masters', 'March', 'Super 300']]

# Columns of our DataFrame
columns = ['Tournament', 'Month', 'Level']

# DataFrame is created
dataframe = random_value_session.createDataFrame(rows, columns)

# Showing the DataFrame
dataframe.show()

# getting list of rows using collect()
row_list = dataframe.collect()

# Printing the first Row object
# from which data is extracted
print(row_list[0])

# Using __getitem__() magic method
# To get value corresponding to a particular
# column
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Tournament'))
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Month'))
Output:
+----------------+--------+----------+
|      Tournament|   Month|     Level|
+----------------+--------+----------+
|All England Open|   March|Super 1000|
|   Malaysia Open| January| Super 750|
|      Korea Open|   April| Super 500|
|       Hylo Open|November| Super 100|
|   Spain Masters|   March| Super 300|
+----------------+--------+----------+

Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
Super 1000
March
Method 2 : Using asDict() method
We will create a Spark DataFrame with at least one row using createDataFrame(). We then get a Row object from the list of Row objects returned by DataFrame.collect(). We then use the asDict() method to get a dictionary in which the column names are the keys and the row's values are the dictionary values. The syntax is given below:
Syntax : Row.asDict(recursive=False)
Parameters :
recursive: bool : if True, nested Row objects are also converted to dictionaries. The default value is False.
We can then easily get a value from the dictionary using DictionaryName['key_name'].
Python
# library imports are done here
import pyspark
from pyspark.sql import SparkSession

# Session Creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()

# Data filled in our DataFrame
# Rows below will be filled
rows = [['French Open', 'October', 'Super 750'],
        ['Macau Open', 'November', 'Super 300'],
        ['India Open', 'January', 'Super 500'],
        ['Odisha Open', 'January', 'Super 100'],
        ['China Open', 'November', 'Super 1000']]

# DataFrame Columns
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows, columns)

# DataFrame print
dataframe.show()

# list of rows using collect()
row_list = dataframe.collect()

# Printing the second Row object
# from which we will read data
print(row_list[1])
print()

# Printing dictionary to make
# things more clear
print(row_list[1].asDict())
print()

# Using asDict() method to convert row object
# into a dictionary where the column names are keys
# Using column names as keys to get respective values
print(row_list[1].asDict()['Tournament'])
print(row_list[1].asDict()['Month'])
print(row_list[1].asDict()['Level'])
Output :
+-----------+--------+----------+
| Tournament|   Month|     Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+

Row(Tournament='Macau Open', Month='November', Level='Super 300')

{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}

Macau Open
November
Super 300
Method 3: Treating the Row object like a list
Here we treat a Row object like a Python list and index into it by position. This works because Row is a subclass of Python's tuple. We will create a Spark DataFrame with at least one row using createDataFrame(). We then get a Row object from the list of Row objects returned by DataFrame.collect(). Since we are treating the Row object like a list, we simply use:
Syntax : RowObject[index]
Returns : Value at the given column position (0-based) in the Row object.
Python
# library imports are done here
import pyspark
from pyspark.sql import SparkSession

# Session Creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()

# Data filled in our DataFrame
# Rows below will be filled
rows = [['Denmark Open', 'October', 'Super 1000'],
        ['Indonesia Open', 'June', 'Super 1000'],
        ['Korea Open', 'April', 'Super 500'],
        ['Japan Open', 'August', 'Super 750'],
        ['Akita Masters', 'July', 'Super 100']]

# DataFrame Columns
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows, columns)

# DataFrame print
dataframe.show()

# list of rows using collect()
row_list = dataframe.collect()

# Let's take the third Row object
row_object = row_list[2]

# If we treat it as a Python list,
# we can get the first value of the list,
# index 0, let's try it
print(row_object[0])

# We got the value of the column at index 0,
# which is the 'Tournament' column

# A few more examples
print(row_list[4][0])
print(row_list[3][1])
print(row_list[4][2])
Output:
+--------------+-------+----------+
|    Tournament|  Month|     Level|
+--------------+-------+----------+
|  Denmark Open|October|Super 1000|
|Indonesia Open|   June|Super 1000|
|    Korea Open|  April| Super 500|
|    Japan Open| August| Super 750|
| Akita Masters|   July| Super 100|
+--------------+-------+----------+

Korea Open
Akita Masters
August
Super 100