In this article, we will convert an RDD of Row objects into an RDD of lists in PySpark.
Creating an RDD from Row objects for demonstration:
Python3
# import Row and SparkSession
from pyspark.sql import SparkSession, Row

# create a SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# create student data with the Row function
data = [Row(name="sravan kumar",
            subjects=["Java", "python", "C++"],
            state="AP"),
        Row(name="Ojaswi",
            lang=["Spark", "Java", "C++"],
            state="Telangana"),
        Row(name="rohith",
            subjects=["DS", "PHP", ".net"],
            state="AP"),
        Row(name="bobby",
            lang=["Python", "C", "sql"],
            state="Delhi"),
        Row(name="rohith",
            lang=["CSharp", "VB"],
            state="Telangana")]

rdd = spark.sparkContext.parallelize(data)

# display the actual RDD
rdd.collect()
Output:
[Row(name='sravan kumar', subjects=['Java', 'python', 'C++'], state='AP'),
 Row(name='Ojaswi', lang=['Spark', 'Java', 'C++'], state='Telangana'),
 Row(name='rohith', subjects=['DS', 'PHP', '.net'], state='AP'),
 Row(name='bobby', lang=['Python', 'C', 'sql'], state='Delhi'),
 Row(name='rohith', lang=['CSharp', 'VB'], state='Telangana')]
Using the map() function, we can convert the RDD of Rows into an RDD of lists.
Syntax: rdd_data.map(list)
where rdd_data is an RDD.
Finally, using the collect() method, we can display the data in the list RDD.
Python3
# convert each Row in the RDD to a list with the map() method
b = rdd.map(list)

# display the data in b with the collect method
for i in b.collect():
    print(i)
Output:
['sravan kumar', ['Java', 'python', 'C++'], 'AP']
['Ojaswi', ['Spark', 'Java', 'C++'], 'Telangana']
['rohith', ['DS', 'PHP', '.net'], 'AP']
['bobby', ['Python', 'C', 'sql'], 'Delhi']
['rohith', ['CSharp', 'VB'], 'Telangana']
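Why does map(list) work here? A pyspark.sql.Row is a subclass of Python's tuple, so calling list() on a Row yields its field values in order, which is exactly what map(list) applies to every element of the RDD. A minimal sketch of this behavior, using collections.namedtuple as a stand-in for Row so it runs without a Spark cluster (the namedtuple here is an assumption for illustration, not Spark's actual Row class):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; both are tuple subclasses with named fields.
Row = namedtuple("Row", ["name", "subjects", "state"])

row = Row(name="sravan kumar", subjects=["Java", "python", "C++"], state="AP")

# list() over a tuple-like object yields its field values in order,
# mirroring what rdd.map(list) does to every Row in the RDD.
print(list(row))
# → ['sravan kumar', ['Java', 'python', 'C++'], 'AP']
```

The field names are dropped in the conversion; only the values survive, in the order the fields were declared.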