Converting Row into list RDD in PySpark
In this article, we are going to convert a Row into a list RDD in PySpark.
Creating an RDD of Row objects for demonstration:
Python3
from pyspark.sql import SparkSession, Row

# create a SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# sample data: note that some rows use the field name
# 'subjects' while others use 'lang'
data = [Row(name="sravan kumar",
            subjects=["Java", "python", "C++"],
            state="AP"),
        Row(name="Ojaswi",
            lang=["Spark", "Java", "C++"],
            state="Telangana"),
        Row(name="rohith",
            subjects=["DS", "PHP", ".net"],
            state="AP"),
        Row(name="bobby",
            lang=["Python", "C", "sql"],
            state="Delhi"),
        Row(name="rohith",
            lang=["CSharp", "VB"],
            state="Telangana")]

# create an RDD from the list of Row objects
rdd = spark.sparkContext.parallelize(data)
rdd.collect()
Output:
[Row(name='sravan kumar', subjects=['Java', 'python', 'C++'], state='AP'),
Row(name='Ojaswi', lang=['Spark', 'Java', 'C++'], state='Telangana'),
Row(name='rohith', subjects=['DS', 'PHP', '.net'], state='AP'),
Row(name='bobby', lang=['Python', 'C', 'sql'], state='Delhi'),
Row(name='rohith', lang=['CSharp', 'VB'], state='Telangana')]
Using the map() function, we can convert each Row into a list, giving a list RDD.
Syntax: rdd_data.map(list)
where rdd_data is an RDD of Row objects.
Finally, the collect() method brings the list RDD back to the driver so the data can be displayed.
Python3
# convert each Row to a plain list of its values
b = rdd.map(list)

for i in b.collect():
    print(i)
Output:
['sravan kumar', ['Java', 'python', 'C++'], 'AP']
['Ojaswi', ['Spark', 'Java', 'C++'], 'Telangana']
['rohith', ['DS', 'PHP', '.net'], 'AP']
['bobby', ['Python', 'C', 'sql'], 'Delhi']
['rohith', ['CSharp', 'VB'], 'Telangana']
Last Updated: 18 Jul, 2021