How to create a PySpark DataFrame from multiple lists?
Last Updated: 30 May, 2021
In this article, we will discuss how to create a PySpark DataFrame from multiple lists.
Approach
- Combine the lists into row tuples with the built-in zip() function, and put the column names in a separate list:
zip(list1, list2, ..., listN)
- Pass the zipped data and the column names to the spark.createDataFrame() method:
dataframe = spark.createDataFrame(data, columns)
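To see what zip() contributes here, note that it pairs up the lists positionally, yielding one tuple per row — exactly the row shape that createDataFrame() expects. A minimal sketch in plain Python (no Spark needed):

```python
# one list per column
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]

# zip() pairs the elements positionally, producing one tuple per row
rows = list(zip(data, data1))
print(rows)  # [(1, 'sravan'), (2, 'bobby'), (3, 'ojaswi')]
```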
Examples
Example 1: Python program to create two lists and build the DataFrame from them
Python3
from pyspark.sql import SparkSession

# create the SparkSession
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# one list per column
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]

# column names
columns = ['ID', 'NAME']

# zip the lists into row tuples and create the DataFrame
dataframe = spark.createDataFrame(zip(data, data1), columns)
dataframe.show()
Output:
+---+------+
| ID|  NAME|
+---+------+
|  1|sravan|
|  2| bobby|
|  3|ojaswi|
+---+------+
Example 2: Python program to create four lists and build the DataFrame from them
Python3
from pyspark.sql import SparkSession

# create the SparkSession
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# one list per column
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]
data2 = ["iit-k", "iit-mumbai", "vignan university"]
data3 = ["AP", "TS", "UP"]

# column names
columns = ['ID', 'NAME', 'COLLEGE', 'ADDRESS']

# zip the four lists into row tuples and create the DataFrame
dataframe = spark.createDataFrame(
    zip(data, data1, data2, data3), columns)
dataframe.show()
Output:
+---+------+-----------------+-------+
| ID|  NAME|          COLLEGE|ADDRESS|
+---+------+-----------------+-------+
|  1|sravan|            iit-k|     AP|
|  2| bobby|       iit-mumbai|     TS|
|  3|ojaswi|vignan university|     UP|
+---+------+-----------------+-------+
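One caveat worth noting: zip() stops at the shortest input, so if the lists have unequal lengths the extra elements are silently dropped rather than raising an error. A minimal sketch of the behavior (plain Python, hypothetical data):

```python
ids = [1, 2, 3]
names = ["sravan", "bobby"]  # one element short

# zip() truncates to the shortest list: the row for id 3 is silently lost
rows = list(zip(ids, names))
print(rows)  # [(1, 'sravan'), (2, 'bobby')]
```

On Python 3.10+, zip(ids, names, strict=True) raises a ValueError on a length mismatch instead, which is a safer way to build the rows before passing them to createDataFrame().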