Convert Python Dictionary List to PySpark DataFrame
In this article, we will discuss how to convert a Python list of dictionaries to a PySpark DataFrame.
It can be done in these ways:
- Using an inferred schema
- Using an explicit schema
- Using SQL Expression (Row objects)
Method 1: Infer schema from the dictionary
We will pass the list of dictionaries directly to the createDataFrame() method, which infers the column names from the keys and the column types from the values.
Syntax: spark.createDataFrame(data)
Example: Python code to create a PySpark DataFrame from a list of dictionaries using this method
Python3
from pyspark.sql import SparkSession

# Create (or reuse) a local Spark session
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# List of dictionaries; each dictionary becomes one row
data = [
    {"Name": "sravan kumar", "ID": 1, "Percentage": 94.29},
    {"Name": "sravani", "ID": 2, "Percentage": 84.29},
    {"Name": "kumar", "ID": 3, "Percentage": 94.29},
]

# Column names and types are inferred from the dictionaries
df = spark.createDataFrame(data)
df.show()
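Schema inference is easiest to reason about when every dictionary carries the same keys. The sketch below is an illustrative addition, not part of the original example: normalize_records and dataframe_from_records are hypothetical helper names, and the second record with a missing key is made up for the demonstration. How Spark treats missing keys during inference may vary between versions, so treat this as a defensive pattern rather than a requirement.

```python
def normalize_records(records):
    """Return copies of the dictionaries that all share the same keys.

    Keys missing from a record are filled with None. Column order follows
    the order in which keys are first seen.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return [{col: record.get(col) for col in columns} for record in records]


def dataframe_from_records(spark, records):
    """Hypothetical wrapper: needs an active SparkSession passed as `spark`."""
    return spark.createDataFrame(normalize_records(records))


data = [
    {"Name": "sravan kumar", "ID": 1, "Percentage": 94.29},
    {"Name": "sravani", "ID": 2},  # "Percentage" is missing here
]
normalized = normalize_records(data)
print(normalized[1])  # {'Name': 'sravani', 'ID': 2, 'Percentage': None}
```

With a session available, dataframe_from_records(spark, data) would pass the normalized list to createDataFrame() exactly as in the syntax for this method.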
Method 2: Using Explicit schema
Here we create a schema and pass it along with the data to the createDataFrame() method.
Schema structure:
schema = StructType([
    StructField('column_1', DataType(), False),
    StructField('column_2', DataType(), False)])
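Where column_1 and column_2 are the dictionary keys that become columns of the PySpark DataFrame, DataType() is the data type of that column, and the third argument states whether the column is nullable (False means null values are not allowed).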
Syntax: spark.createDataFrame(data, schema)
Where,
- data is the dictionary list
- schema is the schema of the dataframe
Example: Python program to create a PySpark DataFrame from a list of dictionaries using this method.
Python3
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructField, StructType,
                               StringType, IntegerType, FloatType)

# Create (or reuse) a local Spark session
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# List of dictionaries; each dictionary becomes one row
data = [
    {"Name": "sravan kumar", "ID": 1, "Percentage": 94.29},
    {"Name": "sravani", "ID": 2, "Percentage": 84.29},
    {"Name": "kumar", "ID": 3, "Percentage": 94.29},
]

# Explicit schema: column name, data type, nullable flag
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])

df = spark.createDataFrame(data, schema)
df.show()
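createDataFrame() also accepts the schema as a DDL-formatted string such as "Name string, ID int, Percentage float", which can be shorter to write than a StructType. The sketch below assumes that form; ddl_schema and dataframe_from_ddl are hypothetical helper names introduced only for illustration, and the type names are the Spark SQL short names.

```python
def ddl_schema(fields):
    """Join (column, type) pairs into a DDL schema string.

    `fields` is a list of (column_name, spark_sql_type_name) tuples,
    for example [("Name", "string"), ("ID", "int")].
    """
    return ", ".join(f"{name} {type_name}" for name, type_name in fields)


def dataframe_from_ddl(spark, records, fields):
    """Hypothetical wrapper: create a DataFrame using the DDL string as schema.

    Requires an active SparkSession passed in as `spark`.
    """
    return spark.createDataFrame(records, ddl_schema(fields))


schema_string = ddl_schema(
    [("Name", "string"), ("ID", "int"), ("Percentage", "float")]
)
print(schema_string)  # Name string, ID int, Percentage float
```

Fields declared in a plain DDL string like this one are nullable, so a StructType remains the better fit when a column must reject null values.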
Method 3: Using SQL Expression
Here we use the Row class from pyspark.sql to convert each dictionary in the Python list into a Row, and then build the PySpark DataFrame from the list of Row objects.
Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])
where:
- createDataFrame() is the method to create the dataframe
- Row(**iterator) unpacks one dictionary into keyword arguments, so each key becomes a field of the Row
- data is the dictionary list
Example: Python code to convert a list of dictionaries to a PySpark DataFrame using this method.
Python3
from pyspark.sql import SparkSession, Row

# Create (or reuse) a local Spark session
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# List of dictionaries; each dictionary becomes one row
data = [
    {"Name": "sravan kumar", "ID": 1, "Percentage": 94.29},
    {"Name": "sravani", "ID": 2, "Percentage": 84.29},
    {"Name": "kumar", "ID": 3, "Percentage": 94.29},
]

# Unpack each dictionary into keyword arguments to build one Row per record
dataframe = spark.createDataFrame([Row(**variable) for variable in data])
dataframe.show()
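The ** operator unpacks a dictionary into keyword arguments, so Row(**d) receives each key as a field name. The sketch below shows the mechanics without Spark: dicts_to_rows is a hypothetical helper that takes the row class as a parameter, and fake_row is a stand-in used only so the example runs without a Spark installation. In real code you would pass pyspark.sql.Row instead.

```python
def dicts_to_rows(records, row_factory):
    """Call row_factory(**record) for every dictionary in records.

    Keys must be strings, because ** turns them into keyword names.
    """
    rows = []
    for record in records:
        if not all(isinstance(key, str) for key in record):
            raise TypeError("all dictionary keys must be strings")
        rows.append(row_factory(**record))
    return rows


def fake_row(**fields):
    """Stand-in for pyspark.sql.Row: returns the (name, value) pairs."""
    return tuple(fields.items())


data = [{"Name": "kumar", "ID": 3, "Percentage": 94.29}]
rows = dicts_to_rows(data, fake_row)
print(rows[0])  # (('Name', 'kumar'), ('ID', 3), ('Percentage', 94.29))
```

With Spark available, spark.createDataFrame(dicts_to_rows(data, Row)) would be equivalent to the list comprehension in the syntax for this method.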
Last Updated: 18 Jul, 2021