In this article, we will discuss how to convert a list of Python dictionaries to a PySpark DataFrame.
It can be done in three ways:
- Using an inferred schema
- Using an explicit schema
- Using a SQL expression
Method 1: Infer schema from the dictionary
We pass the dictionary list directly to the createDataFrame() method, and Spark infers the column names and types from the data.
Syntax: spark.createDataFrame(data)
Example: Python code to create a PySpark dataframe from a dictionary list using this method:
# import the modules
from pyspark.sql import SparkSession

# create a Spark session; the app name
# is GFG and the master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# create a data frame from the dictionary list
df = spark.createDataFrame(data)

# display the data frame
df.show()
Output:
Method 2: Using Explicit schema
Here we are going to create a schema and pass it along with the data to the createDataFrame() method.
Schema structure:
schema = StructType([
    StructField('column_1', DataType(), False),
    StructField('column_2', DataType(), False)])
where column_1 and column_2 are the names of the dictionary keys to appear as columns in the PySpark dataframe, and DataType() is the data type of that particular column (for example StringType() or IntegerType()).
Syntax: spark.createDataFrame(data, schema)
Where,
- data is the dictionary list
- schema is the schema of the dataframe
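Before passing an explicit schema, it can help to check that each record's values actually match the Python types the schema expects. The sketch below is plain Python (no Spark session required); the `expected` mapping and `matches_schema` helper are illustrative, not part of the PySpark API. StringType, IntegerType, and FloatType correspond to Python str, int, and float respectively.

```python
# Illustrative helper: map each column name to the Python type
# that the corresponding Spark type accepts
expected = {"Name": str, "ID": int, "Percentage": float}

data = [{"Name": 'sravan kumar', "ID": 1, "Percentage": 94.29},
        {"Name": 'sravani', "ID": 2, "Percentage": 84.29},
        {"Name": 'kumar', "ID": 3, "Percentage": 94.29}]

def matches_schema(record, expected):
    # True if every expected column is present with the right type
    return all(isinstance(record.get(col), t) for col, t in expected.items())

print(all(matches_schema(r, expected) for r in data))  # True
```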
Python program to create a PySpark dataframe from a dictionary list using this method:
# import the modules
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField,
                               StringType, IntegerType, FloatType)

# create a Spark session; the app name
# is GFG and the master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# specify the schema
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])

# create a data frame from the
# dictionary list using the schema
df = spark.createDataFrame(data, schema)

# display the data frame
df.show()
Output:
Method 3: Using SQL Expression
Here we are using the Row function to convert the Python dictionary list to a PySpark dataframe.
Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])
where:
- createDataFrame() is the method that creates the dataframe
- Row(**iterator) unpacks each dictionary in the list into a Row object
- data is the dictionary list
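The `Row(**iterator)` pattern relies on Python's `**` operator, which unpacks a dictionary's key/value pairs into keyword arguments. The sketch below shows the same mechanism with a namedtuple standing in for pyspark.sql.Row, so it runs without a Spark session; `Record` is an illustrative stand-in, not part of PySpark.

```python
from collections import namedtuple

data = [{"Name": 'sravan kumar', "ID": 1, "Percentage": 94.29},
        {"Name": 'sravani', "ID": 2, "Percentage": 84.29}]

# Record is a stand-in for pyspark.sql.Row:
# keys become named fields on the resulting object
Record = namedtuple("Record", ["Name", "ID", "Percentage"])

# **d unpacks each dictionary into keyword arguments,
# exactly as Row(**iterator) does in the Spark version
rows = [Record(**d) for d in data]
print(rows[0].Name)  # sravan kumar
print(rows[1].ID)    # 2
```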
Python code to convert a dictionary list to a PySpark dataframe:
# import the modules
from pyspark.sql import SparkSession, Row

# create a Spark session; the app name
# is GFG and the master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# create the data frame by unpacking
# each dictionary into a Row
dataframe = spark.createDataFrame([Row(**variable)
                                   for variable in data])

# display the data frame
dataframe.show()
Output: