In this article, we will discuss how to convert a list of Python dictionaries to a PySpark DataFrame.
It can be done in three ways:
- Using an inferred schema
- Using an explicit schema
- Using a SQL expression
Method 1: Infer schema from the dictionary
We pass the dictionary list directly to the createDataFrame() method, and Spark infers the column names and types from the data.
Syntax: spark.createDataFrame(data)
Example: Python code to create a PySpark dataframe from a dictionary list using this method:
# import the modules
from pyspark.sql import SparkSession

# create a Spark session; the app name
# is GFG and the master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# create a data frame from the dictionary list
df = spark.createDataFrame(data)

# display the data frame
df.show()
Output:
Method 2: Using Explicit schema
Here we are going to create a schema and pass it along with the data to the createDataFrame() method.
Schema structure:
schema = StructType([
    StructField('column_1', DataType(), False),
    StructField('column_2', DataType(), False)])
where column_1 and column_2 are the names of the dictionary keys to appear as columns in the PySpark dataframe, and DataType() is the data type of that particular column (for example StringType() or IntegerType()).
Syntax: spark.createDataFrame(data, schema)
Where,
- data is the dictionary list
- schema is the schema of the dataframe
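Before passing an explicit schema, it can help to check that each record's values actually match the Python types the schema expects. The sketch below is plain Python (no Spark session required); the `expected` mapping and `matches_schema` helper are illustrative, not part of the PySpark API. StringType, IntegerType, and FloatType correspond to Python str, int, and float respectively.

```python
# Illustrative helper: map each column name to the Python type
# that the corresponding Spark type accepts
expected = {"Name": str, "ID": int, "Percentage": float}

data = [{"Name": 'sravan kumar', "ID": 1, "Percentage": 94.29},
        {"Name": 'sravani', "ID": 2, "Percentage": 84.29},
        {"Name": 'kumar', "ID": 3, "Percentage": 94.29}]

def matches_schema(record, expected):
    # True if every expected column is present with the right type
    return all(isinstance(record.get(col), t) for col, t in expected.items())

print(all(matches_schema(r, expected) for r in data))  # True
```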
Python program to create a PySpark dataframe from a dictionary list using this method:
# import the modules
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField,
                               StringType, IntegerType, FloatType)

# create a Spark session; the app name
# is GFG and the master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# specify the schema
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])

# create a data frame from the
# dictionary list using the schema
df = spark.createDataFrame(data, schema)

# display the data frame
df.show()
Output:
Method 3: Using SQL Expression
Here we are using the Row function to convert the Python dictionary list to a PySpark dataframe.
Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])
where:
- createDataFrame() is the method that creates the dataframe
- Row(**iterator) unpacks each dictionary in the list into a Row object
- data is the dictionary list
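The `Row(**iterator)` pattern relies on Python's `**` operator, which unpacks a dictionary's key/value pairs into keyword arguments. The sketch below shows the same mechanism with a namedtuple standing in for pyspark.sql.Row, so it runs without a Spark session; `Record` is an illustrative stand-in, not part of PySpark.

```python
from collections import namedtuple

data = [{"Name": 'sravan kumar', "ID": 1, "Percentage": 94.29},
        {"Name": 'sravani', "ID": 2, "Percentage": 84.29}]

# Record is a stand-in for pyspark.sql.Row:
# keys become named fields on the resulting object
Record = namedtuple("Record", ["Name", "ID", "Percentage"])

# **d unpacks each dictionary into keyword arguments,
# exactly as Row(**iterator) does in the Spark version
rows = [Record(**d) for d in data]
print(rows[0].Name)  # sravan kumar
print(rows[1].ID)    # 2
```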
Python code to convert a dictionary list to a PySpark dataframe:
# import the modules
from pyspark.sql import SparkSession, Row

# create a Spark session; the app name
# is GFG and the master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()

# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}]

# create the data frame by unpacking
# each dictionary into a Row
dataframe = spark.createDataFrame([Row(**variable)
                                   for variable in data])

# display the data frame
dataframe.show()
Output: