Convert Python Dictionary List to PySpark DataFrame

  • Last Updated : 18 Jul, 2021

In this article, we will discuss how to convert a list of Python dictionaries to a PySpark DataFrame.

It can be done in these ways:

  • Using an inferred schema
  • Using an explicit schema
  • Using SQL Expression

Method 1: Infer schema from the dictionary



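We will pass the list of dictionaries directly to the createDataFrame() method, which infers the column names from the dictionary keys and the column types from the values.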

Syntax: spark.createDataFrame(data)
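
Because inference works from the keys and Python value types found in the dictionaries, it can be worth checking that the records are consistent before handing them to createDataFrame(). The following is a minimal sketch in plain Python (no Spark required). The helper names summarize_value_types and keys_missing_somewhere are hypothetical and not part of PySpark.

```python
from collections import defaultdict


def summarize_value_types(records):
    """Map each key to the sorted list of Python type names seen for it.

    Hypothetical helper, not part of PySpark: it only inspects the input.
    """
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


def keys_missing_somewhere(records):
    """Return, sorted, the keys that are absent from at least one record."""
    all_keys = set().union(*(record.keys() for record in records))
    return sorted(k for k in all_keys if any(k not in r for r in records))


sample = [
    {"Name": "sravan kumar", "ID": 1, "Percentage": 94.29},
    {"Name": "sravani", "ID": 2},                   # "Percentage" is missing
    {"Name": "kumar", "ID": 3, "Percentage": 94},   # int instead of float
]

print(summarize_value_types(sample))
print(keys_missing_somewhere(sample))
```

A key that maps to more than one type name, or that is missing from some records, is a hint that an explicit schema (Method 2) may be the safer choice.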

Example: Python code to create a PySpark DataFrame from a list of dictionaries using this method

Python3




# import the modules
from pyspark.sql import SparkSession
  
# Create a Spark session with app name "GFG"
# and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# Create data frame from dictionary list
df = spark.createDataFrame(data)
  
# display
df.show()

Output:

Method 2: Using Explicit schema

Here we create a schema and pass it, along with the data, to the createDataFrame() method.



Schema structure:

schema = StructType([
    StructField('column_1', DataType(), False),
    StructField('column_2', DataType(), False)
])

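Here, column_1 and column_2 are the dictionary keys that become column names in the PySpark DataFrame, DataType() stands for the data type of each column (for example StringType()), and the third argument states whether the column may contain null values.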

Syntax: spark.createDataFrame(data, schema)

Where, 

  • data is the dictionary list
  • schema is the schema of the dataframe
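
With an explicit schema, the values in the dictionaries need to match the declared types. As far as PySpark's default schema verification goes, a FloatType field expects a Python float rather than an int, and an IntegerType field expects an int rather than a string. The following is a minimal sketch of coercing the dictionaries first, in plain Python. coerce_records and its casts mapping are hypothetical names, not PySpark API.

```python
def coerce_records(records, casts):
    """Return new dicts with each value passed through its cast function.

    `casts` maps a key to a callable such as int, float or str.
    None (and missing keys) stay None so nullable columns remain null.
    Hypothetical helper, not part of PySpark.
    """
    coerced = []
    for record in records:
        row = {}
        for key, cast in casts.items():
            value = record.get(key)
            row[key] = None if value is None else cast(value)
        coerced.append(row)
    return coerced


raw = [
    {"Name": "sravan kumar", "ID": "1", "Percentage": 94},  # str ID, int percentage
    {"Name": "sravani", "ID": 2, "Percentage": None},
]

clean = coerce_records(raw, {"Name": str, "ID": int, "Percentage": float})
print(clean)
```

The coerced list can then be passed as data in spark.createDataFrame(data, schema).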

Python program to create a PySpark DataFrame from a list of dictionaries using this method.

Python3




# import the modules
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructField, StructType,
                               StringType, IntegerType, FloatType)
  
  
# Create a Spark session with app name "GFG"
# and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# specify the schema
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])
  
# Create data frame from
# dictionary list through the schema
df = spark.createDataFrame(data, schema)
  
# display
df.show()

Output:



Method 3: Using SQL Expression

Here we use the Row class to convert each dictionary in the list to a Row, and then build the PySpark DataFrame from the resulting list of Rows.

Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])

where: 

  • createDataFrame() is the method to create the dataframe
  • Row(**iterator) unpacks each dictionary into keyword arguments of Row, so every key becomes a named field
  • data is the dictionary list
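
The Row(**iterator) step can be wrapped in a small function. The following is a minimal sketch that first pads missing keys with None so that every Row ends up with the same fields. align_keys and dicts_to_rows are hypothetical helper names, and pyspark is imported inside the function only so that the plain-Python part can run without Spark installed.

```python
def align_keys(records):
    """Give every dict the same keys (first-seen order), filling gaps with None."""
    keys = []
    for record in records:
        for key in record:
            if key not in keys:
                keys.append(key)
    return [{key: record.get(key) for key in keys} for record in records]


def dicts_to_rows(records):
    """Build one pyspark.sql.Row per dictionary via keyword unpacking."""
    from pyspark.sql import Row  # imported lazily; requires PySpark
    return [Row(**record) for record in align_keys(records)]


sample = [
    {"Name": "sravan kumar", "ID": 1, "Percentage": 94.29},
    {"Name": "sravani", "ID": 2},  # no "Percentage" key
]

print(align_keys(sample))
```

Given a SparkSession named spark, the list returned by dicts_to_rows(sample) should be usable as the argument to spark.createDataFrame().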

Python code to convert a list of dictionaries to a PySpark DataFrame using this method.

Python3




# import the modules
from pyspark.sql import SparkSession, Row
  
# Create a Spark session with app name "GFG"
# and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# create the DataFrame from a list of Row objects,
# one Row per dictionary
dataframe = spark.createDataFrame([Row(**variable)
                                   for variable in data])
  
dataframe.show()

Output:



