Create PySpark dataframe from nested dictionary

  • Last Updated : 17 Jun, 2021

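In this article, we discuss how to create a PySpark DataFrame from a nested dictionary.

We will use the createDataFrame() method of the SparkSession to create the DataFrame. The input is a nested dictionary: each outer key maps to an inner dictionary of column names and values. Calling items() on the nested dictionary yields each outer key k together with its inner dictionary v, and the comprehension below builds one Row per pair. The outer key is stored under an empty-string field name, and the inner dictionary is unpacked into the remaining fields.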

[Row(**{'': k, **v}) for k,v in data.items()]
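To see what this comprehension hands to Row, the dictionary merge can be tried in plain Python without Spark. This is only a minimal sketch: the `data` below is a cut-down copy of the dictionary used in the examples, and `merged` is an illustrative name.

```python
# Cut-down sample of the nested dictionary used in this article
data = {
    'student_1': {'student id': 7058, 'country': 'India'},
    'student_2': {'student id': 7059, 'country': 'Srilanka'},
}

# {'': k, **v} stores the outer key under the empty-string name and
# then unpacks the inner dictionary next to it
merged = [{'': k, **v} for k, v in data.items()]

for row in merged:
    print(row)
# {'': 'student_1', 'student id': 7058, 'country': 'India'}
# {'': 'student_2', 'student id': 7059, 'country': 'Srilanka'}
```

Each merged dictionary becomes the keyword arguments of one Row, so every inner key turns into a column and the outer key ends up in a column whose name is the empty string.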

Example 1: Python program to create college data from a dictionary with nested address details

Python3




# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# creating nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP',
        'district': 'Guntur'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X',
        'district': 'Y'
    }
}
  
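# building one Row per student: the outer key goes into an
# empty-named field, the inner dictionary supplies the columns
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]
  
# creating the pyspark dataframe and keeping only the
# columns that come from the inner dictionaries
final = spark.createDataFrame(rowdata).select(
  'student id', 'country', 'state', 'district')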
  
# display pyspark dataframe
final.show()

Output:



+----------+--------+-----+--------+
|student id| country|state|district|
+----------+--------+-----+--------+
|      7058|   India|   AP|  Guntur|
|      7059|Srilanka|    X|       Y|
+----------+--------+-----+--------+

Example 2: Python program to create a DataFrame from a nested dictionary with 3 columns (3 keys)

Python3




# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# creating nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X'
    }
}
  
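# building one Row per student: the outer key goes into an
# empty-named field, the inner dictionary supplies the columns
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]
  
# creating the pyspark dataframe and keeping only the
# columns that come from the inner dictionaries
final = spark.createDataFrame(rowdata).select(
  'student id', 'country', 'state')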
  
# display pyspark dataframe
final.show()

Output:

+----------+--------+-----+
|student id| country|state|
+----------+--------+-----+
|      7058|   India|   AP|
|      7059|Srilanka|    X|
+----------+--------+-----+
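
Both examples drop the outer keys (student_1, student_2) in select(). If those keys should be kept as a column instead, one option is to store them under a real name rather than an empty string. The following is a minimal plain-Python sketch of that idea; `flatten_nested` and the column name `student_key` are hypothetical names chosen for illustration and are not part of PySpark.

```python
def flatten_nested(data, key_name):
    """Return one flat dict per outer key, storing that key under key_name."""
    rows = []
    for outer_key, inner in data.items():
        # Refuse to silently overwrite the outer key with an inner value
        if key_name in inner:
            raise ValueError(f"{key_name!r} collides with an inner key")
        rows.append({key_name: outer_key, **inner})
    return rows


data = {
    'student_1': {'student id': 7058, 'country': 'India', 'state': 'AP'},
    'student_2': {'student id': 7059, 'country': 'Srilanka', 'state': 'X'},
}

# 'student_key' is a hypothetical column name for the outer keys
rows = flatten_nested(data, 'student_key')
print(rows[0])
# {'student_key': 'student_1', 'student id': 7058, 'country': 'India', 'state': 'AP'}

# Assuming a SparkSession named spark, as in the examples above, each
# flat dict could then be turned into a Row in the same way:
#   from pyspark.sql import Row
#   final = spark.createDataFrame([Row(**r) for r in rows])
```

The collision check is a design choice of this sketch: because the inner dictionary is unpacked after the outer key, an inner key with the same name would otherwise replace it without warning.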
