Create a JSON structure in PySpark

In this article, we will learn how to create a JSON structure using PySpark in Python.

PySpark is the Python interface for Apache Spark, a distributed processing system built for large datasets. It lets us write Spark applications in Python and also provides the PySpark shell for interactively inspecting data in a distributed environment.

To build a JSON structure in PySpark, a PySpark DataFrame is converted into JSON strings. PySpark's built-in modules make this conversion straightforward. The following PySpark modules are imported in the examples in this article:

  • pyspark.sql.functions: provides built-in functions for working with PySpark DataFrame columns.
  • pyspark.sql.types: provides data types for defining a PySpark DataFrame schema.

Example 1: Creating a JSON structure from a PySpark DataFrame

In this example, we create a PySpark DataFrame and convert it to a JSON string. First, import the required modules and create a Spark session. Next, define the DataFrame schema using StructType() and StructField(), and create the DataFrame with the createDataFrame() function. The toJSON() method then converts the DataFrame into an RDD of JSON strings, one per row. We collect these strings, take the first one, print it, and save it to the file “example1.json” using Python file handling.

Python3




from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import SparkSession
  
# Create a SparkSession
spark = SparkSession.builder.appName("JSON Creation").getOrCreate()
  
# Define the PySpark DataFrame schema
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
    StructField("city", StringType())
])
  
# Create a PySpark DataFrame
data = [("Shyam", 25, "New York"),
        ("Ram", 30, "San Francisco")]
df = spark.createDataFrame(data, schema)
  
# Convert the DataFrame to JSON: toJSON() yields one JSON string per row,
# collect() brings them to the driver, and [0] keeps only the first row
json_string = df.toJSON().collect()[0]
  
print(json_string)
  
# Write the JSON string to file
with open("example1.json", "w") as f:
    f.write(json_string)


Output:

{"name":"Shyam","age":25,"city":"New York"}

Example 2: Transforming a PySpark DataFrame with an array into JSON format

To transform a PySpark DataFrame that contains an array column into JSON, we follow the same procedure as in the previous example: define a schema with an array field, create the DataFrame, convert it to JSON strings, and store the first one in a JSON file.

Python3




from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import SparkSession
  
# Create a SparkSession
spark = SparkSession.builder.appName("JSON Creation").getOrCreate()
  
# Define the PySpark DataFrame schema with an array field
schema = StructType([
    StructField("name",
                StringType()),
    StructField("scores",
                ArrayType(IntegerType()))
])
  
# Create a PySpark DataFrame with an array field
data = [("Ali", [80, 90, 95]),
        ("Bhim", [70, 85, 92])]
df = spark.createDataFrame(data, schema)
  
# Convert the DataFrame to JSON: toJSON() yields one JSON string per row,
# collect() brings them to the driver, and [0] keeps only the first row
json_string = df.toJSON().collect()[0]
  
print(json_string)
  
# Write the JSON string to a file
with open("example2.json", "w") as f:
    f.write(json_string)


Output:

{"name":"Ali","scores":[80,90,95]}

The PySpark DataFrame schema in the above example has two fields: name and scores. The scores field holds an array of integers. We then created a PySpark DataFrame with two rows, each containing an array of scores in the scores field. Finally, the DataFrame was converted into JSON strings using the toJSON() method, and the JSON string for the first row was stored in a file named example2.json.
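
As a quick check of the resulting structure, the JSON string can be parsed back with Python's standard json module. The sketch below hard-codes the output string shown above rather than running Spark; the variable names record and average are illustrative.

```python
import json

# The JSON string produced for the first row in Example 2
json_string = '{"name":"Ali","scores":[80,90,95]}'

# json.loads turns the JSON object into a dict and the JSON array into a list
record = json.loads(json_string)

print(type(record["scores"]).__name__)  # type of the parsed array field
print(record["scores"])

# The parsed list behaves like any other Python list
average = sum(record["scores"]) / len(record["scores"])
print(average)
```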



Last Updated : 16 Mar, 2023