How to parse nested JSON using Scala Spark?

Last Updated : 24 Apr, 2024

In this article, we will learn how to parse nested JSON using Scala Spark.

To parse nested JSON using Scala Spark, you can follow these steps:

  1. Define the schema for your JSON data.
  2. Read the JSON data into a DataFrame (a sample input file is sketched just after this list).
  3. Select and manipulate the DataFrame columns to work with the nested structure.
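For reference, the file referred to as "path_to_your_json_file" in the program below is assumed to contain one JSON object per line (the JSON Lines format Spark reads by default), with the nested fields under "details". A hypothetical sample, chosen to match the output shown at the end of the article:

{"id": 1, "name": "Alice", "details": {"age": 30, "city": "Paris"}}
{"id": 2, "name": "Bob", "details": {"age": 25, "city": "New York"}}
{"id": 3, "name": "Carol", "details": {"age": 35, "city": "London"}}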

Scala Spark Program to parse nested JSON:

Scala
// Scala Spark program to parse nested JSON
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}

// Step 1: Define the schema as a JSON schema string
val schema = """
  {
    "type": "struct",
    "fields": [
      {"name": "id", "type": "integer", "nullable": false},
      {"name": "name", "type": "string", "nullable": true},
      {"name": "details",
       "type": {
         "type": "struct",
         "fields": [
           {"name": "age", "type": "integer", "nullable": true},
           {"name": "city", "type": "string", "nullable": true}
         ]
       },
       "nullable": true
      }
    ]
  }
"""

// Step 2: Create SparkSession
val spark = SparkSession.builder()
  .appName("Nested JSON Parsing")
  .master("local[*]")
  .getOrCreate()

// Step 3: Convert the JSON schema definition into a StructType and read the JSON data into a DataFrame
val structSchema = DataType.fromJson(schema).asInstanceOf[StructType]
val df = spark.read.schema(structSchema).json("path_to_your_json_file")

// Step 4: Select and manipulate DataFrame columns
val parsedDF = df.select(
  col("id"),
  col("name"),
  col("details.age").as("age"),
  col("details.city").as("city")
)

// Step 5: Show the result
parsedDF.show()

Output:

+---+-----+---+--------+
| id| name|age|    city|
+---+-----+---+--------+
|  1|Alice| 30|   Paris|
|  2|  Bob| 25|New York|
|  3|Carol| 35|  London|
+---+-----+---+--------+
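If you do not have a JSON file handy, the same parsing logic can be tested against an in-memory Dataset of JSON strings, since spark.read.json also accepts a Dataset[String]. The sketch below assumes the SparkSession (spark), the parsed schema (structSchema), and the col import from the program above, and reuses the hypothetical sample rows:

Scala
// Sketch: parse the same nested JSON from in-memory strings instead of a file
import spark.implicits._ // enables .toDS() on Scala collections

val jsonLines = Seq(
  """{"id": 1, "name": "Alice", "details": {"age": 30, "city": "Paris"}}""",
  """{"id": 2, "name": "Bob", "details": {"age": 25, "city": "New York"}}""",
  """{"id": 3, "name": "Carol", "details": {"age": 35, "city": "London"}}"""
).toDS()

// Apply the same schema and flatten the nested "details" struct
val testDF = spark.read.schema(structSchema).json(jsonLines)

testDF.select(
  col("id"),
  col("name"),
  col("details.age").as("age"),
  col("details.city").as("city")
).show()

As an alternative to the JSON schema string, DataFrameReader.schema also accepts a DDL-formatted string such as "id INT, name STRING, details STRUCT<age: INT, city: STRING>", which is often more concise.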
