How to parse nested JSON using Scala Spark?
Last Updated: 24 Apr, 2024
In this article, we will learn how to parse nested JSON using Scala Spark.
The overall approach has three steps:
- Define the schema for your JSON data.
- Read the JSON data into a DataFrame.
- Select and manipulate the DataFrame columns to work with the nested structure.
Scala Spark Program to parse nested JSON:
Scala
// Scala Spark program to parse nested JSON
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}

// Step 1: Define the schema as a JSON document and parse it into a StructType
// (DataFrameReader.schema(String) expects a DDL string, not JSON, so we go
// through DataType.fromJson instead)
val schemaJson = """
{
  "type": "struct",
  "fields": [
    {"name": "id", "type": "integer", "nullable": false},
    {"name": "name", "type": "string", "nullable": true},
    {"name": "details",
     "type": {
       "type": "struct",
       "fields": [
         {"name": "age", "type": "integer", "nullable": true},
         {"name": "city", "type": "string", "nullable": true}
       ]
     },
     "nullable": true
    }
  ]
}
"""
val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

// Step 2: Create a SparkSession
val spark = SparkSession.builder()
  .appName("Nested JSON Parsing")
  .master("local[*]")
  .getOrCreate()

// Step 3: Read the JSON data into a DataFrame
val df = spark.read.schema(schema).json("path_to_your_json_file")

// Step 4: Flatten the nested struct using dot notation
val parsedDF = df.select(
  col("id"),
  col("name"),
  col("details.age").as("age"),
  col("details.city").as("city")
)

// Step 5: Show the result
parsedDF.show()
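For reference, a JSON Lines input file (one object per line, at the placeholder path used above) consistent with the output shown below would look like:

```json
{"id": 1, "name": "Alice", "details": {"age": 30, "city": "Paris"}}
{"id": 2, "name": "Bob", "details": {"age": 25, "city": "New York"}}
{"id": 3, "name": "Carol", "details": {"age": 35, "city": "London"}}
```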
Output:
+---+-----+---+--------+
| id| name|age|    city|
+---+-----+---+--------+
|  1|Alice| 30|   Paris|
|  2|  Bob| 25|New York|
|  3|Carol| 35|  London|
+---+-----+---+--------+
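As a more compact alternative to a JSON schema document, the same structure can be expressed as a DDL-format string and passed directly to schema() (supported since Spark 2.3). This is a sketch; the file path is a placeholder:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("Nested JSON Parsing (DDL schema)")
  .master("local[*]")
  .getOrCreate()

// DDL-format schema: details is a nested struct
val ddl = "id INT, name STRING, details STRUCT<age: INT, city: STRING>"

val df = spark.read.schema(ddl).json("path_to_your_json_file")

// Nested fields are still reachable with dot notation
df.select(col("id"), col("name"), col("details.age").as("age"), col("details.city").as("city")).show()
```

The DDL form is convenient for short schemas; the JSON form shown earlier is easier to generate programmatically or round-trip via StructType.json.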