Python Pandas – Flatten nested JSON
Last Updated :
05 Dec, 2023
It is general practice to convert the JSON data structure to a Pandas Dataframe as it can help to manipulate and visualize the data more conveniently. In this article, let us consider different nested JSON data structures and flatten them using inbuilt and custom-defined functions.
Python Pandas.json_normalize() Syntax
Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables.
Syntax: pandas.json_normalize(data, errors=’raise’, sep=’.’, max_level=None)
Parameters:
- data – dict or list of dicts
- errors – {‘raise’, ‘ignore’}, default ‘raise’
- sep – str, default ‘.’ nested records will generate names separated by a specified separator.
- max_level – int, default None. Max number of levels(depth of dict) to normalize.
Python Pandas Flatten Nested JSON Examples
Most of the data extracted from the web through scraping are in the form of JSON datatype because JSON is a preferred datatype for transmitting data in web applications. The reason JSON is preferred is that it’s extremely lightweight to send back and forth in HTTP requests and responses due to the small file size. Below are the examples by which we can flatten nested json in Python:
Example 1: Pandas json_normalize
Function
Consider a list of nested dictionaries that contains details about the students and their marks as shown. Use pandas json_normalize on this JSON data structure to flatten it to a flat table as shown
Python3
import pandas as pd
data = [
{ "Roll no" : 1 ,
"student" : { "first_name" : "Ram" , "last_name" : "kumar" }
},
{ "student" : { "English" : "95" , "Math" : "88" }
},
{ "Roll no" : 2 ,
"student" : { "first_name" : "Joseph" , "English" : "90" , "Science" : "82" }
},
{ "Roll no" : 3 ,
"student" : { "first_name" : "abinaya" , "last_name" : "devi" },
"student" : { "English" : "91" , "Math" : "98" }
},
]
df = pd.json_normalize(data)
print (df)
|
Output:
Roll no student.first_name student.last_name student.English student.Math student.Science
0 1 Ram kumar NaN NaN NaN
1 NaN NaN NaN 95 88 NaN
2 2 Joseph NaN 90 NaN 82
3 3 abinaya devi 91 98 NaN
Example 2: Flatten Nested Json in Pandas using max_level=0
Now let us make use of the max_level option to flatten a slightly complicated JSON structure to a flat table. For this example, we have considered the max_level of 0, which means flattening only the first level of JSON and can experiment with the results.
Here, we have considered an example of the health records of different individuals in JSON format.
Python3
import pandas as pd
data = [
{
"id" : 1 ,
"candidate" : "Roberto mathews" ,
"health_index" : { "bmi" : 22 , "blood_pressure" : 130 },
},
{ "candidate" : "Shane wade" , "health_index" : { "bmi" : 28 , "blood_pressure" : 160 }},
{
"id" : 2 ,
"candidate" : "Bruce tommy" ,
"health_index" : { "bmi" : 31 , "blood_pressure" : 190 },
},
]
pd.json_normalize(data, max_level = 0 )
|
Output:
As we have used only 1 level of flattening, The second level is retained as a key-value pair as shown
id candidate health_index.bmi health_index.blood_pressure
0 1 Roberto mathews 22 130
1 NaN Shane wade 28 160
2 2 Bruce tommy 31 190
Example 3: Flatten Nested Json in Pandas using max_level=0
Now let us use the same JSON data structure as above, with max_level of 1, which means flattening the first two levels of JSON and can experiment with the results.
Python3
import pandas as pd
data = [
{
"id" : 1 ,
"candidate" : "Roberto mathews" ,
"health_index" : { "bmi" : 22 , "blood_pressure" : 130 },
},
{ "candidate" : "Shane wade" , "health_index" : { "bmi" : 28 , "blood_pressure" : 160 }},
{
"id" : 2 ,
"candidate" : "Bruce tommy" ,
"health_index" : { "bmi" : 31 , "blood_pressure" : 190 },
},
]
pd.json_normalize(data, max_level = 1 )
|
Output:
Here, unlike the before example we flatten the first two rows of the JSON which gives a complete structure for the flat table.
id candidate health_index.bmi health_index.blood_pressure
0 1 Roberto mathews 22 130
1 NaN Shane wade 28 160
2 2 Bruce tommy 31 190
Example 4: Passing Meta Arguments to json_normalize
Finally, let us consider a deeply nested JSON structure that can be converted to a flat table by passing the meta arguments to the json_normalize function as shown below.
Here, in the below code, we have passed the chronological order in which the JSON has to be parsed to a flat table. In the below code, we first suggest the department key to be parsed followed by the company and tagline, then, we pass the management and CEO key as a nested list indicating that they have to be parsed as a single field.
Python3
import pandas as pd
data = [
{
"company" : "Google" ,
"tagline" : "Dont be evil" ,
"management" : { "CEO" : "Sundar Pichai" },
"department" : [
{ "name" : "Gmail" , "revenue (bn)" : 123 },
{ "name" : "GCP" , "revenue (bn)" : 400 },
{ "name" : "Google drive" , "revenue (bn)" : 600 },
],
},
{
"company" : "Microsoft" ,
"tagline" : "Be What's Next" ,
"management" : { "CEO" : "Satya Nadella" },
"department" : [
{ "name" : "Onedrive" , "revenue (bn)" : 13 },
{ "name" : "Azure" , "revenue (bn)" : 300 },
{ "name" : "Microsoft 365" , "revenue (bn)" : 300 },
],
},
]
result = pd.json_normalize(
data, "department" , [ "company" , "tagline" , [ "management" , "CEO" ]]
)
result
|
Output:
name revenue (bn) company tagline management.CEO
0 Gmail 123 Google Dont be evil Sundar Pichai
1 GCP 400 Google Dont be evil Sundar Pichai
2 Google drive 600 Google Dont be evil Sundar Pichai
3 Onedrive 13 Microsoft Be What's Next Satya Nadella
4 Azure 300 Microsoft Be What's Next Satya Nadella
5 Microsoft 365 300 Microsoft Be What's Next Satya Nadella
Like Article
Suggest improvement
Share your thoughts in the comments
Please Login to comment...