Open In App

Python Pandas – Flatten nested JSON

Last Updated : 05 Dec, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

It is general practice to convert the JSON data structure to a Pandas Dataframe as it can help to manipulate and visualize the data more conveniently. In this article, let us consider different nested JSON data structures and flatten them using inbuilt and custom-defined functions. 

Python Pandas.json_normalize() Syntax

Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables.

Syntax: pandas.json_normalize(data, errors=’raise’, sep=’.’, max_level=None)

Parameters:

  • data – dict or list of dicts
  • errors – {‘raise’, ‘ignore’}, default ‘raise’
  • sep – str, default ‘.’ nested records will generate names separated by a specified separator.
  • max_level – int, default None. Max number of levels(depth of dict) to normalize.

Python Pandas Flatten Nested JSON Examples

Most of the data extracted from the web through scraping are in the form of JSON datatype because JSON is a preferred datatype for transmitting data in web applications. The reason JSON is preferred is that it’s extremely lightweight to send back and forth in HTTP requests and responses due to the small file size.  Below are the examples by which we can flatten nested json in Python:

Example 1: Pandas json_normalize Function

Consider a list of nested dictionaries that contains details about the students and their marks as shown. Use pandas json_normalize on this JSON data structure to flatten it to a flat table as shown

Python3




import pandas as pd
 
data = [
    {"Roll no": 1,
     "student": {"first_name": "Ram", "last_name": "kumar"}
     },
    {"student": {"English": "95", "Math": "88"}
     },
    {"Roll no": 2,
     "student": {"first_name": "Joseph", "English": "90", "Science": "82"}
     },
    {"Roll no": 3,
     "student": {"first_name": "abinaya", "last_name": "devi"},
     "student": {"English": "91", "Math": "98"}
     },
]
 
df = pd.json_normalize(data)
print(df)


Output:

   Roll no student.first_name student.last_name student.English student.Math student.Science
0        1                Ram            kumar            NaN          NaN             NaN
1      NaN                NaN              NaN             95           88             NaN
2        2             Joseph              NaN             90          NaN              82
3        3            abinaya             devi             91           98             NaN

Example 2: Flatten Nested Json in Pandas using max_level=0

Now let us make use of the max_level option to flatten a slightly complicated JSON structure to a flat table. For this example, we have considered the max_level of 0, which means flattening only the first level of JSON and can experiment with the results.

Here, we have considered an example of the health records of different individuals in JSON format. 

Python3




import pandas as pd
 
data = [
    {
        "id": 1,
        "candidate": "Roberto mathews",
        "health_index": {"bmi": 22, "blood_pressure": 130},
    },
    {"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
    {
        "id": 2,
        "candidate": "Bruce tommy",
        "health_index": {"bmi": 31, "blood_pressure": 190},
    },
]
pd.json_normalize(data, max_level=0)


Output:

As we have used only 1 level of flattening, The second level is retained as a key-value pair as shown

   id       candidate  health_index.bmi  health_index.blood_pressure
0   1  Roberto mathews                22                         130
1 NaN      Shane wade                28                         160
2   2    Bruce tommy                31                         190

Example 3: Flatten Nested Json in Pandas using max_level=0

Now let us use the same JSON data structure as above, with max_level of 1, which means flattening the first two levels of JSON and can experiment with the results.

Python3




import pandas as pd
 
data = [
    {
        "id": 1,
        "candidate": "Roberto mathews",
        "health_index": {"bmi": 22, "blood_pressure": 130},
    },
    {"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
    {
        "id": 2,
        "candidate": "Bruce tommy",
        "health_index": {"bmi": 31, "blood_pressure": 190},
    },
]
pd.json_normalize(data, max_level=1)


Output:

Here, unlike the before example we flatten the first two rows of the JSON which gives a complete structure for the flat table.

   id       candidate  health_index.bmi  health_index.blood_pressure
0   1  Roberto mathews                22                         130
1 NaN      Shane wade                28                         160
2   2    Bruce tommy                31                         190

Example 4: Passing Meta Arguments to json_normalize

Finally, let us consider a deeply nested JSON structure that can be converted to a flat table by passing the meta arguments to the json_normalize function as shown below.

Here, in the below code, we have passed the chronological order in which the JSON has to be parsed to a flat table. In the below code, we first suggest the department key to be parsed followed by the company and tagline, then, we pass the management and CEO key as a nested list indicating that they have to be parsed as a single field.

Python3




import pandas as pd
data = [
    {
        "company": "Google",
        "tagline": "Dont be evil",
        "management": {"CEO": "Sundar Pichai"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "Be What's Next",
        "management": {"CEO": "Satya Nadella"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
 
]
result = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
result


Output:

          name  revenue (bn)   company         tagline management.CEO
0        Gmail           123    Google    Dont be evil  Sundar Pichai
1          GCP           400    Google    Dont be evil  Sundar Pichai
2  Google drive           600    Google    Dont be evil  Sundar Pichai
3     Onedrive            13  Microsoft  Be What's Next  Satya Nadella
4        Azure           300  Microsoft  Be What's Next  Satya Nadella
5  Microsoft 365           300  Microsoft  Be What's Next  Satya Nadella


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads