Open In App

Python Pandas – Flatten nested JSON

Last Updated : 05 Dec, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

It is general practice to convert the JSON data structure to a Pandas Dataframe as it can help to manipulate and visualize the data more conveniently. In this article, let us consider different nested JSON data structures and flatten them using inbuilt and custom-defined functions. 

Python Pandas.json_normalize() Syntax

Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables.

Syntax: pandas.json_normalize(data, errors=’raise’, sep=’.’, max_level=None)

Parameters:

  • data – dict or list of dicts
  • errors – {‘raise’, ‘ignore’}, default ‘raise’
  • sep – str, default ‘.’ nested records will generate names separated by a specified separator.
  • max_level – int, default None. Max number of levels(depth of dict) to normalize.

Python Pandas Flatten Nested JSON Examples

Most of the data extracted from the web through scraping are in the form of JSON datatype because JSON is a preferred datatype for transmitting data in web applications. The reason JSON is preferred is that it’s extremely lightweight to send back and forth in HTTP requests and responses due to the small file size.  Below are the examples by which we can flatten nested json in Python:

Example 1: Pandas json_normalize Function

Consider a list of nested dictionaries that contains details about the students and their marks as shown. Use pandas json_normalize on this JSON data structure to flatten it to a flat table as shown

Python3




import pandas as pd
 
data = [
    {"Roll no": 1,
     "student": {"first_name": "Ram", "last_name": "kumar"}
     },
    {"student": {"English": "95", "Math": "88"}
     },
    {"Roll no": 2,
     "student": {"first_name": "Joseph", "English": "90", "Science": "82"}
     },
    {"Roll no": 3,
     "student": {"first_name": "abinaya", "last_name": "devi"},
     "student": {"English": "91", "Math": "98"}
     },
]
 
df = pd.json_normalize(data)
print(df)


Output:

   Roll no student.first_name student.last_name student.English student.Math student.Science
0        1                Ram            kumar            NaN          NaN             NaN
1      NaN                NaN              NaN             95           88             NaN
2        2             Joseph              NaN             90          NaN              82
3        3            abinaya             devi             91           98             NaN

Example 2: Flatten Nested Json in Pandas using max_level=0

Now let us make use of the max_level option to flatten a slightly complicated JSON structure to a flat table. For this example, we have considered the max_level of 0, which means flattening only the first level of JSON and can experiment with the results.

Here, we have considered an example of the health records of different individuals in JSON format. 

Python3




import pandas as pd
 
data = [
    {
        "id": 1,
        "candidate": "Roberto mathews",
        "health_index": {"bmi": 22, "blood_pressure": 130},
    },
    {"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
    {
        "id": 2,
        "candidate": "Bruce tommy",
        "health_index": {"bmi": 31, "blood_pressure": 190},
    },
]
pd.json_normalize(data, max_level=0)


Output:

As we have used only 1 level of flattening, The second level is retained as a key-value pair as shown

   id       candidate  health_index.bmi  health_index.blood_pressure
0   1  Roberto mathews                22                         130
1 NaN      Shane wade                28                         160
2   2    Bruce tommy                31                         190

Example 3: Flatten Nested Json in Pandas using max_level=0

Now let us use the same JSON data structure as above, with max_level of 1, which means flattening the first two levels of JSON and can experiment with the results.

Python3




import pandas as pd
 
data = [
    {
        "id": 1,
        "candidate": "Roberto mathews",
        "health_index": {"bmi": 22, "blood_pressure": 130},
    },
    {"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
    {
        "id": 2,
        "candidate": "Bruce tommy",
        "health_index": {"bmi": 31, "blood_pressure": 190},
    },
]
pd.json_normalize(data, max_level=1)


Output:

Here, unlike the before example we flatten the first two rows of the JSON which gives a complete structure for the flat table.

   id       candidate  health_index.bmi  health_index.blood_pressure
0   1  Roberto mathews                22                         130
1 NaN      Shane wade                28                         160
2   2    Bruce tommy                31                         190

Example 4: Passing Meta Arguments to json_normalize

Finally, let us consider a deeply nested JSON structure that can be converted to a flat table by passing the meta arguments to the json_normalize function as shown below.

Here, in the below code, we have passed the chronological order in which the JSON has to be parsed to a flat table. In the below code, we first suggest the department key to be parsed followed by the company and tagline, then, we pass the management and CEO key as a nested list indicating that they have to be parsed as a single field.

Python3




import pandas as pd
data = [
    {
        "company": "Google",
        "tagline": "Dont be evil",
        "management": {"CEO": "Sundar Pichai"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "Be What's Next",
        "management": {"CEO": "Satya Nadella"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
 
]
result = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
result


Output:

          name  revenue (bn)   company         tagline management.CEO
0        Gmail           123    Google    Dont be evil  Sundar Pichai
1          GCP           400    Google    Dont be evil  Sundar Pichai
2  Google drive           600    Google    Dont be evil  Sundar Pichai
3     Onedrive            13  Microsoft  Be What's Next  Satya Nadella
4        Azure           300  Microsoft  Be What's Next  Satya Nadella
5  Microsoft 365           300  Microsoft  Be What's Next  Satya Nadella


Similar Reads

Python - Flatten Nested Dictionary to Matrix
Sometimes, while working with data, we can have a problem in which we need to convert a nested dictionary into Matrix, each nesting comprising the different rows in the matrix. This can have applications in many data domains. Let us discuss certain ways in which this task can be performed. Method #1 : Using loop + zip() + map() The combination of t
5 min read
Python - Flatten Nested Keys
Sometimes, while working with Python data, we can have a problem in which we need to perform the flattening of certain keys in nested list records. This kind of problem occurs while data preprocessing. Let us discuss certain ways in which this task can be performed. Method #1: Using loopThis is a brute-force method to perform this task. In this, we
10 min read
Python - Flatten Nested Tuples
Sometimes, while working with Python Tuples, we can have a problem in which we need to perform flattening of tuples, which can be nested and undesired. This can have application across many domains such as Data Science and web development. Let's discuss certain way in which this task can be performed. Input : test_tuple = ((4, 7), ((4, 5), ((6, 7),
7 min read
Python Program to Flatten a Nested List using Recursion
Given a nested list, the task is to write a python program to flatten a nested list using recursion. Examples: Input: [[8, 9], [10, 11, 'geeks'], [13]] Output: [8, 9, 10, 11, 'geeks', 13] Input: [['A', 'B', 'C'], ['D', 'E', 'F']] Output: ['A', 'B', 'C', 'D', 'E', 'F'] Step-by-step Approach: Firstly, we try to initialize a variable into the linked l
3 min read
Python program to Flatten Nested List to Tuple List
Given a list of tuples with each tuple wrapped around multiple lists, our task is to write a Python program to flatten the container to a list of tuples. Input : test_list = [[[(4, 6)]], [[[(7, 4)]]], [[[[(10, 3)]]]]]Output : [(4, 6), (7, 4), (10, 3)]Explanation : The surrounded lists are omitted around each tuple. Input : test_list = [[[(4, 6)]],
7 min read
Converting nested JSON structures to Pandas DataFrames
In this article, we are going to see how to convert nested JSON structures to Pandas DataFrames. JSON with multiple levels In this case, the nested JSON data contains another JSON object as the value for some of its attributes. This makes the data multi-level and we need to flatten it as per the project requirements for better readability, as expla
4 min read
Normalizing Nested Json Object Into Pandas Dataframe
The encapsulation of one or more JSON objects into another JSON object is called a nested JSON object. There are various circumstances when we have the data in JSON object form and have to imply numerous functions on the dataset. In such cases, we can use the Python library called Pandas which is used for datasets. For converting into a Pandas data
3 min read
How to Flatten MultiIndex in Pandas?
In this article, we will discuss how to flatten multiIndex in pandas. Flatten all levels of MultiIndex: In this method, we are going to flat all levels of the dataframe by using the reset_index() function. Syntax: dataframe.reset_index(inplace=True) Note: Dataframe is the input dataframe, we have to create the dataframe MultiIndex. Syntax: MultiInd
3 min read
How to flatten a hierarchical index in Pandas DataFrame columns?
In this article, we are going to see the flatten a hierarchical index in Pandas DataFrame columns. Hierarchical Index usually occurs as a result of groupby() aggregation functions. Flatten hierarchical index in Pandas, the aggregated function used will appear in the hierarchical index of the resulting dataframe. Using reset_index() function Pandas
6 min read
Python | Check if a nested list is a subset of another nested list
Given two lists list1 and list2, check if list2 is a subset of list1 and return True or False accordingly. Examples: Input : list1 = [[2, 3, 1], [4, 5], [6, 8]] list2 = [[4, 5], [6, 8]] Output : True Input : list1 = [['a', 'b'], ['e'], ['c', 'd']] list2 = [['g']] Output : False Let's discuss few approaches to solve the problem. Approach #1 : Naive
7 min read
Article Tags :
Practice Tags :