Open In App

Python – Flatten Nested Keys

Last Updated : 06 Apr, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

Sometimes, while working with Python data, we can have a problem in which we need to perform the flattening of certain keys in nested list records. This kind of problem occurs while data preprocessing. Let us discuss certain ways in which this task can be performed. 

Method #1: Using loop

This is a brute-force method to perform this task. In this, we construct new dictionary by assigning base keys and then perform the flattening of inner key elements using a nested loop. 

Python3




# Python3 code to demonstrate working of
# Flatten Nested Keys
# Using loop
 
# initializing list
test_list = [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4},
                                          {'rating': 17, 'price': 8}]},
             {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Flatten Nested Keys
# Using loop
res = []
for sub in test_list:
    temp1 = {
        'Gfg': sub['Gfg'],
        'id': sub['id']
    }
    for data in sub.get('data', []):
        res.append({
            **temp1,
            'rating': data['rating'],
            'price': data['price']})
 
# printing result
print("The flattened list : " + str(res))


Output

The original list is : [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4}, {'rating': 17, 'price': 8}]}, {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]
The flattened list : [{'Gfg': 1, 'id': 1, 'rating': 7, 'price': 4}, {'Gfg': 1, 'id': 1, 'rating': 17, 'price': 8}, {'Gfg': 1, 'id': 2, 'rating': 18, 'price': 19}]

Time Complexity: O(n * m), where n is the number of elements in the outer list and m is the average number of elements in each inner ‘data’ list.
Auxiliary Space: O(n), where n is the number of elements in the outer list. This is because we use a single res list to store the flattened elements and its size grows with each iteration of the loop.

Method #2: Using list comprehension + zip() + itemgetter() 

The combination of the above functions can be used to perform this task. In this, we extract the required pairs using itemgetter() and combine pairs using zip(). The compilation of data is using list comprehension. 

Python3




# Python3 code to demonstrate working of
# Flatten Nested Keys
# Using list comprehension + zip() + itemgetter()
from operator import itemgetter
 
# initializing list
test_list = [{'Gfg' 1, 'id' : 1, 'data' : [{'rating' : 7, 'price' : 4},
             {'rating' : 17, 'price' : 8}]},
             {'Gfg' 1, 'id' : 2, 'data' : [{'rating' : 18, 'price' : 19}]}]
 
# printing original list
print("The original list is : " + str(test_list))
 
# initializing base and flatten keys
base_keys = 'Gfg', 'id'
flatten_keys = 'rating', 'price'
 
# Flatten Nested Keys
# Using list comprehension + zip() + itemgetter()
res = [dict(zip(base_keys + flatten_keys, itemgetter(*base_keys)(sub) + itemgetter(*flatten_keys)(data))) for sub in test_list for data in sub['data']]
 
# printing result
print("The flattened list : " + str(res))


Output : 

The original list is : [{‘data’: [{‘rating’: 7, ‘price’: 4}, {‘rating’: 17, ‘price’: 8}], ‘id’: 1, ‘Gfg’: 1}, {‘data’: [{‘rating’: 18, ‘price’: 19}], ‘id’: 2, ‘Gfg’: 1}] The flattened list : [{‘price’: 4, ‘rating’: 7, ‘id’: 1, ‘Gfg’: 1}, {‘price’: 8, ‘rating’: 17, ‘id’: 1, ‘Gfg’: 1}, {‘price’: 19, ‘rating’: 18, ‘id’: 2, ‘Gfg’: 1}]

Time complexity: O(n) where n is the total number of elements in the nested list. The reason for this is that the code needs to iterate through the elements of the nested list in the list comprehension.
Auxiliary space: O(n) as the result list will have n elements. This is because the result list is being constructed by iterating through the elements of the nested list and adding new elements to it.

Method #3 : Using update() method

Python3




# Python3 code to demonstrate working of
# Flatten Nested Keys
# Using loop
 
# initializing list
test_list = [{'Gfg' : 1, 'id' : 1, 'data' : [{'rating' : 7, 'price' : 4},
            {'rating' : 17, 'price' : 8}]},
            {'Gfg' : 1, 'id' : 2, 'data' : [{'rating' : 18, 'price' : 19}]}]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Flatten Nested Keys
# Using loop
res = []
for sub in test_list:
    for j in sub["data"]:
        j.update({"id":sub["id"]})
        j.update({"Gfg":sub["Gfg"]})
        res.append(j)
         
# printing result
print("The flattened list : " + str(res))


Output

The original list is : [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4}, {'rating': 17, 'price': 8}]}, {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]
The flattened list : [{'rating': 7, 'price': 4, 'id': 1, 'Gfg': 1}, {'rating': 17, 'price': 8, 'id': 1, 'Gfg': 1}, {'rating': 18, 'price': 19, 'id': 2, 'Gfg': 1}]

Time complexity: O(n*m), where n is the length of the input list and m is the average length of the “data” list in each dictionary.
Auxiliary Space: O(nm), because we are creating a new list to store the flattened data, which can have a size of up to nm elements.

Method #4: Using recursion

We can also flatten the nested keys using recursion. Here’s how we can implement it:

  1. Define a function flatten_keys that takes a dictionary as an input.
  2. Initialize an empty dictionary result.
  3. Loop through each key-value pair in the dictionary.
  4. If the value is a dictionary, call flatten_keys recursively with the value as the input and store the result in result.
  5. If the value is a list, loop through each element in the list and call flatten_keys recursively with the element as the input and store the result in result.
  6. If the value is not a dictionary or list, add it to the result dictionary with the same key.
  7. Return the result dictionary.

Python3




def flatten_keys(d):
    result = {}
    for k, v in d.items():
        if isinstance(v, dict):
            flat_v = flatten_keys(v)
            for flat_k, flat_v in flat_v.items():
                result[k + '.' + flat_k] = flat_v
        elif isinstance(v, list):
            for i, item in enumerate(v):
                flat_item = flatten_keys(item)
                for flat_k, flat_v in flat_item.items():
                    result[f"{k}.{i}.{flat_k}"] = flat_v
        else:
            result[k] = v
    return result
 
 
# input list
test_list = [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4},
                                          {'rating': 17, 'price': 8}]},
             {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]
 
flattened_list = []
for d in test_list:
    flattened_dict = flatten_keys(d)
    flattened_list.append(flattened_dict)
 
print("The flattened list : " + str(flattened_list))


Output

The flattened list : [{'Gfg': 1, 'id': 1, 'data.0.rating': 7, 'data.0.price': 4, 'data.1.rating': 17, 'data.1.price': 8}, {'Gfg': 1, 'id': 2, 'data.0.rating': 18, 'data.0.price': 19}]

Time complexity of this approach is O(n), where n is the number of elements in the nested dictionary. 
Auxiliary space required is also O(n), where n is the number of elements in the flattened dictionary.

Method #5: Using a stack

We can flatten the nested dictionaries using a stack data structure. We start with an empty dictionary and a stack containing the input dictionary. We pop a dictionary from the stack, iterate through its key-value pairs, and if the value is a dictionary, we add the current key to each of its keys and push it onto the stack. If the value is a list, we iterate through each element of the list and add the current index and key to its keys and push it onto the stack. Otherwise, we add the key-value pair to the flattened dictionary.

 step-by-step approach :

  1. Define a function named flatten_keys_stack that takes in a dictionary as its only argument.
  2. Create an empty dictionary named flattened_dict to hold the flattened key-value pairs.
  3. Create a list named stack containing a tuple of the input dictionary and an empty string (representing the current prefix for keys).
  4. While the stack is not empty, pop a tuple from the stack and unpack it into curr_dict and prefix.
  5. Iterate through the key-value pairs of curr_dict.
  6. If the value is a dictionary, add the current key to the prefix and push the dictionary and new prefix onto the stack.
  7. If the value is a list, iterate through each element of the list, add the current index and key to the prefix, and push the element and new prefix onto the stack.
  8. Otherwise, add the prefix and current key as a concatenated string as the key to flattened_dict, with the value being the current value.
  9. Return flattened_dict.
  10. Modify the existing code to use the flatten_keys_stack function instead of the flatten_keys function.
  11. Run the code and verify that the flattened list is the same as the previous methods.

Python3




def flatten_keys_stack(d):
    flattened_dict = {}
    stack = [(d, '')]
 
    while stack:
        curr_dict, prefix = stack.pop()
 
        for k, v in curr_dict.items():
            if isinstance(v, dict):
                stack.append((v, prefix + k + '.'))
            elif isinstance(v, list):
                for i, item in enumerate(v):
                    stack.append((item, prefix + k + '.' + str(i) + '.'))
            else:
                flattened_dict[prefix + k] = v
 
    return flattened_dict
 
# input list
test_list = [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4},
                                          {'rating': 17, 'price': 8}]},
             {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]
 
flattened_list = []
for d in test_list:
    flattened_dict = flatten_keys_stack(d)
    flattened_list.append(flattened_dict)
    print("Flattened dictionary: " + str(flattened_dict))
 
print("The flattened list: " + str(flattened_list))


Output

Flattened dictionary: {'Gfg': 1, 'id': 1, 'data.1.rating': 17, 'data.1.price': 8, 'data.0.rating': 7, 'data.0.price': 4}
Flattened dictionary: {'Gfg': 1, 'id': 2, 'data.0.rating': 18, 'data.0.price': 19}
The flattened list: [{'Gfg': 1, 'id': 1, 'data.1.rating': 17, 'data.1.price': 8, 'data.0.rating': 7, 'data.0.price': 4}, {'Gfg': 1, 'id': 2, 'data.0.rating': 18, 'data.0.price': 19}]

The time and auxiliary space complexity of this method are both O(N), where N is the total number of key-value pairs in the input dictionary. 

The auxiliary space comes from the use of the stack to traverse the nested dictionaries and lists.

Method 6 : Uses the flatten_dict_pandas() function:

 step-by-step approach for the program:

  1. Import the pandas library with the alias pd.
  2. Define the flatten_dict_pandas() function that takes a dictionary d as input.
  3. Inside the function, use pd.json_normalize() to convert the dictionary to a flattened pandas DataFrame, with the keys joined by dots as the column names. The sep parameter specifies the separator to use between the keys.
  4. Then, use the to_dict() method of the DataFrame to convert it back to a dictionary, with orient=’records’ to return a list of dictionaries, and take the first dictionary from the list.
  5. Return the flattened dictionary.
  6. Define an input list test_list that contains nested dictionaries and lists.
  7. Create an empty list flattened_list to store the flattened dictionaries.
  8. Loop through each dictionary d in test_list.
  9. Call flatten_dict_pandas() on d to flatten it.
  10. Append the flattened dictionary to flattened_list.
  11. Print the flattened dictionary.
  12. After the loop, print the entire flattened list.

Python3




import pandas as pd
 
def flatten_dict_pandas(d):
    df = pd.json_normalize(d, sep='.')
    return df.to_dict(orient='records')[0]
 
# input list
test_list = [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4},
                                          {'rating': 17, 'price': 8}]},
             {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]
 
flattened_list = []
for d in test_list:
    flattened_dict = flatten_dict_pandas(d)
    flattened_list.append(flattened_dict)
    print("Flattened dictionary: " + str(flattened_dict))
 
print("The flattened list: " + str(flattened_list))


OUTPUT:

Flattened dictionary: {'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4}, {'rating': 17, 'price': 8}]}
Flattened dictionary: {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}
The flattened list: [{'Gfg': 1, 'id': 1, 'data': [{'rating': 7, 'price': 4}, {'rating': 17, 'price': 8}]}, {'Gfg': 1, 'id': 2, 'data': [{'rating': 18, 'price': 19}]}]

The time complexity of this method is O(nlogn), where n is the total number of keys and values in the dictionary.

 The auxiliary space is also O(nlogn), due to the creation of the pandas DataFrame.



Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads