Python – Multiple Keys Grouped Summation
When working with Python records, we sometimes need to group elements that share the same values at several key positions and then sum a particular field within each group. This kind of problem comes up in data-processing applications. Let's discuss some ways in which this task can be performed.
Input :
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best')]
grp_indx = [1, 2]  # indices to group by
sum_idx = [0]  # index to sum
Output : [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Best', 13)]
Input :
test_list = [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')]
grp_indx = [1, 2]  # indices to group by
sum_idx = [0]  # index to sum
Output : [('M', 'Gfg', 35), ('M', 'Best', 13)]
Method 1: Using loop + defaultdict() + list comprehension
The combination of the above functionalities can be used to solve this problem. The loop builds a key from the values at the grouping indices and accumulates the sum for that key in a defaultdict(int). The list comprehension then turns each key and its total into a result tuple.
Python3
# Python3 code to demonstrate working of
# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
from collections import defaultdict

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# printing original list
print("The original list is : " + str(test_list))

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
temp = defaultdict(int)
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val, ) for key, val in temp.items()]

# printing result
print("The grouped summation : " + str(res))
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(m), where m is the number of distinct combinations of grouping indices.
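Method 1 hard-codes exactly two grouping indices. As a minimal sketch of how the same idea extends to any number of them, the key can be built with a generator over grp_indx. The function name grouped_sum is hypothetical and not part of the original code, and the sketch still assumes a single index to sum.

```python
from collections import defaultdict


def grouped_sum(records, grp_indx, sum_idx):
    """Hypothetical helper: sum the field at sum_idx[0] per group key."""
    totals = defaultdict(int)
    for rec in records:
        # Build the key from however many grouping indices were given.
        key = tuple(rec[i] for i in grp_indx)
        totals[key] += rec[sum_idx[0]]
    return [key + (total,) for key, total in totals.items()]


test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

two_keys = grouped_sum(test_list, [1, 2], [0])
one_key = grouped_sum(test_list, [2], [0])
print(two_keys)  # [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
print(one_key)   # [('Gfg', 55), ('Best', 36)]
```

Because dictionaries preserve insertion order, the groups come out in the order in which each key is first seen.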
Method 2: Using itertools.groupby() and a lambda function
In this method, we first sort the input list using the sorted() function with a lambda function that extracts the values at the grouping indices. We then use itertools.groupby() with the same key function to group the sorted list; the sort is required because groupby() only merges adjacent elements with equal keys. Finally, a list comprehension iterates over each group, sums the values at the sum_idx index, and creates a new tuple that contains the grouping values followed by the sum.
Python3
from itertools import groupby

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# printing original list
print("The original list is : " + str(test_list))

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using itertools.groupby() and a lambda function
# (the value to sum is read from sum_idx rather than a hard-coded 0)
res = [(key[0], key[1], sum(sub[sum_idx[0]] for sub in group))
       for key, group in groupby(
           sorted(test_list, key=lambda x: (x[grp_indx[0]], x[grp_indx[1]])),
           key=lambda x: (x[grp_indx[0]], x[grp_indx[1]]))]

# printing result
print("The grouped summation : " + str(res))
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) because of the sorting operation. The groupby function itself has a time complexity of O(n).
Auxiliary space: O(n).
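To see why the sort matters in Method 2, the following sketch runs the same grouping on the unsorted and the sorted list. The helper names key_func and sum_groups are hypothetical and used only for illustration.

```python
from itertools import groupby

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]


def key_func(rec):
    # Group on the values at indices 1 and 2.
    return (rec[1], rec[2])


def sum_groups(records):
    # groupby() only merges runs of adjacent records with equal keys.
    return [key + (sum(rec[0] for rec in group),)
            for key, group in groupby(records, key=key_func)]


unsorted_res = sum_groups(test_list)
sorted_res = sum_groups(sorted(test_list, key=key_func))

print(len(unsorted_res))  # 6, no two equal keys are adjacent
print(sorted_res)         # [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
```

Without sorting, every record ends up in its own group here, so nothing is actually summed.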
Method 3: Using pandas library
Pandas is a powerful library in Python for data manipulation and analysis. It has a groupby function that can be used to group data by one or more keys and perform operations on the grouped data.
Python3
import pandas as pd

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# creating a pandas DataFrame from the list
df = pd.DataFrame(test_list, columns=['value', 'key1', 'key2'])

# grouping by key1 and key2 and summing the values
grouped = df.groupby(['key1', 'key2'])['value'].sum()

# converting the result back to a list of tuples
# (int() turns NumPy integers into plain Python ints so the output prints cleanly)
res = [(key[0], key[1], int(value)) for key, value in grouped.items()]

# printing result
print("The grouped summation : " + str(res))
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) in the worst case, since pandas sorts the group keys by default (sort=True).
Auxiliary space: O(n) because pandas needs to create a DataFrame object to store the input data and perform the grouping operation.
Method 4: Using itertools.groupby() and operator.itemgetter()
Use the itertools.groupby() function and the operator.itemgetter() function to group the elements by their keys and sum the values.
Python3
import itertools
import operator

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using itertools.groupby() and operator.itemgetter()
# note: list.sort() reorders test_list in place
test_list.sort(key=operator.itemgetter(*grp_indx))
res = []
for k, g in itertools.groupby(test_list, key=operator.itemgetter(*grp_indx)):
    vals = [sub[sum_idx[0]] for sub in g]
    res.append(k + (sum(vals),))

# printing result
print("The grouped summation : " + str(res))
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) due to sorting the input list with list.sort(), which also modifies test_list in place.
Auxiliary space: O(n) because the result list res and the temporary list vals both have a maximum size of n, where n is the number of elements in the input list.
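One caveat with Method 4: operator.itemgetter() returns a tuple only when it is given two or more indices. With a single grouping index it returns the bare value, so `k + (sum(vals),)` would fail. The sketch below wraps the logic in a hypothetical function grouped_sum that handles both cases, and it uses sorted() so the input list is left unchanged.

```python
import itertools
import operator


def grouped_sum(records, grp_indx, sum_idx):
    """Hypothetical helper: groupby() + itemgetter() for one or more keys."""
    getter = operator.itemgetter(*grp_indx)
    res = []
    for k, g in itertools.groupby(sorted(records, key=getter), key=getter):
        # itemgetter yields a bare value for one index, a tuple for several.
        key = k if len(grp_indx) > 1 else (k,)
        res.append(key + (sum(sub[sum_idx[0]] for sub in g),))
    return res


test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

by_two = grouped_sum(test_list, [1, 2], [0])
by_one = grouped_sum(test_list, [1], [0])
print(by_two)  # [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
print(by_one)  # [('H', 25), ('M', 66)]
```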
Method 5: Using dictionary comprehension
- Initialize the input list, grouping indices and sum index.
- Use a dictionary comprehension to initialize a dictionary whose keys are tuples of the values at the grouping indices and whose values are all 0.
- Traverse each tuple in the input list, and update the corresponding entry in the dictionary by adding the value at the sum index to the existing total.
- Convert the dictionary to a list of tuples where each tuple contains the grouping values followed by the sum.
- Print the result.
Python3
# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using dictionary comprehension
temp = {(sub[grp_indx[0]], sub[grp_indx[1]]): 0 for sub in test_list}
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val,) for key, val in temp.items()]

# printing result
print("The grouped summation: " + str(res))
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list.
Auxiliary Space: O(m), where m is the number of unique combinations of grouping indices.
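Method 5 walks the list twice, once to create the zeroed keys and once to add the values. As a possible variant, not taken from the original code, dict.get() with a default of 0 lets a single loop do both:

```python
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]

temp = {}
for sub in test_list:
    key = (sub[grp_indx[0]], sub[grp_indx[1]])
    # get() falls back to 0 the first time a key is seen.
    temp[key] = temp.get(key, 0) + sub[sum_idx[0]]

res = [key + (val,) for key, val in temp.items()]
print("The grouped summation: " + str(res))
# The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```

The asymptotic cost stays O(n); the change only removes the separate initialization pass.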