Python – Multiple Keys Grouped Summation

Last Updated : 08 May, 2023

When working with records in Python, we sometimes need to group elements on the equality of several keys at once and sum a particular field within each group. This kind of problem comes up often in data-processing applications. Let's discuss several ways in which this task can be performed.

Input : 
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best')] 
grp_indx = [1, 2] [ Indices to group ] 
sum_idx = [0] [ Index to sum ] 
Output : [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Best', 13)]

Input : 
test_list = [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')] 
grp_indx = [1, 2] [ Indices to group ] 
sum_idx = [0] [ Index to sum ] 
Output : [('M', 'Gfg', 35), ('M', 'Best', 13)]

Method 1: Using loop + defaultdict() + list comprehension

The combination of the above functionalities can be used to solve this problem. In this method, a loop groups the records and accumulates the sums in a defaultdict, and a list comprehension then builds the result tuples from the accumulated key-value pairs.

Approach:

List of tuples test_list is initialized with some values.
grp_indx is a list of grouping indices, indicating the positions of elements in each tuple that will be used for grouping.
sum_idx is a list of summation indices, indicating the positions of elements in each tuple that will be used for summation.
A defaultdict named temp is initialized to store the results.
A loop iterates through each tuple in test_list.
For each tuple, the elements at positions grp_indx[0] and grp_indx[1] are used to form a key for temp.
The value at position sum_idx[0] in the tuple is added to the corresponding value in temp.
Once all tuples have been processed, a list comprehension is used to create a new list res by iterating through each key-value pair in temp and creating a new tuple by concatenating the key and value.
Finally, the grouped summation is printed.

Below is the implementation of the above approach:

Python3

# Python3 code to demonstrate working of 
# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
from collections import defaultdict
 
# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
            (13, 'M', 'Best'), (18, 'M', 'Gfg'),
            (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# printing original list
print("The original list is : " + str(test_list))
 
# initializing grouping indices
grp_indx = [1, 2]
 
# initializing sum index 
sum_idx = [0]
 
# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
temp = defaultdict(int)
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val, ) for key, val in temp.items()]
                 
# printing result 
print("The grouped summation : " + str(res)) 

Output :
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]

Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(m), where m is the number of distinct combinations of grouping indices.
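
The loop above hard-codes exactly two grouping indices. As a minimal sketch, the same idea could be wrapped in a helper that accepts any number of grouping indices and a single sum index. The name grouped_sum is hypothetical and not part of the original code.

```python
from collections import defaultdict


def grouped_sum(records, grp_indx, sum_idx):
    """Sum the sum_idx[0] field per distinct combination of grp_indx fields."""
    totals = defaultdict(int)
    for rec in records:
        # Build the key from however many grouping indices were given
        key = tuple(rec[i] for i in grp_indx)
        totals[key] += rec[sum_idx[0]]
    # Dicts keep insertion order, so groups come out in first-seen order
    return [key + (total,) for key, total in totals.items()]


test_list = [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')]
print(grouped_sum(test_list, [1, 2], [0]))  # [('M', 'Gfg', 35), ('M', 'Best', 13)]
```

Building the key with a generator expression keeps the helper independent of how many grouping indices are passed.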

Method 2: Using itertools.groupby() and a lambda function for Multiple Keys Grouped Summation

In this method, we first sort the input list using the sorted() function with a lambda function that extracts the values at the grouping indices. We then use itertools.groupby() with the same key function to group the sorted list. Finally, a list comprehension iterates over each group, sums the values at the sum_idx index, and creates a new tuple containing the grouping values followed by the summed value.

Python3

from itertools import groupby
 
# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# Printing original list
print("The original list is : " + str(test_list))
 
# Initializing grouping indices
grp_indx = [1, 2]
 
# Initializing sum index
sum_idx = [0]
 
# Multiple Keys Grouped Summation
# Using itertools.groupby() and a lambda function
res = [(key[0], key[1], sum(sub[sum_idx[0]] for sub in group))
       for key, group in groupby(sorted(test_list, key=lambda x: (x[grp_indx[0]], x[grp_indx[1]])),
                                 key=lambda x: (x[grp_indx[0]], x[grp_indx[1]]))]
 
# Printing result
print("The grouped summation : " + str(res))

Output

The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]

Time complexity: O(n log n) because of the sorting operation. The groupby function itself has a time complexity of O(n).
Auxiliary space: O(n).
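
The sorting step matters because groupby() only merges consecutive items that share a key. The following sketch illustrates this on a shortened list; the helper names key_fn and group_sums are hypothetical and used only for this illustration.

```python
from itertools import groupby

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (18, 'M', 'Gfg')]


def key_fn(rec):
    # Group on the values at indices 1 and 2
    return (rec[1], rec[2])


def group_sums(records):
    # groupby() starts a new group every time the key changes
    return [key + (sum(rec[0] for rec in grp),)
            for key, grp in groupby(records, key=key_fn)]


unsorted_res = group_sums(test_list)
sorted_res = group_sums(sorted(test_list, key=key_fn))

print(unsorted_res)  # [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Gfg', 18)]
print(sorted_res)    # [('H', 'Gfg', 23), ('M', 'Gfg', 30)]
```

Without sorting, the key ('M', 'Gfg') appears twice because its two records are not adjacent in the input.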

Method 3: Using pandas library

Pandas is a powerful library in Python for data manipulation and analysis. It has a groupby function that can be used to group data by one or more keys and perform operations on the grouped data.

Python3

import pandas as pd
 
# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
            (13, 'M', 'Best'), (18, 'M', 'Gfg'),
            (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# creating a pandas DataFrame from the list
df = pd.DataFrame(test_list, columns=['value', 'key1', 'key2'])
 
# grouping by key1 and key2 and summing the values
grouped = df.groupby(['key1', 'key2'])['value'].sum()
 
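# converting the result back to a list of tuples
# (int() turns the NumPy integer into a plain Python int so it prints as 25, not np.int64(25))
res = [(key[0], key[1], int(value)) for key, value in grouped.items()]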
 
# printing result
print("The grouped summation : " + str(res))

Output
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]

Time complexity: O(n log n) in the worst case. pandas groups the rows by hashing the keys and, because sort=True is the default for groupby(), also sorts the group keys.
Auxiliary space: O(n) because pandas needs to create a DataFrame object to store the input data and perform the grouping operation.

Method 4: Using itertools.groupby() and operator.itemgetter()

Use the itertools.groupby() function and the operator.itemgetter() function to group the elements by their keys and sum the values.

Python3

import itertools
import operator
 
# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# Initializing grouping indices
grp_indx = [1, 2]
 
# Initializing sum index
sum_idx = [0]
 
# Multiple Keys Grouped Summation
# Using itertools.groupby() and operator.itemgetter()
test_list.sort(key=operator.itemgetter(*grp_indx))
 
res = []
 
for k, g in itertools.groupby(test_list, key=operator.itemgetter(*grp_indx)):
    vals = [sub[sum_idx[0]] for sub in g]
    res.append(k + (sum(vals),))
 
# Printing result
print("The grouped summation : " + str(res))

Output

The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]

Time complexity: O(n log n) due to sorting the input list in place with list.sort().
Auxiliary space: O(n) because the result list res and the temporary list vals both have a maximum size of n, where n is the number of elements in the input list.
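
One detail to watch with this method: operator.itemgetter() returns a tuple only when it is given two or more indices. With a single grouping index it returns the bare value, so k + (sum(vals),) would fail. A minimal sketch that handles both cases follows; the function name grouped_sum_sorted is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def grouped_sum_sorted(records, grp_indx, sum_idx):
    key_fn = itemgetter(*grp_indx)
    res = []
    # sorted() leaves the caller's list untouched, unlike list.sort()
    for k, g in groupby(sorted(records, key=key_fn), key=key_fn):
        # itemgetter with one index returns a bare value, so wrap it in a tuple
        key = k if len(grp_indx) > 1 else (k,)
        res.append(key + (sum(rec[sum_idx[0]] for rec in g),))
    return res


test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]

print(grouped_sum_sorted(test_list, [1], [0]))     # [('H', 25), ('M', 66)]
print(grouped_sum_sorted(test_list, [1, 2], [0]))  # [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
```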

Method 5: Using dictionary comprehension

Initialize the input list, grouping indices, and sum index.
Use a dictionary comprehension to initialize a dictionary whose keys are tuples of the values at the grouping indices and whose values are all 0.
Traverse each tuple in the input list, and update the corresponding dictionary entry by adding the value at the sum index to the existing value.
Convert the dictionary to a list of tuples where each tuple contains the grouping indices followed by the sum.
Print the result.

Python3

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# Initializing grouping indices
grp_indx = [1, 2]
 
# Initializing sum index
sum_idx = [0]
 
# Multiple Keys Grouped Summation
# Using dictionary comprehension
temp = {(sub[grp_indx[0]], sub[grp_indx[1]]): 0 for sub in test_list}
 
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
 
# Build the result once, after the loop has finished
res = [key + (val,) for key, val in temp.items()]
 
# Printing result
print("The grouped summation: " + str(res))

Output

The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]

Time complexity: O(n), where n is the length of the input list.
Auxiliary Space: O(m), where m is the number of unique combinations of grouping indices.
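
The zero-initialization pass is optional. As a sketch of a single-pass variant, dict.get() can supply the default of 0 the first time a key is seen; this is a swapped-in alternative, not the dictionary-comprehension method itself.

```python
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]

temp = {}
for sub in test_list:
    key = (sub[grp_indx[0]], sub[grp_indx[1]])
    # get() returns 0 the first time a key is encountered
    temp[key] = temp.get(key, 0) + sub[sum_idx[0]]

res = [key + (val,) for key, val in temp.items()]
print("The grouped summation: " + str(res))
# The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```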

Method 6: Using the built-in function reduce() from the functools module

reduce() is a function from the functools module in Python that applies a function of two arguments cumulatively on a sequence of elements, in this case, our list of tuples.

Approach:

Import the functools module
Initialize grp_indx and sum_idx variables as before
Define a lambda function that takes the accumulated dictionary and the next tuple as arguments, and returns a new dictionary in which the value stored under that tuple's grouping key is increased by the element at the sum index. This function will be used by reduce() to perform the grouped summation.
Use reduce() to apply the lambda function across the list of tuples. The initial value passed to reduce() is an empty dictionary.
Convert the resulting dictionary to a list of tuples, where each tuple has the same first two elements as the keys of the dictionary and the third element is the value of the corresponding key.
Print the result.

Below is the implementation of the above approach:

Python3

from functools import reduce
 
# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# Initializing grouping indices
grp_indx = [1, 2]
 
# Initializing sum index
sum_idx = [0]
 
# Using reduce() for Multiple Keys Grouped Summation
res_dict = reduce(lambda d, t: {**d, (t[grp_indx[0]], t[grp_indx[1]]): d.get((t[grp_indx[0]], t[grp_indx[1]]), 0) + t[sum_idx[0]]}, test_list, {})
 
# Converting the dictionary to a list of tuples
res = [(k[0], k[1], v) for k, v in res_dict.items()]
 
# printing result
print("The grouped summation: " + str(res))

Output

The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]

Time Complexity: O(n * m), where n is the length of the input list and m is the number of distinct groups, because the lambda copies the accumulated dictionary with {**d, ...} on every step. In the worst case, when every record forms its own group, this is O(n^2).
Auxiliary Space: O(m) for the dictionary that stores the intermediate results.
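
Because {**d, ...} builds a new dictionary on each step, the lambda does more work than necessary. A minimal sketch of a variant that updates the accumulator in place, so that each step takes constant time on average, is shown below. The helper name add_record is hypothetical.

```python
from functools import reduce

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]


def add_record(acc, rec):
    # Update the accumulator dictionary in place and hand it back to reduce()
    key = (rec[grp_indx[0]], rec[grp_indx[1]])
    acc[key] = acc.get(key, 0) + rec[sum_idx[0]]
    return acc


res_dict = reduce(add_record, test_list, {})
res = [key + (val,) for key, val in res_dict.items()]
print("The grouped summation: " + str(res))
# The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```

Mutating the accumulator is safe here because the initial dictionary is created fresh for this call and is not shared with other code.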

Method 7: Using NumPy

Steps:

First, we import the NumPy library.
We initialize the input list (test_list) and the grouping and sum indices (grp_indx and sum_idx, respectively).
We convert the input list to a NumPy array using np.array(). Because the tuples mix integers and strings, every element of the resulting array is stored as a string.
We extract the grouping and sum columns as separate arrays, grp_arr and sum_arr, using array slicing (arr[:, grp_indx] and arr[:, sum_idx], respectively).
We convert the sum_arr to a numeric data type (such as int) using the astype() method, so that we can perform summation on it later.
We use the np.unique() function to find the unique combinations of the grouping indices (grp_arr) and store them in unique_groups.
We iterate over the unique combinations of grouping indices using a for loop.
For each unique combination, we calculate the grouped summation by building a boolean mask with np.all(grp_arr == group, axis=1) that selects the matching rows, and then summing the corresponding values in sum_arr.
We append the results as tuples to a list called result.
Finally, we print the result.

Python3

import numpy as np
 
# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
 
# initializing grouping indices
grp_indx = [1, 2]
 
# initializing sum index
sum_idx = [0]
 
# convert the list to a NumPy array
arr = np.array(test_list)
 
# extract the grouping and sum indices as separate arrays
grp_arr = arr[:, grp_indx]
sum_arr = arr[:, sum_idx].astype(int)  # convert to int for numeric summation
 
# use np.unique() to find the unique combinations of the grouping indices
unique_groups = np.unique(grp_arr, axis=0)
 
# iterate over the unique combinations and calculate the grouped summation
result = []
for group in unique_groups:
    group_sum = np.sum(sum_arr[np.all(grp_arr == group, axis=1)])
    # convert NumPy scalars to plain Python types so the tuples print cleanly
    result.append((str(group[0]), str(group[1]), int(group_sum)))
 
# printing result
print("The grouped summation: " + str(result))

Output
The grouped summation: [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]

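Time complexity: O(N log N) for np.unique(), where N is the number of elements in test_list, plus O(N * M) for the for loop, where M is the number of unique groups, since each iteration compares the current group against all N rows.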
Auxiliary Space: O(N) for the NumPy arrays and O(N) for the result list.