Sometimes, while working with Python records, we need to group elements by equality on multiple keys and sum a particular field within each group. This kind of problem commonly occurs in data-processing applications. Let’s discuss several ways in which this task can be performed.
Input :
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best')]
grp_indx = [1, 2] [ Indices to group ]
sum_idx = [0] [ Index to sum ]
Output : [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Best', 13)]
Input :
test_list = [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')]
grp_indx = [1, 2] [ Indices to group ]
sum_idx = [0] [ Index to sum ]
Output : [('M', 'Gfg', 35), ('M', 'Best', 13)]
Method 1: Using loop + defaultdict() + list comprehension
The combination of the above functionalities can be used to solve this problem. Here, the loop groups the records and accumulates the sum for each group in a defaultdict, and the list comprehension builds the result tuples from the grouped sums.
Approach:
- List of tuples test_list is initialized with some values.
- grp_indx is a list of grouping indices, indicating the positions of elements in each tuple that will be used for grouping.
- sum_idx is a list of summation indices, indicating the positions of elements in each tuple that will be used for summation.
- A defaultdict named temp is initialized to store the results.
- A loop iterates through each tuple in test_list.
- For each tuple, the elements at positions grp_indx[0] and grp_indx[1] are used to form a key for temp.
- The value at position sum_idx[0] in the tuple is added to the corresponding value in temp.
- Once all tuples have been processed, a list comprehension is used to create a new list res by iterating through each key-value pair in temp and creating a new tuple by concatenating the key and value.
- Finally, the grouped summation is printed.
Below is the implementation of the above approach:
Python3
from collections import defaultdict

# Initialize the list of records
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]

print("The original list is : " + str(test_list))

# Indices to group by and index to sum
grp_indx = [1, 2]
sum_idx = [0]

# Accumulate the sum for each (key1, key2) pair
temp = defaultdict(int)
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]

# Append each sum to its grouping key
res = [key + (val,) for key, val in temp.items()]

print("The grouped summation : " + str(res))
Output :
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(m), where m is the number of distinct combinations of grouping indices.
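The code above hard-codes two grouping indices and one summation index. As a minimal sketch, assuming we want any number of indices in either list, the same idea can be wrapped in a helper. The function name grouped_sum is hypothetical and is not part of the original approach.

```python
from collections import defaultdict


def grouped_sum(records, grp_indx, sum_idx):
    """Sum the fields at sum_idx for records that share the fields at grp_indx.

    Hypothetical helper: generalizes Method 1 to any number of indices.
    """
    # One running total per summation index, created on first use of a key
    totals = defaultdict(lambda: [0] * len(sum_idx))
    for rec in records:
        key = tuple(rec[i] for i in grp_indx)
        for pos, i in enumerate(sum_idx):
            totals[key][pos] += rec[i]
    # Each result tuple is the grouping key followed by its sums
    return [key + tuple(sums) for key, sums in totals.items()]


test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]

print(grouped_sum(test_list, [1, 2], [0]))
# [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```

Groups appear in the order their keys are first seen, because dictionaries preserve insertion order.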
Method 2: Using itertools.groupby() and a lambda function for Multiple Keys Grouped Summation
In this method, we first sort the input list using the sorted() function with a lambda function that extracts the values at the grouping indices. We then use itertools.groupby() with the same key to group the sorted list. Finally, a list comprehension iterates over each group, sums the values at the sum_idx index, and creates a new tuple containing the grouping keys followed by the summed value.
Python3
from itertools import groupby

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]

print("The original list is : " + str(test_list))

grp_indx = [1, 2]
sum_idx = [0]

# groupby() only merges adjacent items, so sort by the same key first
res = [(key[0], key[1], sum(sub[sum_idx[0]] for sub in group))
       for key, group in groupby(
           sorted(test_list, key=lambda x: (x[grp_indx[0]], x[grp_indx[1]])),
           key=lambda x: (x[grp_indx[0]], x[grp_indx[1]]))]

print("The grouped summation : " + str(res))
Output
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) because of the sorting operation. The groupby function itself has a time complexity of O(n).
Auxiliary space: O(n).
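The sorting step in this method matters because itertools.groupby() only merges consecutive items with equal keys. The short sketch below compares the groups produced with and without sorting on the same data. The helper name group_key is an assumption made for illustration.

```python
from itertools import groupby

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]


def group_key(rec):
    # Hypothetical helper: the grouping fields at indices 1 and 2
    return (rec[1], rec[2])


# Without sorting, equal keys are never adjacent here, so nothing is merged
unsorted_keys = [key for key, _ in groupby(test_list, key=group_key)]

# After sorting, equal keys sit next to each other and form one group each
sorted_keys = [key for key, _ in
               groupby(sorted(test_list, key=group_key), key=group_key)]

print(len(unsorted_keys))  # 6
print(sorted_keys)         # [('H', 'Gfg'), ('M', 'Best'), ('M', 'Gfg')]
```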
Method 3: Using pandas library
Pandas is a powerful library in Python for data manipulation and analysis. It has a groupby function that can be used to group data by one or more keys and perform operations on the grouped data.
Python3
import pandas as pd

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Load the records into a DataFrame with named columns
df = pd.DataFrame(test_list, columns=['value', 'key1', 'key2'])

# Group by both keys and sum the 'value' column
grouped = df.groupby(['key1', 'key2'])['value'].sum()

# Convert the resulting Series to a list of plain Python tuples
res = [(key[0], key[1], int(value)) for key, value in grouped.items()]

print("The grouped summation : " + str(res))
Output
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) in the worst case. pandas groups rows by hashing the keys, and with the default sort=True it also sorts the resulting group keys.
Auxiliary space: O(n) because pandas needs to create a DataFrame object to store the input data and perform the grouping operation.
Method 4: Using itertools.groupby() and operator.itemgetter()
Use the itertools.groupby() function and the operator.itemgetter() function to group the elements by their keys and sum the values.
Python3
import itertools
import operator

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'),
             (13, 'M', 'Best'), (18, 'M', 'Gfg'),
             (2, 'H', 'Gfg'), (23, 'M', 'Best')]

grp_indx = [1, 2]
sum_idx = [0]

# Sort in place so that records with equal keys are adjacent
test_list.sort(key=operator.itemgetter(*grp_indx))

res = []
for k, g in itertools.groupby(test_list, key=operator.itemgetter(*grp_indx)):
    # Collect the values to sum for this group
    vals = [sub[sum_idx[0]] for sub in g]
    res.append(k + (sum(vals),))

print("The grouped summation : " + str(res))
Output
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) due to the in-place sorting of the input list using list.sort().
Auxiliary space: O(n) because the result list res and the temporary list vals both have a maximum size of n, where n is the number of elements in the input list.
Method 5: Using dictionary comprehension
- Initialize the input list, grouping indices, and sum index.
- Create a dictionary comprehension to initialize a dictionary with keys as tuples of grouping indices and values as 0.
- Traverse through each sub-list in the input list, and update the corresponding key value in the dictionary by adding the value at the sum index to the existing value.
- Convert the dictionary to a list of tuples where each tuple contains the grouping indices followed by the sum.
- Print the result.
Python3
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

grp_indx = [1, 2]
sum_idx = [0]

# Start every group key at 0
temp = {(sub[grp_indx[0]], sub[grp_indx[1]]): 0 for sub in test_list}

# Add each record's value to its group
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]

res = [key + (val,) for key, val in temp.items()]

print("The grouped summation: " + str(res))
Output
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list.
Auxiliary Space: O(m), where m is the number of unique combinations of grouping indices.
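The dictionary comprehension in this method makes one extra pass over the list just to set every key to 0. As a hedged alternative, not part of the original method, the same result can be obtained in a single pass by using dict.get() with a default of 0.

```python
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

grp_indx = [1, 2]
sum_idx = [0]

temp = {}
for sub in test_list:
    key = tuple(sub[i] for i in grp_indx)
    # dict.get() returns 0 the first time a key is seen
    temp[key] = temp.get(key, 0) + sub[sum_idx[0]]

res = [key + (val,) for key, val in temp.items()]

print("The grouped summation: " + str(res))
# The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```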
Method 6: Using the built-in function reduce() from the functools module
reduce() is a function from the functools module in Python that applies a function of two arguments cumulatively on a sequence of elements, in this case, our list of tuples.
Approach:
- Import the functools module
- Initialize grp_indx and sum_idx variables as before
- Define a lambda function that takes the accumulated dictionary and the current tuple as arguments and returns a new dictionary in which the value for the tuple's grouping key has been increased by the tuple's value at the sum index. This function will be used by reduce() to perform the grouped summation.
- Use reduce() to apply the lambda function on the list of tuples. The initial value passed to reduce() is an empty dictionary.
- Convert the resulting dictionary to a list of tuples, where each tuple has the same first two elements as the keys of the dictionary and the third element is the value of the corresponding key.
- Print the result.
Below is the implementation of the above approach:
Python3
from functools import reduce

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]

# d is the accumulated dictionary, t is the current tuple
res_dict = reduce(
    lambda d, t: {**d, (t[grp_indx[0]], t[grp_indx[1]]):
                  d.get((t[grp_indx[0]], t[grp_indx[1]]), 0) + t[sum_idx[0]]},
    test_list, {})
res = [(k[0], k[1], v) for k, v in res_dict.items()]
print("The grouped summation: " + str(res))
Output
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time Complexity: O(n * m) in the worst case, where n is the number of tuples and m is the number of distinct groups. reduce() calls the lambda function n times, and each call copies the accumulated dictionary with {**d, ...}, which costs O(m).
Auxiliary Space: O(n) because of the use of a dictionary to store intermediate results.
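The lambda function above builds a new dictionary with {**d, ...} on every step, which copies all the groups accumulated so far. As a sketch of one possible variation, assuming it is acceptable to mutate the accumulator, a named function can update the dictionary in place and return it, so each step does constant work on average. The name add_record is hypothetical.

```python
from functools import reduce

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

grp_indx = [1, 2]
sum_idx = [0]


def add_record(acc, rec):
    # Hypothetical helper: update the accumulator in place and return it
    key = (rec[grp_indx[0]], rec[grp_indx[1]])
    acc[key] = acc.get(key, 0) + rec[sum_idx[0]]
    return acc


res_dict = reduce(add_record, test_list, {})
res = [key + (val,) for key, val in res_dict.items()]

print("The grouped summation: " + str(res))
# The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```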
Method 7: Using NumPy
Steps:
- First, we import the NumPy library.
- We initialize the input list (test_list) and the grouping and sum indices (grp_indx and sum_idx, respectively).
- We convert the input list to a NumPy array using np.array().
- We extract the grouping and sum columns as separate arrays using array indexing (arr[:, grp_indx] and arr[:, sum_idx], respectively).
- We convert the sum_arr to a numeric data type (such as int) using the astype() method, so that we can perform summation on it later.
- We use the np.unique() function to find the unique combinations of the grouping indices (grp_arr) and store them in unique_groups.
- We iterate over the unique combinations of grouping indices using a for loop.
- For each unique combination, we calculate the grouped summation by using the np.all() function to compare the grp_arr with the current group, and then summing the corresponding values in sum_arr.
- We append the results as tuples to a list called result.
- Finally, we print the result.
Python3
import numpy as np

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

grp_indx = [1, 2]
sum_idx = [0]

# Mixed int/str tuples become an array of strings
arr = np.array(test_list)
grp_arr = arr[:, grp_indx]
# Convert the values to sum back to integers
sum_arr = arr[:, sum_idx].astype(int)

# Unique (key1, key2) rows, in sorted order
unique_groups = np.unique(grp_arr, axis=0)

result = []
for group in unique_groups:
    # Boolean mask selects the rows whose grouping columns match this group
    group_sum = np.sum(sum_arr[np.all(grp_arr == group, axis=1)])
    # Convert NumPy scalars to plain Python types for clean printing
    result.append((str(group[0]), str(group[1]), int(group_sum)))

print("The grouped summation: " + str(result))
Output
The grouped summation: [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
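Time complexity: O(N log N) for np.unique(), where N is the number of elements in test_list, plus O(N * M) for the for loop, where M is the number of unique groups, because each iteration compares the current group against all N rows.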
Auxiliary Space: O(N) for the NumPy arrays and O(N) for the result list.
Last Updated : 08 May, 2023