Python – Cross tuple summation grouping

Last Updated : 16 May, 2023

When working with Python tuple records, we sometimes need to group pairs by their second element and sum the first elements within each group. This kind of aggregation comes up often in day-to-day programming. Let's discuss a few ways to perform this task.

Input : test_list = [(1, 5), (7, 4), (9, 6), (11, 6)]
Output : [(7, 4), (1, 5), (20, 6)]

Input : test_list = [(1, 8)]
Output : [(1, 8)]
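The two examples above can be reproduced with a small helper. This is only a minimal sketch: the name group_sum is hypothetical, and the result is sorted by the second element on the assumption that this is the ordering the sample outputs use.

```python
from collections import defaultdict


def group_sum(pairs):
    """Sum the first elements of (value, key) pairs that share the same key."""
    totals = defaultdict(int)
    for value, key in pairs:
        totals[key] += value
    # Sort by key so the result order matches the sample outputs above.
    return [(total, key) for key, total in sorted(totals.items())]


print(group_sum([(1, 5), (7, 4), (9, 6), (11, 6)]))  # [(7, 4), (1, 5), (20, 6)]
print(group_sum([(1, 8)]))                           # [(1, 8)]
```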

Method #1: Using loop

This is the brute-force way to perform the task. We iterate over the list once, accumulate the first element of each pair in a dictionary keyed by the second element, and then convert the dictionary back into a list of (sum, key) tuples.

Python3

# Python3 code to demonstrate working of 
# Cross tuple summation grouping
# Using loop
 
# initializing list
test_list = [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Sum first elements that share the same second element
# Using loop
temp = dict()
for ele1, ele2 in test_list:
    temp[ele2] = temp.get(ele2, 0) + ele1
res = [(total, key) for key, total in temp.items()]
 
# printing result 
print("The grouped records are : " + str(res)) 

Output :

The original list is : [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
The grouped records are : [(11, 5), (18, 6), (10, 4), (9, 7)]

Time complexity: O(n), where n is the length of the input list test_list. This is because the program uses a loop to iterate over each element in the input list once, and then a loop to iterate over the key-value pairs in the dictionary once.
Auxiliary space: O(n), where n is the length of the input list test_list. This is because the program uses a dictionary to store key-value pairs, and the size of the dictionary can grow up to n.

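Method #2: Using groupby() + sum() + zip() + list comprehension

These functions can be combined to solve the problem. groupby() forms the groups, zip(*group) separates each group's first elements from its second elements, and sum() adds up the first elements. Note that groupby() only merges adjacent items that share a key, so the list must be ordered by the second element before grouping.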

Python3

# Python3 code to demonstrate working of 
# Cross tuple summation grouping
# Using groupby() + sum() + zip() + list comprehension
from itertools import groupby
 
# initializing list
test_list = [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
 
# printing original list
print("The original list is : " + str(test_list))
 
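# Sum first elements that share the same second element
# Using groupby() + sum() + zip() + list comprehension
# groupby() only merges adjacent items, so sort by the second element first
res = [(sum(next(zip(*ele))), key) for key, ele in groupby(
    sorted(test_list, key=lambda tup: tup[1]), key=lambda tup: tup[1])]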
 
# printing result 
print("The grouped records are : " + str(res)) 

Output :

The original list is : [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
The grouped records are : [(10, 4), (11, 5), (18, 6), (9, 7)]

Time complexity: O(n log n) when the list has to be sorted by the second element first, where n is the length of the input list test_list; the groupby() pass itself is O(n).
Auxiliary space: O(n), where n is the length of the input list test_list.
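Because groupby() only merges adjacent items, the order of the input matters. The sketch below illustrates this with a list in which the same key is not contiguous; the helper grouped_sums and its presort flag are hypothetical names used only for this demonstration.

```python
from itertools import groupby


def grouped_sums(pairs, presort):
    """Group (value, key) pairs by key with groupby() and sum the values."""
    def second(tup):
        return tup[1]

    items = sorted(pairs, key=second) if presort else pairs
    return [(sum(value for value, _ in group), key)
            for key, group in groupby(items, key=second)]


# Key 5 appears at both ends of the list, so its items are not adjacent.
scattered = [(4, 5), (8, 6), (7, 5)]

print(grouped_sums(scattered, presort=False))  # [(4, 5), (8, 6), (7, 5)]
print(grouped_sums(scattered, presort=True))   # [(11, 5), (8, 6)]
```

Without sorting, key 5 is reported twice with partial sums; sorting by the second element first yields one total per key.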

Method #3: Using defaultdict

This method uses defaultdict(int), so any missing key starts at 0. It loops through the list, adds each first element to the running total stored under its second element, and finally builds a list of tuples in which the first element is the sum and the second element is the key.

Python3

from collections import defaultdict
 
# initializing list
test_list = [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Using defaultdict to create a dictionary where the default value is 0
grouped_dict = defaultdict(int)
 
# Looping through the list and adding up the values with the same key
for tup in test_list:
    grouped_dict[tup[1]] += tup[0]
 
# Creating a list of tuples where the first element is the sum of the values
# and the second element is the key
res = [(val, key) for key, val in grouped_dict.items()]
 
# printing result
print("The grouped records are : " + str(res))

Output

The original list is : [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
The grouped records are : [(11, 5), (18, 6), (10, 4), (9, 7)]

Time complexity: O(n), where n is the length of the input list, since it loops through the list once and performs constant time operations inside the loop.
Auxiliary space: O(n), since it creates a dictionary with one key-value pair for each distinct key in the input list.

Method #4: Using defaultdict and list comprehension

This program also uses defaultdict to group the tuples by their second element and sum the first elements of each group. Unlike the previous methods, the output lists each second element first, followed by the sum of its first elements.

Python3

from collections import defaultdict
 
# initializing list
test_list = [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Grouping tuples based on second element and summing first element
temp = defaultdict(int)
for ele1, ele2 in test_list:
    temp[ele2] += ele1
res = [(k, v) for k, v in temp.items()]
 
# printing result 
print("The grouped records are : " + str(res))

Output

The original list is : [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
The grouped records are : [(5, 11), (6, 18), (4, 10), (7, 9)]

The time complexity of the code is O(n), where n is the length of the input list.
The auxiliary space of the code is O(m), where m is the number of distinct second elements in the input list.
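The output above lists (second element, sum) pairs in first-seen order, whereas the earlier methods list (sum, second element). If a consistent format is needed, the pairs can be swapped and sorted. The following is a minimal sketch, with res hard-coded to the output shown above and normalized as a hypothetical variable name.

```python
# Output of the code above: (second element, sum) pairs in first-seen order.
res = [(5, 11), (6, 18), (4, 10), (7, 9)]

# Swap each pair to (sum, second element), then order by the second element.
normalized = sorted(((total, key) for key, total in res),
                    key=lambda pair: pair[1])

print(normalized)  # [(10, 4), (11, 5), (18, 6), (9, 7)]
```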

Method #5: Using the pandas library

First, the pandas library is imported with the alias pd.
A list of tuples named test_list is initialized, which contains 7 tuples, each with two elements.
A pandas DataFrame is created from test_list using the pd.DataFrame() method, where the column names ‘col1’ and ‘col2’ are assigned to the first and second elements of each tuple, respectively.
The DataFrame is grouped by the values in ‘col2’ using the groupby() method, and the sum of values in ‘col1’ for each unique value in ‘col2’ is calculated using the sum() method.
The reset_index() method turns the group keys back into a regular column, so each row holds the ‘col2’ value followed by its ‘col1’ sum, and the values attribute converts the resulting DataFrame to a NumPy array.
The tolist() method is used to convert the numpy array to a list.
The resulting list is stored in the variable res.
Finally, the result is printed to the console using the print() function, where the list res is converted to a string and appended to the text “The grouped records are : ”.

Python3

import pandas as pd
 
# initializing list
test_list = [(4, 5), (7, 5), (8, 6), (10, 6), (10, 4), (6, 7), (3, 7)]
 
# convert list to pandas dataframe
df = pd.DataFrame(test_list, columns=['col1', 'col2'])
 
# groupby and sum
res = df.groupby('col2')['col1'].sum().reset_index().values.tolist()
 
# printing result
print("The grouped records are : " + str(res))

Output

The grouped records are : [[4, 10], [5, 11], [6, 18], [7, 9]]

Time complexity: O(n log n) in the worst case, since groupby() sorts the group keys by default; building the DataFrame from the list is O(n).
Auxiliary space: O(n) for the pandas dataframe.