Python – Group Records on Similar index elements

Last Updated : 28 Apr, 2023

Sometimes, while working with Python records, we can have a problem in which we need to perform grouping of particular index of tuple, on basis of similarity of rest of indices. This type of problems can have possible applications in Web development domain. Let’s discuss certain ways in which this task can be performed.

Input : test_list = [(4, 5, 10), (4, 5, 7), (4, 5, 12)]
Output : ((4, 5, [10, 7, 12]), )

Input : test_list = [(4, 5, 10)]
Output : [(4, 5, [10])]

Method #1 : Using defaultdict() + loop

The combination of above functions offer a brute approach to solve this problem. In this, we group elements and club into one using defaultdict() on basis of equality of rest of elements.

Python3

# Python3 code to demonstrate working of 
# Group Records on Similar index elements
# Using defaultdict() + loop
from collections import defaultdict
 
# initializing list
test_list = [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Group Records on Similar index elements
# Using defaultdict() + loop
temp = defaultdict(list)
for a, b, c in test_list:
    temp[a, b].append(c)
res = tuple((*key, val) for key, val in temp.items())
 
# printing result 
print("Tuples after grouping : " + str(res)) 

Output :

The original list is : [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
Tuples after grouping : ((4, 7, [13]), (4, 5, [7, 15]), (6, 7, [10, 12]))

Time Complexity: O(n*n) where n is the number of elements in the list “test_list”. defaultdict() + loop performs n*n number of operations.
Auxiliary Space: O(n), extra space is required where n is the number of elements in the list

Method #2 : Using groupby() + generator expression

The combinations of above functions can be used to solve this problem. In this, we perform the grouping of all elements using groupby() and generation of tuple list using generator expression. Needs sorted list for this method.

Python3

# Python3 code to demonstrate working of 
# Group Records on Similar index elements
# Using groupby() + generator expression
from itertools import groupby
 
# initializing list
test_list = [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Group Records on Similar index elements
# Using groupby() + generator expression
test_list.sort()
res = tuple((*key, [tup[-1] for tup in val]) for key, val in groupby(test_list, lambda tup: tup[:2]))
 
# printing result 
print("Tuples after grouping : " + str(res)) 

Output :

The original list is : [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
Tuples after grouping : ((4, 7, [13]), (4, 5, [7, 15]), (6, 7, [10, 12]))

Method 3: Using dictionary comprehension and setdefault()

Initialize an empty dictionary temp to hold our grouped records.
Loop through the elements of test_list. For each tuple (a, b, c), we check if the key (a, b) exists in the dictionary temp. If it does, we append c to the value list associated with the key. If it doesn’t, we create a new key (a, b) and assign a new list containing c as its value.
Use dictionary comprehension to create a list of tuples res. For each key-value pair in temp.items(), we extract the first and second elements of the key and the value list, and create a new tuple with these elements.
Print the resulting list of tuples.

Python

# Python3 code to demonstrate working of 
# Group Records on Similar index elements
# Using dictionary comprehension and setdefault()
 
# initializing list
test_list = [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Group Records on Similar index elements
# Using dictionary comprehension and setdefault()
temp = {}
for a, b, c in test_list:
    temp.setdefault((a, b), []).append(c)
res = [(k[0], k[1], v) for k, v in temp.items()]
 
# printing result 
print("Tuples after grouping : " + str(res))

Output

The original list is : [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
Tuples after grouping : [(4, 5, [7, 15]), (4, 7, [13]), (6, 7, [10, 12])]

Time complexity: O(n), where n is the length of test_list. We loop through the list once, and each dictionary operation (checking for a key and appending a value) takes constant time.

Auxiliary space: O(m), where m is the number of distinct key tuples in test_list. We store a dictionary with at most m key-value pairs, and a list of at most n elements in the worst case.

Method #4: Using itertools.groupby() + lambda function

Approach:

Import the itertools module.
Sort the list by the first two elements of each tuple.
Use itertools.groupby() function to group the tuples by the first two elements.
Use a lambda function to extract the third element from each grouped tuple.
Create a list of tuples, where the first two elements are the key and the third element is a list of values.

Below is the implementation of the above approach:

Python3

# Python3 code to demonstrate working of 
# Group Records on Similar index elements
# Using itertools.groupby() + lambda function
import itertools
 
# initializing list
test_list = [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Group Records on Similar index elements
# Using itertools.groupby() + lambda function
test_list.sort(key=lambda x: (x[0], x[1]))
res = [(key[0], key[1], [val[2] for val in group]) for key, group in itertools.groupby(test_list, lambda x: (x[0], x[1]))]
 
# printing result
print("Tuples after grouping : " + str(res))

Output

The original list is : [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
Tuples after grouping : [(4, 5, [7, 15]), (4, 7, [13]), (6, 7, [10, 12])]

Time complexity: O(n log n) (due to the sorting operation)
Auxiliary space: O(n) (to store the result list)

Method #5: Using defaultdict() + list comprehension

Python3

from collections import defaultdict
 
# Initializing list
test_list = [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
 
# Printing original list
print("The original list is : " + str(test_list))
 
# Grouping Records on Similar index elements
# Using defaultdict() + list comprehension
d = defaultdict(list)
 
for x in test_list:
    d[(x[0], x[1])].append(x[2])
 
res = [(k[0], k[1], v) for k, v in d.items()]
 
# Printing the result
print("Tuples after grouping : " + str(res))

Output

The original list is : [(4, 7, 13), (4, 5, 7), (6, 7, 10), (4, 5, 15), (6, 7, 12)]
Tuples after grouping : [(4, 7, [13]), (4, 5, [7, 15]), (6, 7, [10, 12])]

Time complexity: O(n) since it requires a single pass through the input list.
Auxiliary space: O(n) since it uses a dictionary to store the groups.

Suggest improvement

Python - Group similar elements into Matrix

Share your thoughts in the comments