Python – Group records by Kth column in List

Sometimes, while working with Python lists, we need to group records by a certain parameter. One such parameter is the Kth element of each tuple. Let's discuss several ways in which this task can be performed.

Method #1 : Using loop + defaultdict()

A defaultdict(list) maps each distinct Kth-column value to a list of the tuples that share it. A single loop over the records appends each tuple to the list for its key, and the dictionary's values then form the grouped result in order of first appearance.






# Python3 code to demonstrate
# Group records by Kth column in List
# using loop + defaultdict()
from collections import defaultdict
 
# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Initializing K
K = 0
 
# Group records by Kth column in List
# using loop + defaultdict()
temp = defaultdict(list)
for ele in test_list:
    temp[ele[K]].append(ele)
res = list(temp.values())
 
# printing result
print("The list after grouping : " + str(res))

Output : 
The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('is', 2), ('is', 4)], [('best', 5)]]

Time Complexity: O(n), where n is the number of elements in the list “test_list”: each record is visited once, and each dictionary insertion takes O(1) time on average.
Auxiliary Space: O(n) where n is the number of elements in the list “test_list”. 
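
The same loop can be wrapped in a small helper so that K becomes a parameter. The sketch below is only an illustration: the function name group_by_column and the sample data are assumptions, not part of the original listing. It groups on the second column (K = 1) to show that any column index works.

```python
from collections import defaultdict


def group_by_column(records, k):
    """Group tuples by the value in column k, in order of first appearance."""
    groups = defaultdict(list)
    for record in records:
        groups[record[k]].append(record)
    return list(groups.values())


# Hypothetical sample data: the second column holds the repeating value
records = [('a', 1), ('b', 2), ('c', 1), ('d', 2), ('e', 3)]
by_second = group_by_column(records, 1)
print(by_second)
```

Returning list(groups.values()) mirrors the listing above; returning the dictionary itself would also keep the key that each group belongs to.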



Method #2 : Using itemgetter() + groupby() + list comprehension

This task can also be performed by combining the above functions. itemgetter() selects the Kth column, groupby() collects consecutive records that share a key, and a list comprehension compiles the result. Because groupby() only merges adjacent records, the list is sorted by the Kth column first, so the groups come out in sorted key order.




# Python3 code to demonstrate
# Group records by Kth column in List
# using itemgetter() + list comprehension + groupby()
from operator import itemgetter
from itertools import groupby
 
# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Initializing K
K = 0
 
# Group records by Kth column in List
# using itemgetter() + groupby() + list comprehension
temp = itemgetter(K)
res = [list(val) for key, val in groupby(sorted(test_list, key = temp), temp)]
 
# printing result
print("The list after grouping : " + str(res))

Output : 
The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('best', 5)], [('is', 2), ('is', 4)]]

The time complexity of the code is O(nlogn), where n is the length of the input list.
The space complexity of the code is O(n), where n is the length of the input list.
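
The sorting step matters because groupby() starts a new group every time the key changes. A minimal sketch of the difference, reusing the article's sample data (the variable names are chosen here for illustration):

```python
from itertools import groupby
from operator import itemgetter

records = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
key_func = itemgetter(0)

# Without sorting, equal keys are not adjacent, so every record
# ends up in a group of its own
unsorted_groups = [list(group) for _, group in groupby(records, key_func)]

# Sorting by the same key first makes equal keys adjacent
sorted_groups = [list(group) for _, group in groupby(sorted(records, key=key_func), key_func)]

print(len(unsorted_groups))  # 5
print(len(sorted_groups))    # 3
```

Since sorted() is stable, records inside each group keep their original relative order.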

Method #3 : Using numpy

One more approach to perform the grouping of records based on the Kth column in a list is using the numpy library.

Here’s how it can be done:




import numpy as np
 
# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Initializing K
K = 0
 
# Group records by Kth column in List using numpy
# Only the Kth column goes into an array: np.array(test_list) would
# coerce the mixed str/int tuples to strings
col = np.array([rec[K] for rec in test_list])
# np.unique returns the keys sorted; inverse[j] is the group number of record j
keys, inverse = np.unique(col, return_inverse=True)
res = [[test_list[j] for j in np.where(inverse == i)[0]] for i in range(len(keys))]
 
# printing result
print("The list after grouping : " + str(res))
#This code is contributed by Edula Vinay Kumar Reddy

Output:

The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('best', 5)], [('is', 2), ('is', 4)]]

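Time Complexity: O(N log N + N*G), where G is the number of distinct keys: np.unique sorts the column, and one linear scan is made per group.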
Space Complexity: O(N)

Method #4: Using a list comprehension with enumerate() function and a set:




# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Initializing K
K = 0
 
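# Group records by Kth column in List
# using enumerate() function and a set
 
column = [rec[K] for rec in test_list]
# Unique keys from a set, ordered by first appearance for a deterministic result
keys = sorted(set(column), key=column.index)
# Map each key to the position of its group
position = {key: i for i, key in enumerate(keys)}
result = [[] for _ in keys]
for rec in test_list:
    result[position[rec[K]]].append(rec)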
 
# printing result
print("The list after grouping : " + str(result))

Output
The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('is', 2), ('is', 4)], [('best', 5)]]

Time Complexity: O(n^2) in the worst case, where n is the length of the input list test_list: locating a key's position takes a linear scan with index(), which is O(n) rather than O(log n), and up to n such scans are needed.

Space Complexity: O(n), as a new list is created of length n.

Method #5: Using setdefault on a dictionary

Step-by-step algorithm:

  1. Initialize the list of tuples containing records.
  2. Initialize the column number K for grouping by that column.
  3. Initialize an empty dictionary called groups.
  4. Iterate each record in the test_list.
    a. Get the value of the Kth column in the current record.
    b. If the key for this value does not exist in the groups dictionary, create a new empty list as the value for that key.
    c. Append the current record to the list for the corresponding key in the groups dictionary.
  5. Convert the dictionary of groups to a list of lists containing the grouped records.
  6. Print the resulting list of lists.




#initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
 
# printing original list
print("The original list is : " + str(test_list))
 
#The column number K is initialized to 0
K = 0
 
#An empty dictionary called groups is initialized
groups = {}
 
#Each record in test_list is appended to the list for its Kth-column value
for x in test_list:
    groups.setdefault(x[K], []).append(x)
groups = list(groups.values())
 
#The resulting list of lists is printed
print("The list after grouping: " + str(groups))

Output
The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping: [[('Gfg', 1), ('Gfg', 3)], [('is', 2), ('is', 4)], [('best', 5)]]

Time Complexity: O(n), where n is the number of records in the input list. This is because the algorithm iterates over each record in the input list once.
Auxiliary Space: O(n), where n is the number of records in the input list. 
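
setdefault() on a plain dict and defaultdict(list) build the same groups. One practical difference is that merely reading a missing key from a defaultdict inserts an empty list, while a plain dict stays untouched when queried with get(). A small sketch of that behaviour (variable names are illustrative):

```python
from collections import defaultdict

records = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

with_default = defaultdict(list)
with_setdefault = {}
for rec in records:
    with_default[rec[0]].append(rec)
    with_setdefault.setdefault(rec[0], []).append(rec)

# Both approaches produce identical groups
same_groups = dict(with_default) == with_setdefault

# Reading a missing key inserts an empty list into the defaultdict...
_ = with_default['missing']
# ...whereas dict.get() leaves the plain dict unchanged
_ = with_setdefault.get('missing')

print(same_groups, 'missing' in with_default, 'missing' in with_setdefault)
```

This prints True True False, so the plain dict may be the safer choice when the grouped result is later probed for keys that might not exist.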

Method #6: Using itertools.groupby()

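Step-by-step approach:

  1. Sort the list by the Kth column so that records with equal keys become adjacent.
  2. Pass the sorted list to groupby() with the same key function.
  3. Convert each group iterator to a list and collect the lists.

Below is the implementation of the above approach: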




from itertools import groupby
 
# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
 
# printing original list
print("The original list is: " + str(test_list))
 
# Initializing K
K = 0
 
# Group records by Kth column in List
# using itertools.groupby()
test_list.sort(key=lambda x: x[K])
res = [list(group) for key, group in groupby(test_list, lambda x: x[K])]
 
# printing result
print("The list after grouping: " + str(res))

Output
The original list is: [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping: [[('Gfg', 1), ('Gfg', 3)], [('best', 5)], [('is', 2), ('is', 4)]]

Time complexity: O(n log n), where n is the length of the list.
Auxiliary space: O(n), where n is the length of the list. 
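
Note that test_list.sort() in the listing above reorders the input list in place. If the original order must be preserved, or the key of each group is needed as well, one possible variant (a sketch, with names chosen here for illustration) uses sorted() and a dict comprehension:

```python
from itertools import groupby
from operator import itemgetter

records = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
K = 0
key_func = itemgetter(K)

# sorted() returns a new list, so records keeps its original order
grouped = {
    key: list(group)
    for key, group in groupby(sorted(records, key=key_func), key_func)
}

print(grouped)
```

The dictionary keys appear in sorted order ('Gfg', 'best', 'is'), because uppercase letters sort before lowercase ones in Python's default string comparison.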

