Python | Grouping similar substrings in list

Sometimes an application needs to group strings that share a common prefix, so that further processing can be done group by group. This kind of grouping is useful in areas such as machine learning and web development. Let's discuss a few ways in which this can be done.

Method #1 : Using lambda + itertools.groupby() + split()
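The combination of the above three functions helps us achieve the task. The split method is key, as it defines the separator by which the grouping is performed: the lambda takes the text before the first separator as the grouping key. The groupby function then groups consecutive elements that share that key, which is why the list is sorted first.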

# Python3 code to demonstrate
# grouping strings by common prefix
# using lambda + itertools.groupby() + split()
from itertools import groupby

# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# sort list
# essential, since groupby() only groups consecutive elements
test_list.sort()

# printing the list (already sorted at this point)
print("The original list is : " + str(test_list))

# using lambda + itertools.groupby() + split()
# the grouping key is the text before the first '_'
res = [list(group) for _, group in
       groupby(test_list, lambda a: a.split('_')[0])]

# printing result
print("The grouped list is : " + str(res))


Output :

The original list is : ['coder_2', 'coder_3', 'geek_1', 'geek_4', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
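
To see why the sort step matters, here is a minimal sketch that runs the same grouping on an unsorted and a sorted copy of the list. The helper name group_by_prefix and its sep parameter are illustrative assumptions, not part of the method above. The sketch also sorts by the grouping key itself rather than by the whole string, which is an assumption made here to keep equal prefixes adjacent even if some strings lack the separator.

```python
from itertools import groupby


def group_by_prefix(items, sep="_"):
    """Group consecutive strings that share the text before the first `sep`.

    Hypothetical helper for illustration; it does not sort its input.
    """
    return [list(group) for _, group in
            groupby(items, lambda s: s.split(sep)[0])]


unsorted_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# Without sorting, every change of prefix starts a new group.
print(group_by_prefix(unsorted_list))
# [['geek_1'], ['coder_2'], ['geek_4'], ['coder_3'], ['pro_3']]

# Sorting by the same key first brings equal prefixes together.
by_prefix = sorted(unsorted_list, key=lambda s: s.split('_')[0])
print(group_by_prefix(by_prefix))
# [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
```

Because sorted() is stable, elements with the same prefix keep their original relative order inside each group.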

 

Method #2 : Using lambda + itertools.groupby() + partition()
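The same task can also be performed by replacing the split function with the partition function. partition() stops at the first occurrence of the separator and always returns a three-part tuple (head, separator, tail), so it does not build a list of every piece the way split() does. This can make it slightly quicker when the strings contain many separators, although for short strings like the ones here the difference is likely negligible.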

# Python3 code to demonstrate
# grouping strings by common prefix
# using lambda + itertools.groupby() + partition()
from itertools import groupby

# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# sort list
# essential, since groupby() only groups consecutive elements
test_list.sort()

# printing the list (already sorted at this point)
print("The original list is : " + str(test_list))

# using lambda + itertools.groupby() + partition()
# partition('_')[0] is the text before the first '_'
res = [list(group) for _, group in
       groupby(test_list, lambda a: a.partition('_')[0])]

# printing result
print("The grouped list is : " + str(res))


Output :

The original list is : ['coder_2', 'coder_3', 'geek_1', 'geek_4', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
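
The two methods produce the same grouping key because split() and partition() agree on the text before the first separator. The short sketch below compares what each call returns. The helper names prefix_with_split and prefix_with_partition and the sample strings are assumptions made for illustration only.

```python
def prefix_with_split(s, sep="_"):
    # split() cuts at every separator and returns a list of all pieces
    return s.split(sep)[0]


def prefix_with_partition(s, sep="_"):
    # partition() cuts only at the first separator and returns a 3-tuple
    return s.partition(sep)[0]


sample = 'geek_1_extra'
print(sample.split('_'))      # ['geek', '1', 'extra']
print(sample.partition('_'))  # ('geek', '_', '1_extra')

# With no separator present, both still return the whole string as the prefix.
print('geek'.split('_'))      # ['geek']
print('geek'.partition('_'))  # ('geek', '', '')

# Both helpers yield the same key for each sample string.
for s in ['geek_1', 'geek_1_extra', 'geek', '_lead']:
    assert prefix_with_split(s) == prefix_with_partition(s)
```

Since the keys are identical, either helper can be passed to groupby() as the key function without changing the result.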


