Python | Grouping similar substrings in list

  • Last Updated : 02 Jan, 2023

Sometimes an application needs to group strings that share a common prefix so that further processing can follow the grouping. This kind of grouping comes up in areas such as machine learning and web development. Let's discuss some ways in which this can be done.
Method #1 : Using lambda + itertools.groupby() + split() 
The combination of these three functions achieves the task. The split method is key, as it defines the separator on which the grouping is performed, and the groupby function groups the elements. Because groupby only groups consecutive elements that share a key, the list has to be sorted first.
 

Python3




# Python3 code to demonstrate
# group similar substrings
# using lambda + itertools.groupby() + split()
from itertools import groupby
 
# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
 
# sort list
# essential: groupby() only merges adjacent items with equal keys
test_list.sort()
 
# printing the list (already sorted at this point)
print("The original list is : " + str(test_list))
 
# using lambda + itertools.groupby() + split()
# group similar substrings by the text before the first '_'
res = [list(group) for _, group in groupby(test_list,
                  key=lambda a: a.split('_')[0])]
 
# printing result
print("The grouped list is : " + str(res))

Output

The original list is : ['coder_2', 'coder_3', 'geek_1', 'geek_4', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
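
The sort step matters because groupby only merges neighbouring items whose keys are equal. The following is a minimal sketch of that behaviour, not part of the original method; the helper name group_by_prefix is made up for illustration.

```python
from itertools import groupby


def group_by_prefix(items, sep="_"):
    """Group consecutive items that share the text before the first `sep`.

    Hypothetical helper: it does not sort, so callers decide whether to.
    """
    return [list(group)
            for _, group in groupby(items, key=lambda s: s.split(sep)[0])]


items = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# Without sorting, equal prefixes are not adjacent, so nothing is merged.
unsorted_groups = group_by_prefix(items)

# After sorting, items with the same prefix sit next to each other.
sorted_groups = group_by_prefix(sorted(items))

print(unsorted_groups)
# [['geek_1'], ['coder_2'], ['geek_4'], ['coder_3'], ['pro_3']]
print(sorted_groups)
# [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
```

The unsorted call returns five single-item groups, which is why the list is sorted before grouping.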

 
Method #2 : Using lambda + itertools.groupby() + partition() 
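The same task can also be performed by replacing the split function with the partition function. partition stops at the first separator and always returns a 3-tuple, so it does not split the rest of the string. This can be slightly cheaper when a string contains several separators, although for short strings the difference is small.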
 

Python3




# Python3 code to demonstrate
# group similar substrings
# using lambda + itertools.groupby() + partition()
from itertools import groupby
 
# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
 
# sort list
# essential: groupby() only merges adjacent items with equal keys
test_list.sort()
 
# printing the list (already sorted at this point)
print("The original list is : " + str(test_list))
 
# using lambda + itertools.groupby() + partition()
# group similar substrings by the text before the first '_'
res = [list(group) for _, group in groupby(test_list,
              key=lambda a: a.partition('_')[0])]
 
# printing result
print("The grouped list is : " + str(res))

Output

The original list is : ['coder_2', 'coder_3', 'geek_1', 'geek_4', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
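
To make the difference between the two calls concrete, here is a small sketch that compares what split and partition return. The string 'geek_4_beta' is an invented example with two separators; it does not appear in the list above.

```python
# Invented example string with more than one separator.
name = 'geek_4_beta'

# split() with no limit cuts at every separator.
split_all = name.split('_')

# split() with maxsplit=1 cuts only at the first separator.
split_once = name.split('_', 1)

# partition() also cuts only at the first separator and
# always returns a 3-tuple: (head, separator, tail).
parts = name.partition('_')

# If the separator is missing, partition() returns the whole
# string followed by two empty strings instead of raising.
missing = 'geek'.partition('_')

print(split_all)   # ['geek', '4', 'beta']
print(split_once)  # ['geek', '4_beta']
print(parts)       # ('geek', '_', '4_beta')
print(missing)     # ('geek', '', '')
```

In every case the element at index 0 is the prefix, which is why either call works as the grouping key.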

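Method #3 : Using index() and find() methods
This method collects the distinct prefixes with index() and a set, then loops over the list once per prefix and uses find() to pick out the matching strings. No sorting is needed, but the groups come out in set iteration order, which is not guaranteed.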

Python3




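# Python3 code to demonstrate
# group similar substrings
# using index() and find()
 
# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
 
# printing the original list
print("The original list is : " + str(test_list))
 
# collect the distinct prefixes (the text before the first '_')
prefixes = []
for item in test_list:
    prefixes.append(item[:item.index("_")])
prefixes = list(set(prefixes))
 
# for each prefix, gather the strings that begin with prefix + '_'
res = []
for prefix in prefixes:
    group = []
    for item in test_list:
        # find() == 0 anchors the match at the start of the string;
        # find() != -1 would also match the prefix elsewhere in it
        if item.find(prefix + "_") == 0:
            group.append(item)
    res.append(group)
 
# printing result
print("The grouped list is : " + str(res))

Output

The original list is : ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['pro_3'], ['geek_1', 'geek_4']]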
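
When find() is used for prefix matching, the position it returns matters: any result other than -1 means the text occurs somewhere, while 0 means it occurs at the start. A minimal sketch of the difference follows; 'geek_pro' is a made-up entry added only to show the effect.

```python
# 'geek_pro' is a hypothetical extra entry, not part of the article's list.
items = ['pro_3', 'geek_pro']
prefix = 'pro'

# Substring test: true wherever 'pro' occurs in the string.
anywhere = [s for s in items if s.find(prefix) != -1]

# Prefix test: true only when 'pro_' starts at position 0.
at_start = [s for s in items if s.find(prefix + '_') == 0]

print(anywhere)  # ['pro_3', 'geek_pro']
print(at_start)  # ['pro_3']
```

Only the second test keeps 'geek_pro' out of the 'pro' group.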

Method #4 : Using startswith()

Python3




# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
 
# printing the original list
print("The original list is : " + str(test_list))
 
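# collect the unique prefixes (the text before the first '_')
prefixes = set([item[:item.index("_")] for item in test_list])
 
# using startswith in a list comprehension
# matching on prefix + "_" keeps 'geek' from also capturing e.g. 'geeky_1'
res = [[item for item in test_list if item.startswith(prefix + "_")]
       for prefix in prefixes]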
 
# printing result
print("The grouped list is : " + str(res))
#This code is contributed by Edula Vinay Kumar Reddy

Output

The original list is : ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['pro_3'], ['geek_1', 'geek_4']]

This approach first creates a list of all the unique prefixes in the original list using a list comprehension and the set function. It then uses another list comprehension to create a list of lists, where each inner list contains all the elements in the original list that start with the corresponding prefix.

This approach is more concise and readable than the third method using index and find. It is not, however, more efficient than the first and second methods in general: it rescans the whole list once for every unique prefix, whereas the groupby-based methods sort once and then make a single pass.

This approach has a time complexity of O(n * k), where n is the number of strings in test_list and k is the number of unique prefixes: one pass builds the set of prefixes, and the list is then scanned once per prefix. It has a space complexity of O(n), as it creates two additional collections, one containing the unique prefixes and one containing the grouped list.

This means that space grows linearly with the size of the input list test_list, while time is close to linear only when the number of unique prefixes is small. If nearly every string has its own prefix, the running time approaches O(n^2).
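
If a single pass over the list is wanted regardless of how many prefixes there are, one option is to collect the groups in a dictionary keyed by prefix. This is a sketch of an alternative, not one of the methods above; the helper name group_by_prefix is hypothetical.

```python
from collections import defaultdict


def group_by_prefix(items, sep="_"):
    """Group strings by the text before the first `sep` in one pass.

    Hypothetical helper for illustration. Groups are returned in the
    order in which each prefix first appears in `items`.
    """
    groups = defaultdict(list)
    for item in items:
        prefix = item.partition(sep)[0]
        groups[prefix].append(item)
    return list(groups.values())


# initializing list
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

res = group_by_prefix(test_list)

# printing result
print("The grouped list is : " + str(res))
# The grouped list is : [['geek_1', 'geek_4'], ['coder_2', 'coder_3'], ['pro_3']]
```

Each string is looked at once and no sorting is needed, so the original order within each group is preserved.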

