Some applications need to group strings that share a common prefix so that further processing can be done per group. This kind of grouping comes up in areas such as Machine Learning and Web Development. Let’s discuss several ways in which this can be done in Python.
Method #1 : Using lambda + itertools.groupby() + split()
Combining these three functions accomplishes the task. The split() method is the key piece, since it defines the separator on which the grouping is performed, while groupby() does the actual grouping of elements.
Step-by-step approach:
- Import the groupby function from the itertools module.
- Initialize a list of strings test_list with some elements.
- Sort the test_list in ascending order using the sort() method. This is necessary because groupby() only groups consecutive elements that share the same key.
- Print the original test_list.
- Use a list comprehension to iterate over the groups that groupby() produces from test_list, keyed on the substring before the first _ character.
- In the groupby() call, test_list is the iterable, and the key function lambda a: a.split('_')[0] returns the substring before the first _ character of each element. Elements with the same key are grouped together.
- Convert each group into a list and append it to the result list res.
- Print the result list res.
Below is the implementation of the above approach:
Python3
from itertools import groupby
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
test_list.sort()
print("The original list is : " + str(test_list))
res = [list(i) for j, i in groupby(test_list,
                                   lambda a: a.split('_')[0])]
print("The grouped list is : " + str(res))
Output
The original list is : ['coder_2', 'coder_3', 'geek_1', 'geek_4', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
Time complexity: O(nlogn), where n is the length of the input list.
Auxiliary space: O(n), where n is the length of the input list.
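The sort in the steps above is what makes the grouping complete: groupby() only merges consecutive items that share a key, so an unsorted list yields fragmented groups. A minimal sketch of the difference, reusing the same test_list (the helper name group_runs is hypothetical, introduced only for this illustration):

```python
from itertools import groupby


def group_runs(items):
    # Group consecutive items that share the text before the first '_'.
    return [list(group)
            for _, group in groupby(items, key=lambda s: s.split('_')[0])]


test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# Without sorting, equal prefixes are not adjacent, so each item is its own group.
unsorted_groups = group_runs(test_list)
# After sorting, equal prefixes sit next to each other and are merged.
sorted_groups = group_runs(sorted(test_list))

print(unsorted_groups)
print(sorted_groups)
```

Using sorted() instead of sort() here leaves the original list untouched, which may be preferable when the input order still matters elsewhere.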
Method #2 : Using lambda + itertools.groupby() + partition()
The same task can also be performed by replacing the split() method with partition(). partition() cuts the string only at the first occurrence of the separator and always returns a 3-tuple, so it can be slightly cheaper than split(), which cuts at every occurrence.
Python3
from itertools import groupby
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
test_list.sort()
print("The original list is : " + str(test_list))
res = [list(i) for j, i in groupby(test_list,
                                   lambda a: a.partition('_')[0])]
print("The grouped list is : " + str(res))
Output
The original list is : ['coder_2', 'coder_3', 'geek_1', 'geek_4', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
Time complexity: O(n log n) (due to sorting the list).
Auxiliary space: O(n) (for creating the result list “res”).
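To see how split() and partition() differ as key functions, the sketch below applies both to a string with more than one separator and to a string with none. The sample strings are made up for illustration and are not part of the original example.

```python
s = 'geek_4_extra'

# split() cuts at every separator unless maxsplit is given.
split_parts = s.split('_')
# partition() cuts only at the first separator and always returns a 3-tuple.
partition_parts = s.partition('_')

print(split_parts)
print(partition_parts)

# When the separator is missing, both return the whole string as the "prefix".
no_sep_split = 'geek'.split('_')[0]
no_sep_partition = 'geek'.partition('_')[0]
print(no_sep_split, no_sep_partition)
```

In both cases element [0] is the text before the first separator, which is why the two methods produce the same groups.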
Method #3 : Using index() and find() methods
Python3
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
print("The original List is : " + str(test_list))
# collect the distinct prefixes (the text before the first "_")
x = []
for i in test_list:
    x.append(i[:i.index("_")])
x = list(set(x))
res = []
for i in x:
    a = []
    for j in test_list:
        # find() returning 0 means the string starts with "<prefix>_"
        if j.find(i + "_") == 0:
            a.append(j)
    res.append(a)
print("The grouped list is : " + str(res))
Output
The original List is : ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['pro_3'], ['geek_1', 'geek_4']]
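Time complexity: O(n^2) in the worst case, where n is the length of the input list test_list, since the list is scanned once for every distinct prefix.
Auxiliary space: O(n), where n is the length of the input list test_list.
Because the prefixes are collected in a set, the order of the groups in the output may vary between runs.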
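Note that find() reports a match anywhere in the string, while a prefix test has to match at index 0. A small sketch of the difference, using made-up sample strings (coder_pro_1 is hypothetical and chosen to contain "pro" in the middle):

```python
prefix = 'pro'
candidates = ['pro_3', 'coder_pro_1', 'geek_1']

# find() != -1 is true when the prefix occurs anywhere in the string.
anywhere = [s for s in candidates if s.find(prefix) != -1]
# find() == 0 is true only when the string begins with "<prefix>_".
at_start = [s for s in candidates if s.find(prefix + '_') == 0]

print(anywhere)
print(at_start)
```

Only the second check puts coder_pro_1 where it belongs, outside the pro group.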
Method #4 : Using startswith()
Python3
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
print("The original list is : " + str(test_list))
# collect the distinct prefixes, then gather the items that start with each one
prefixes = set(item[:item.index("_")] for item in test_list)
res = [[item for item in test_list if item.startswith(prefix + "_")]
       for prefix in prefixes]
print("The grouped list is : " + str(res))
Output
The original list is : ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['pro_3'], ['geek_1', 'geek_4']]
Time Complexity: O(n * p), where n is the length of test_list and p is the number of distinct prefixes, since the list is scanned once to collect the prefixes and then once more for every prefix. This is O(n^2) in the worst case, when every string has its own prefix.
Auxiliary Space: O(n), for the collection of unique prefixes and the grouped list.
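Since a set has no defined order, the groups produced by this method can come out in a different order on each run. One possible variant, sketched here under the assumption that every string contains the separator, collects the prefixes with dict.fromkeys() so that groups appear in first-seen order:

```python
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# dict.fromkeys() drops duplicates while keeping first-seen order (Python 3.7+).
prefixes = list(dict.fromkeys(item.partition('_')[0] for item in test_list))

# Checking for "<prefix>_" keeps 'pro' from also matching e.g. 'programmer_1'.
res = [[item for item in test_list if item.startswith(prefix + '_')]
       for prefix in prefixes]

print(prefixes)
print(res)
```

The cost is unchanged, one scan of the list per prefix, but the output is now deterministic.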
Method #5: Using a dictionary to group similar substrings
Use a dictionary to group the substrings that have the same prefix. The keys of the dictionary will be the prefixes, and the values will be lists containing the substrings with that prefix. Here’s an example implementation:
Python3
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
grouped = {}
for s in test_list:
    prefix = s.split('_')[0]
    if prefix not in grouped:
        grouped[prefix] = []
    grouped[prefix].append(s)
res = list(grouped.values())
print(res)
Output
[['geek_1', 'geek_4'], ['coder_2', 'coder_3'], ['pro_3']]
Time complexity: O(n*k), where n is the length of the input list and k is the maximum length of a string, since split() scans each string once.
Auxiliary space: O(n), as the dictionary holds one key per distinct prefix and its value lists together hold one reference to each of the n input strings.
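The "create the list if the key is missing" check can be handed off to collections.defaultdict from the standard library. This is an alternative sketch of the same dictionary-based grouping, not a change to the method itself:

```python
from collections import defaultdict

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']

# defaultdict(list) creates an empty list the first time a key is accessed.
grouped = defaultdict(list)
for s in test_list:
    grouped[s.split('_')[0]].append(s)

res = list(grouped.values())
print(res)
```

Because dictionaries preserve insertion order in Python 3.7 and later, the groups appear in the order their prefixes are first seen.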
Method #6: Using a loop and a dictionary
Step-by-step approach:
- Initialize the list of strings.
- Create an empty dictionary to store the groups.
- Iterate over each string in the list.
- Extract the substring before the underscore using the split() method.
- Check if the key exists in the dictionary. If it does, append the string to the list under the key. If it doesn’t, create a new list with the string under the key.
- Convert the dictionary to a list of lists using the values() method.
- Print the original list and the grouped list.
Python3
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
d = {}
for s in test_list:
    key = s.split('_')[0]
    if key in d:
        d[key].append(s)
    else:
        d[key] = [s]
res = list(d.values())
print("The original list is : " + str(test_list))
print("The grouped list is : " + str(res))
Output
The original list is : ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
The grouped list is : [['geek_1', 'geek_4'], ['coder_2', 'coder_3'], ['pro_3']]
Time complexity: O(n), where n is the number of strings in the list, treating the string length as bounded. The loop visits each string once, and dictionary lookups and insertions take constant time on average.
Auxiliary space: O(n). The dictionary stores each of the n strings in exactly one group, so the group sizes add up to n.
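The steps above can also be wrapped in a reusable function. The sketch below is one way to do that; the name group_by_prefix and its sep parameter are hypothetical, and dict.setdefault() stands in for the explicit if/else branch:

```python
def group_by_prefix(strings, sep='_'):
    """Map each prefix (text before the first `sep`) to the strings that carry it."""
    groups = {}
    for s in strings:
        # setdefault() returns the list already stored under the key,
        # or stores a new empty list and returns that.
        groups.setdefault(s.partition(sep)[0], []).append(s)
    return groups


test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
groups = group_by_prefix(test_list)

print(groups)
print(list(groups.values()))
```

Returning the dictionary rather than only its values keeps the prefix available to the caller, and list(groups.values()) recovers the list-of-lists form used in the rest of this article.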
Method #7: Using numpy
Algorithm :
- Initialize the input list test_list.
- Get the unique prefixes from the input list using np.unique.
- Group the elements in test_list by prefix using a list comprehension.
- Print the grouped list res.
Python3
import numpy as np
test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
print("The original list is : " + str(test_list))
prefixes = np.unique([item.split('_')[0] for item in test_list])
res = [[item for item in test_list if item.startswith(prefix)]
       for prefix in prefixes]
print("The grouped list is : " + str(res))
Output
The original list is : ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
The grouped list is : [['coder_2', 'coder_3'], ['geek_1', 'geek_4'], ['pro_3']]
Time complexity:
The np.unique function is typically sort-based, which takes O(n log n), where n is the length of the input list.
The list comprehension that builds res scans the whole list once per unique prefix, which is O(n * p) for p unique prefixes and O(n^2) in the worst case.
Therefore, the overall time complexity of the algorithm is O(n^2) in the worst case.
Auxiliary Space:
The space complexity of the algorithm is O(n), because the prefixes and the grouped list are stored in memory alongside the input list.
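The repeated scans can be avoided by asking np.unique for the inverse indices as well. The following is a sketch of that idea, assuming NumPy is installed; return_inverse=True makes np.unique also return, for every input element, the position of its value in the array of unique values:

```python
import numpy as np

test_list = ['geek_1', 'coder_2', 'geek_4', 'coder_3', 'pro_3']
prefixes = [item.split('_')[0] for item in test_list]

# unique_prefixes is sorted; inverse[i] is the index in unique_prefixes
# of the prefix that belongs to test_list[i].
unique_prefixes, inverse = np.unique(prefixes, return_inverse=True)

# One bucket per unique prefix, filled in a single pass over the list.
res = [[] for _ in unique_prefixes]
for item, group_index in zip(test_list, inverse):
    res[group_index].append(item)

print(res)
```

With this variant the grouping pass itself is linear in the length of the list, so the np.unique call is the dominant cost.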
Last Updated : 11 Apr, 2023