Python | Categorize the given list by string size
Last Updated : 02 May, 2023
Sometimes we need to group strings by some property, such as their first letter or their length. These types of problems are typical of database queries and so can come up in web development. This article focuses on one such grouping: by the size (length) of the string. Let's discuss several ways in which this can be performed.
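Method #1 : Using next() + lambda + loop
This is the naive approach. A small helper function (which could equally be written as a lambda) checks whether two strings have the same length. For each string, next() returns the first existing group whose strings have that length, or an empty list when no such group exists yet; a new group is appended to the result before the string is added to it.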
Python3
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
print("The original list : " + str(test_list))

# True when both strings have the same length
def util_func(x, y):
    return len(x) == len(y)

res = []
for sub in test_list:
    # First existing group with matching length, else a new empty list
    ele = next((x for x in res if util_func(sub, x[0])), [])
    if ele == []:
        res.append(ele)
    ele.append(sub)

print("The list after Categorization : " + str(res))
Output :
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['man', 'for'], ['a', 'b'], ['geek', 'free']]
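Time Complexity: O(n * k), where n is the number of elements in the list "test_list" and k is the number of distinct string lengths, because every string is compared against the existing groups. In the worst case, when every string has a different length, this is O(n^2).
Auxiliary Space: O(n), where n is the number of elements in the list "test_list".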
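To see what next() contributes in Method #1, the lookup step can be isolated. The following is a minimal sketch rather than part of the original method; the helper name find_group is hypothetical. It returns the first existing group whose strings have the same length as the new word, or a fresh empty list when there is none.

```python
def find_group(groups, word):
    """Return the first group holding strings of len(word), else a new empty list."""
    # next() takes the first match from the generator; its second argument
    # is the default returned when the generator yields nothing.
    return next((g for g in groups if len(g[0]) == len(word)), [])

groups = [['man', 'for'], ['a']]

match = find_group(groups, 'b')       # a group of length-1 strings exists
missing = find_group(groups, 'geek')  # no group of length-4 strings yet

print(match)    # ['a']
print(missing)  # []
```

Because the generator walks the existing groups for every word, the cost of each lookup grows with the number of distinct lengths seen so far.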
Method #2: Using sorted() + groupby()
This task can also be solved with itertools.groupby(), which offers a convenient way to group items. Because groupby() only groups consecutive items that share a key, the list is first sorted by length with sorted() and then fed to groupby().
Python3
from itertools import groupby

test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
print("The original list : " + str(test_list))

# Key function shared by sorted() and groupby()
def util_func(x):
    return len(x)

# groupby() needs equal keys to be adjacent, so sort by length first
temp = sorted(test_list, key=util_func)
res = [list(ele) for i, ele in groupby(temp, util_func)]

print("The list after Categorization : " + str(res))
Output :
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['man', 'for'], ['geek', 'free']]
Time Complexity: O(n log n), where n is the number of elements in the list "test_list". The sort dominates; groupby() itself makes a single linear pass.
Auxiliary Space: O(n), where n is the number of elements in the list, for the sorted copy and the grouped result.
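The sorting step in Method #2 matters because groupby() starts a new group every time the key changes. The short comparison below is an illustrative sketch (the variable names are not from the original) that runs groupby() on the same list with and without sorting.

```python
from itertools import groupby

test_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# Without sorting: no two neighbouring strings share a length here,
# so every string ends up in a group of its own.
unsorted_groups = [list(g) for _, g in groupby(test_list, key=len)]

# With sorting: strings of equal length become adjacent and are grouped together.
sorted_groups = [list(g) for _, g in groupby(sorted(test_list, key=len), key=len)]

print(unsorted_groups)  # [['man'], ['a'], ['geek'], ['for'], ['b'], ['free']]
print(sorted_groups)    # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```

Since sorted() is stable, strings of the same length keep their original relative order inside each group.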
Method #3 : Using for loops, sort() and len() methods
Python3
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
print("The original list : " + str(test_list))

# Collect the distinct string lengths
x = []
for i in test_list:
    if len(i) not in x:
        x.append(len(i))
x.sort()

# For each length, gather the strings of that length
res = []
for i in x:
    b = []
    for j in test_list:
        if len(j) == i:
            b.append(j)
    res.append(b)

print("The list after Categorization : " + str(res))
Output
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['man', 'for'], ['geek', 'free']]
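The loops in Method #3 can be condensed with a set and a nested list comprehension. This is a sketch of the same idea, not a different algorithm: collect the distinct lengths, sort them, then pick out the strings of each length.

```python
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# Distinct lengths in ascending order
lengths = sorted({len(s) for s in test_list})

# One sublist per length, preserving the original order of the strings
res = [[s for s in test_list if len(s) == n] for n in lengths]

print(res)  # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```

Like the loop version, this scans the whole list once per distinct length, so it takes O(n * k) time for n strings and k distinct lengths.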
Method #4 : Using collections.defaultdict
collections.defaultdict is a subclass of the built-in dict class that takes a default factory: when a missing key is accessed, the factory is called to create the value for it. This is useful when you want to group the items in a list by some criterion and create a new group automatically for each new key.
For example, to group the strings in a list by their size using a defaultdict, you could do the following:
Python3
from collections import defaultdict

string_list = ['man', 'a', 'geek', 'for', 'b', 'free']
print("The original list:", string_list)

# Missing keys start out as an empty list
groups = defaultdict(list)
for s in string_list:
    groups[len(s)].append(s)

# Groups appear in the order their lengths were first seen
result = list(groups.values())
print("The list after categorization:", result)
Output
The original list: ['man', 'a', 'geek', 'for', 'b', 'free']
The list after categorization: [['man', 'for'], ['a', 'b'], ['geek', 'free']]
Time complexity: O(n)
Auxiliary Space: O(n)
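The same defaultdict pattern extends to other grouping criteria, such as the first letter mentioned in the introduction. The following is a minimal sketch; the helper name group_by and its key parameter are hypothetical and not part of the original method.

```python
from collections import defaultdict

def group_by(items, key):
    """Group items into lists keyed by key(item), in first-seen key order."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

words = ['man', 'a', 'geek', 'for', 'b', 'free']

by_length = group_by(words, len)
by_first_letter = group_by(words, lambda w: w[0])

print(by_length)        # {3: ['man', 'for'], 1: ['a', 'b'], 4: ['geek', 'free']}
print(by_first_letter)  # {'m': ['man'], 'a': ['a'], 'g': ['geek'], 'f': ['for', 'free'], 'b': ['b']}

# Groups ordered by ascending length, if that ordering is wanted
ordered = [by_length[k] for k in sorted(by_length)]
print(ordered)          # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```

Returning a plain dict keeps the keys available to the caller, so the groups can be sorted or looked up by length afterwards.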
Method #5 : Using numpy
- Initialize the input list test_list.
- Convert the list to a numpy array using numpy.array() function.
- Use numpy.unique() function to get the unique string lengths in the array.
- Create a dictionary res to store the categorized strings. The keys of the dictionary are the unique string lengths, and the values are empty lists.
- Iterate over the strings in the numpy array using a for loop.
- For each string, append it to the corresponding list in the res dictionary based on its length.
- Convert the res dictionary to a list and sort each sublist in ascending order using sorted() function.
- Print the categorized list.
Python3
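import numpy as np

test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
print("The original list : " + str(test_list))

arr = np.array(test_list)
# Unique string lengths, returned in ascending order
lengths = np.unique([len(s) for s in arr])
# One empty bucket per length, keyed by plain int
res = {int(l): [] for l in lengths}
for s in arr:
    # str(s) stores plain Python strings so the printed result
    # looks the same regardless of the NumPy version
    res[len(s)].append(str(s))
# Sort each bucket alphabetically; buckets follow ascending length
res = [sorted(res[int(l)]) for l in lengths]

print("The list after Categorization : " + str(res))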
Output:
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['for', 'man'], ['free', 'geek']]
Time complexity: O(n log n), where n is the length of the input list test_list. numpy.unique() sorts the n lengths, and sorting a sublist of m strings with sorted() costs O(m log m); because the sublist sizes add up to n, all the sorted() calls together are bounded by O(n log n).
Auxiliary space: O(n), where n is the length of the input list test_list. The numpy array and the dictionary each hold n strings; the memory used by the other variables is negligible.