# Python | Categorize the given list by string size

Sometimes we need to group strings by some property, such as their first letter. Problems of this type resemble database queries and so come up often in web development. This article focuses on one such grouping: by string length. Let's look at several ways to do it.
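The same pattern works for any grouping criterion. As a minimal sketch (the helper name `group_by` is hypothetical and not part of the methods in this article), a key function decides which bucket each string goes into:

```python
def group_by(items, key):
    """Group items into lists keyed by key(item), in first-seen order."""
    groups = {}
    for item in items:
        # setdefault creates the bucket the first time a key appears
        groups.setdefault(key(item), []).append(item)
    return groups


words = ['man', 'a', 'geek', 'for', 'b', 'free']

by_length = group_by(words, len)                    # group by string size
by_first_letter = group_by(words, lambda w: w[0])   # group by first letter

print(by_length)
print(by_first_letter)
```

Passing `len` as the key gives the grouping by size discussed here; swapping in another key function changes only the criterion, not the loop.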

Method #1 : Using next() + lambda + loop

This is the naive approach. A small comparison helper (`util_func` in the code below, which could equally be written as a lambda) checks whether two strings have the same length, and next() returns the first existing group whose strings match that length, or a new empty list if there is none.

## Python3

```python
# Python3 code to demonstrate
# Categorize by string size
# using next() + lambda + loop

# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# printing original list
print("The original list : " + str(test_list))

# using next() + lambda + loop
# Categorize by string size
def util_func(x, y):
    return len(x) == len(y)

res = []
for sub in test_list:
    ele = next((x for x in res if util_func(sub, x[0])), [])
    if ele == []:
        res.append(ele)
    ele.append(sub)

# print result
print("The list after Categorization : " + str(res))
```

Output :

```
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['man', 'for'], ['a', 'b'], ['geek', 'free']]
```

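Time Complexity: O(n * k), where n is the number of elements in the list “test_list” and k is the number of distinct string lengths, because next() scans the existing groups for every string. In the worst case, when every length is different, this is O(n^2).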
Auxiliary Space: O(n), where n is the number of elements in the list “test_list”.
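The key step in Method #1 is the two-argument form of next(), which returns a default instead of raising StopIteration when the generator is empty. A minimal sketch of that lookup in isolation (the function name `find_group` and the sample `groups` list are made up for illustration):

```python
# Groups built so far; each group holds strings of one length.
groups = [['man', 'for'], ['a', 'b']]


def find_group(word):
    # Return the first group whose strings have the same length as word,
    # or a new empty list when no such group exists yet.
    return next((g for g in groups if len(g[0]) == len(word)), [])


hit = find_group('b')      # a group for length 1 already exists
miss = find_group('geek')  # no group for length 4 yet

print(hit)
print(miss)
```

Because `hit` is the same list object that is stored in `groups`, appending to it updates the result in place; an empty `miss` is what tells the loop to register a new group.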

Method #2: Using sorted() + groupby()

This task can also be solved with itertools.groupby(). Because groupby() only groups consecutive elements that share a key, sorted() first orders the strings by length so that strings of equal length are adjacent.

## Python3

```python
# Python3 code to demonstrate
# Categorize by string size
# using sorted() + groupby()
from itertools import groupby

# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# printing original list
print("The original list : " + str(test_list))

# using sorted() + groupby()
# Categorize by string size
def util_func(x):
    return len(x)

temp = sorted(test_list, key=util_func)
res = [list(ele) for i, ele in groupby(temp, util_func)]

# print result
print("The list after Categorization : " + str(res))
```

Output :

```
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```

Time Complexity: O(n log n), where n is the number of elements in the list “test_list”. The sort dominates; groupby() then makes a single O(n) pass.
Auxiliary Space: O(n), where n is the number of elements in the list “test_list”, for the sorted copy and the result.
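To see why the sort matters, here is a small sketch comparing groupby() with and without it. The variable names are illustrative; the input is the same list used above.

```python
from itertools import groupby

words = ['man', 'a', 'geek', 'for', 'b', 'free']

# Without sorting, strings of equal length are not adjacent,
# so groupby() starts a new group at every change of length.
unsorted_groups = [list(g) for _, g in groupby(words, key=len)]

# Sorting by length first makes equal lengths adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=len), key=len)]

print(unsorted_groups)
print(sorted_groups)
```

On this input the unsorted call produces six one-element groups, while the sorted call produces the three expected groups.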

Method #3 : Using for loops, sort() and len() methods

## Python3

```python
# Python3 code to demonstrate
# Categorize by string size

# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# printing original list
print("The original list : " + str(test_list))

# collect the distinct string lengths and sort them
x = []
for i in test_list:
    if len(i) not in x:
        x.append(len(i))
x.sort()

# for each length, gather the strings of that length
res = []
for i in x:
    b = []
    for j in test_list:
        if len(j) == i:
            b.append(j)
    res.append(b)

# print result
print("The list after Categorization : " + str(res))
```

Output

```
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```

Method #4 : Using collections.defaultdict

collections.defaultdict is a subclass of the built-in dict class that takes a default factory: a callable that supplies the value for any key that does not exist yet. This is useful when grouping the items of a list by some criterion, because the first item that maps to a new key automatically gets a fresh group.

For example, to group the strings in a list by their size using a defaultdict, you could do the following:

## Python3

```python
from collections import defaultdict

# Initialize the list of strings
string_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# Print the original list
print("The original list:", string_list)

# Initialize a defaultdict whose missing keys start as an empty list
groups = defaultdict(list)

# Iterate through the list of strings
for s in string_list:
    # Use the length of the string as the key and append the string to the list
    # of strings with the same length
    groups[len(s)].append(s)

# Collect the groups (lists of strings) in the order their lengths first appeared
result = list(groups.values())

# Print the result
print("The list after categorization:", result)
# This code is contributed by Edula Vinay Kumar Reddy
```

Output

```
The original list: ['man', 'a', 'geek', 'for', 'b', 'free']
The list after categorization: [['man', 'for'], ['a', 'b'], ['geek', 'free']]
```

Time complexity: O(n)

Auxiliary Space: O(n)
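Note that the defaultdict approach returns groups in the order in which each length first appears, not in order of length. If the groups should be ordered by size, as in Methods #2 and #3, one option is to walk the keys in sorted order. A minimal sketch, with illustrative variable names:

```python
from collections import defaultdict

words = ['man', 'a', 'geek', 'for', 'b', 'free']

groups = defaultdict(list)
for w in words:
    groups[len(w)].append(w)

# Order in which each length was first seen
first_seen_order = list(groups.values())

# Same groups, ordered by string length
by_length_order = [groups[k] for k in sorted(groups)]

print(first_seen_order)
print(by_length_order)
```

Sorting the keys costs O(k log k) for k distinct lengths, which is usually far smaller than sorting all n strings.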

Method #5 : Using numpy:

1. Initialize the input list test_list.
2. Convert the list to a numpy array using the numpy.array() function.
3. Use the numpy.unique() function to get the unique string lengths in the array.
4. Create a dictionary res to store the categorized strings. The keys of the dictionary are the unique string lengths, and the values are empty lists.
5. Iterate over the strings in the numpy array using a for loop.
6. For each string, append it to the corresponding list in the res dictionary based on its length.
7. Convert the res dictionary to a list and sort each sublist in ascending order using the sorted() function.
8. Print the categorized list.

## Python3

```python
import numpy as np

# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']

# printing original list
print("The original list : " + str(test_list))

# convert list to numpy array
arr = np.array(test_list)

# get unique string lengths (returned in ascending order)
lengths = np.unique([len(s) for s in arr])

# create empty dictionary to store categorized strings
res = {l: [] for l in lengths}

# iterate over the strings and categorize them by length;
# str() stores plain Python strings so the printed result shows
# 'a' rather than a numpy string repr on newer numpy versions
for s in arr:
    res[len(s)].append(str(s))

# convert result to list and sort each sublist
res = [sorted(res[l]) for l in lengths]

# print result
print("The list after Categorization : " + str(res))
# This code is contributed by Rayudu.
```

Output:

```
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['for', 'man'], ['free', 'geek']]
```

Time complexity: O(n log n), where n is the length of the input list test_list. numpy.unique() sorts the lengths, and sorted() is applied to each sublist. Sorting a sublist of m strings costs O(m log m), and since the sublist sizes add up to n, the total is at most O(n log n).

Auxiliary space: O(n), where n is the length of the input list test_list. This is because we create a numpy array and a dictionary that each hold n strings in total. The additional memory used by the other variables and function calls is negligible.
