Python | Categorize the given list by string size

Last Updated : 02 May, 2023

Sometimes we need to group strings by some criterion, such as their first letter or another property. Problems of this type are typical of database queries and therefore come up often in web development. This article focuses on one such grouping: by string length. Let’s discuss several ways in which this can be performed.

Method #1 : Using next() + lambda + loop

This is the naive approach. A comparison helper (written below as a one-line function, though it could equally be a lambda) checks whether two strings have the same length. For each string, next() returns the first existing group whose strings match that length, or a new empty list if no such group exists yet.

Python3

# Python3 code to demonstrate
# Categorize by string size
# using next() + lambda + loop
 
# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
 
# printing original list
print("The original list : " + str(test_list))
 
# using next() + lambda + loop
# Categorize by string size
 
 
def util_func(x, y): return len(x) == len(y)
 
 
res = []
for sub in test_list:
    ele = next((x for x in res if util_func(sub, x[0])), [])
    if ele == []:
        res.append(ele)
    ele.append(sub)
 
# print result
print("The list after Categorization : " + str(res))

Output :

The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['man', 'for'], ['a', 'b'], ['geek', 'free']]

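Time Complexity: O(n * k), where n is the number of elements in the list “test_list” and k is the number of distinct string lengths, because next() scans the existing groups for every string. This is O(n^2) in the worst case, when every string has a different length.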
Auxiliary Space: O(n), where n is the number of elements in the list “test_list”.
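
The role of next() in Method #1 is easier to see in isolation. The following is a minimal sketch, not part of the original code: the names groups and find_group are hypothetical and chosen only for illustration. It shows that next() returns the first matching group, or the supplied default when the generator finds no match.

```python
# Illustrative sketch; `groups` and `find_group` are hypothetical names.
groups = [['man', 'for'], ['a']]


def find_group(word, groups):
    # next() pulls the first group whose strings have the same length as
    # `word`; the second argument is returned when no group matches.
    return next((g for g in groups if len(g[0]) == len(word)), None)


found = find_group('for', groups)
missing = find_group('geek', groups)

print(found)    # ['man', 'for']
print(missing)  # None
```

Because every lookup walks the list of groups from the start, the cost of this approach grows with the number of distinct lengths.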

Method #2: Using sorted() + groupby()

This task can also be solved with the groupby() function from itertools, which offers a convenient way to group items. Because groupby() only groups adjacent items that share a key, the list is first sorted by length with sorted() and then fed to groupby().

Python3

# Python3 code to demonstrate
# Categorize by string size
# using sorted() + groupby()
 
from itertools import groupby
 
# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
 
# printing original list
print("The original list : " + str(test_list))
 
# using sorted() + groupby()
# Categorize by string size
 
 
def util_func(x): return len(x)
 
 
temp = sorted(test_list, key=util_func)
res = [list(ele) for i, ele in groupby(temp, util_func)]
 
# print result
print("The list after Categorization : " + str(res))

Output :

The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['man', 'for'], ['geek', 'free']]

Time Complexity: O(n log n), where n is the number of elements in the list “test_list”. The sorted() call dominates; groupby() then makes a single O(n) pass.
Auxiliary Space: O(n), extra space is required where n is the number of elements in the list
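
To see why the sort in Method #2 matters, the sketch below (the variable names are illustrative assumptions) runs groupby() on the same list with and without sorting. groupby() starts a new group each time the key changes, so unsorted input produces fragmented groups.

```python
from itertools import groupby

words = ['man', 'a', 'geek', 'for', 'b', 'free']

# Without sorting, equal lengths are never adjacent in this list,
# so every word ends up in its own group.
unsorted_groups = [list(g) for _, g in groupby(words, key=len)]

# sorted() is stable, so words of equal length keep their original order.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=len), key=len)]

print(unsorted_groups)  # [['man'], ['a'], ['geek'], ['for'], ['b'], ['free']]
print(sorted_groups)    # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```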

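Method #3 : Using for loops, sort() and len() methods

This approach first collects the distinct string lengths, sorts them, and then makes one pass over the list per length to gather the matching strings. It needs no imports, at a cost of O(n * k) time for n strings and k distinct lengths.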

Python3

# Python3 code to demonstrate
# Categorize by string size
 
# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
 
# printing original list
print("The original list : " + str(test_list))
 
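# collect the distinct string lengths
x = []
for i in test_list:
    if len(i) not in x:
        x.append(len(i))
# sort the lengths so the groups come out shortest first
x.sort()
res = []
for i in x:
    # gather every string of length i, in original order
    b = []
    for j in test_list:
        if len(j) == i:
            b.append(j)
    res.append(b)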
 
# print result
print("The list after Categorization : " + str(res))

Output

The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['man', 'for'], ['geek', 'free']]
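
The same idea as Method #3 can be written more compactly. The following is a sketch under the assumption that a set and nested list comprehensions are acceptable substitutes for the explicit loops; the names words, lengths and compact are illustrative.

```python
words = ['man', 'a', 'geek', 'for', 'b', 'free']

# A set removes duplicate lengths; sorted() orders them shortest first.
lengths = sorted({len(w) for w in words})

# One pass over the list per distinct length, keeping original order.
compact = [[w for w in words if len(w) == n] for n in lengths]

print(compact)  # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```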

Method #4 : Using collections.defaultdict

collections.defaultdict is a subclass of the built-in dict class that takes a factory function and calls it to supply a default value whenever a missing key is accessed. This is useful when you want to group the items of a list by some criterion and create a new group automatically for each new value of that criterion.

For example, to group the strings in a list by their size using a defaultdict, you could do the following:

Python3

from collections import defaultdict
 
# Initialize the list of strings
string_list = ['man', 'a', 'geek', 'for', 'b', 'free']
 
# Print the original list
print("The original list:", string_list)
 
# Initialize a defaultdict with a default value of an empty list
groups = defaultdict(list)
 
# Iterate through the list of strings
for s in string_list:
    # Use the length of the string as the key and append the string to the list
    # of strings with the same length
    groups[len(s)].append(s)
 
# Collect the grouped lists; groups appear in order of first occurrence
result = list(groups.values())
 
# Print the result
print("The list after categorization:", result)
#This code is contributed by Edula Vinay Kumar Reddy

Output

The original list: ['man', 'a', 'geek', 'for', 'b', 'free']
The list after categorization: [['man', 'for'], ['a', 'b'], ['geek', 'free']]

Time complexity: O(n)

Auxiliary Space: O(n)
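
Method #4 returns the groups in order of first occurrence, whereas Methods #2 and #3 return them from shortest to longest. If the sorted order is wanted, one option is to iterate over the sorted keys. This is a minimal sketch; the names by_length and ordered are illustrative assumptions.

```python
from collections import defaultdict

words = ['man', 'a', 'geek', 'for', 'b', 'free']

# list is the factory: a missing key gets a fresh empty list on first access.
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

# Iterating over the sorted keys yields the groups from shortest to longest.
ordered = [by_length[n] for n in sorted(by_length)]

print(dict(by_length))  # {3: ['man', 'for'], 1: ['a', 'b'], 4: ['geek', 'free']}
print(ordered)          # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```

Sorting the keys adds O(k log k) work for k distinct lengths, which is small compared with the O(n) grouping pass when k is much smaller than n.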

Method #5 : Using numpy:

1. Initialize the input list test_list.
2. Convert the list to a numpy array using the numpy.array() function.
3. Use the numpy.unique() function to get the unique string lengths in the array.
4. Create a dictionary res to store the categorized strings. The keys are the unique string lengths, and the values are empty lists.
5. Iterate over the strings in the numpy array using a for loop.
6. For each string, append it to the corresponding list in the res dictionary based on its length.
7. Convert the res dictionary to a list and sort each sublist in ascending order using the sorted() function.
8. Print the categorized list.

Python3

import numpy as np
 
# initializing list
test_list = ['man', 'a', 'geek', 'for', 'b', 'free']
 
# printing original list
print("The original list : " + str(test_list))
 
# convert list to numpy array
arr = np.array(test_list)
 
# get unique string lengths
lengths = np.unique([len(s) for s in arr])
 
# create empty dictionary to store categorized strings
res = {l: [] for l in lengths}
 
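# iterate over the strings and categorize them by length;
# str(s) converts the numpy string scalar to a plain Python str so the
# printed result looks the same across NumPy versions
for s in arr:
    res[len(s)].append(str(s))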
 
# convert result to list and sort each sublist
res = [sorted(res[l]) for l in lengths]
 
# print result
print("The list after Categorization : " + str(res))
#This code is contributed by Rayudu.

Output:
The original list : ['man', 'a', 'geek', 'for', 'b', 'free']
The list after Categorization : [['a', 'b'], ['for', 'man'], ['free', 'geek']]

Time Complexity: O(n log n), where n is the length of the input list test_list. numpy.unique() sorts the lengths, and sorted() is applied to each sublist. Sorting a sublist of m strings costs O(m log m), and the sublist sizes sum to n, so the total is bounded by O(n log n).

The auxiliary space : O(n), where n is the length of the input list test_list. This is because we are creating a numpy array and a dictionary with n elements each. The additional memory used by the other variables and function calls is negligible.
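
Method #5 still computes lengths and fills the groups in Python loops. As an alternative, the following sketch (an assumption about how one might vectorize the task, not the original author's code) uses numpy.char.str_len() to compute all lengths at once and boolean masks to select each group. Unlike the code above, it keeps the original order within each group instead of sorting it.

```python
import numpy as np

words = np.array(['man', 'a', 'geek', 'for', 'b', 'free'])

# Element-wise string lengths, computed in one vectorized call.
lengths = np.char.str_len(words)

# For each distinct length, a boolean mask selects the matching strings;
# tolist() converts them back to plain Python str objects.
grouped = [words[lengths == n].tolist() for n in np.unique(lengths)]

print(grouped)  # [['a', 'b'], ['man', 'for'], ['geek', 'free']]
```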
