
Python | Split strings and digits from string list

Sometimes, while working with a list of strings, we need to separate the digits in each string from the surrounding stray characters, or noise. This noise can take the form of a currency prefix, a unit, the sign of a number, etc. Let's discuss some ways in which this task can be performed.

Method #1: Using list comprehension + groupby() + isdigit() + strip() + join()

The combination of the above functions can be used to perform this task. itertools.groupby() with str.isdigit as the key splits each string into consecutive runs of digits and non-digits. We then join() each run back into a string and strip() the surrounding whitespace.

# Python3 code to demonstrate working of
# Split digits and stray characters in a string list
# using list comprehension + groupby() + isdigit() + strip() + join()
from itertools import groupby
 
# initialize list
test_list = ["-4", "Rs 25", "5 kg", "+15"]
 
# printing original list
print("The original list : " + str(test_list))
 
# Group each string into runs of digits / non-digits,
# then join each run and strip the surrounding whitespace
res = [''.join(j).strip() for sub in test_list
       for _, j in groupby(sub, str.isdigit)]
 
# printing result
print("List after removing stray characters : " + str(res))

Output:
The original list : ['-4', 'Rs 25', '5 kg', '+15']
List after removing stray characters : ['-', '4', 'Rs', '25', '5', 'kg', '+', '15']
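To see what Method #1 works with, the short sketch below records the (key, run) pairs that groupby() produces for each string of the same test_list. The name runs_by_string is introduced here only for illustration.

```python
from itertools import groupby

test_list = ["-4", "Rs 25", "5 kg", "+15"]

# Map each string to the (is_digit, run) pairs produced by groupby()
runs_by_string = {
    sub: [(is_digit, ''.join(chars)) for is_digit, chars in groupby(sub, str.isdigit)]
    for sub in test_list
}

for sub, runs in runs_by_string.items():
    print(sub, "->", runs)
```

Output
-4 -> [(False, '-'), (True, '4')]
Rs 25 -> [(False, 'Rs '), (True, '25')]
5 kg -> [(True, '5'), (False, ' kg')]
+15 -> [(False, '+'), (True, '15')]

Method #1 discards the key and strips each run, which is why 'Rs ' becomes 'Rs'. One caveat: a string with trailing whitespace after its digits, such as '25 ', yields a whitespace-only run that strip() turns into an empty string ''.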

Method #2: Using re.findall() and list comprehension

Step-by-step Algorithm:

  1. Import the re module.
  2. Create an empty list split_list to store the split string values.
  3. Loop through each string in original_list.
  4. Use re.findall() with the pattern \d+|\D+ to split the string into consecutive runs of digits and non-digits.
  5. Extend split_list with only those runs from split_str that are not empty or whitespace-only. The kept runs are not stripped, so surrounding spaces remain (e.g. 'Rs ').
  6. Print the original_list and the split_list.

import re
 
original_list = ['-4', 'Rs 25', '5 kg', '+15']
 
split_list = []
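for s in original_list:
    # split s into runs of digits (\d+) and non-digits (\D+)
    split_str = re.findall(r'\d+|\D+', s)
    # keep only the runs that contain a non-whitespace character
    split_list.extend([piece for piece in split_str if piece.strip()])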
 
print("The original list :", original_list)
print("List after removing stray characters :", split_list)

Output
The original list : ['-4', 'Rs 25', '5 kg', '+15']
List after removing stray characters : ['-', '4', 'Rs ', '25', '5', ' kg', '+', '15']
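Because every character is either a digit or a non-digit, the pattern \d+|\D+ breaks a string into consecutive runs that rebuild it exactly; these are the same runs that groupby() yields in Method #1. The sketch below checks that property and, as a small variation on the code above, also strips each kept run so that 'Rs ' and ' kg' lose their spaces.

```python
import re

original_list = ['-4', 'Rs 25', '5 kg', '+15']

split_list = []
for s in original_list:
    pieces = re.findall(r'\d+|\D+', s)
    # Sanity check: the runs cover every character of s, in order
    assert ''.join(pieces) == s
    # Drop whitespace-only runs and trim the spaces around the rest
    split_list.extend(p.strip() for p in pieces if p.strip())

print(split_list)  # ['-', '4', 'Rs', '25', '5', 'kg', '+', '15']
```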

Complexity Analysis:

Time complexity: O(nm), where n is the length of the original_list and m is the average length of the strings in the list.
Auxiliary Space: O(nm), where n is the length of the original_list and m is the average length of the strings in the list.

Method #3: Using a loop and a try-except block

  1. Initialize an empty list called clean_list.
  2. Loop through each string in original_list.
  3. Inside the loop, try to convert the string to an integer using the int() function.
  4. If the conversion succeeds (i.e., the string is a valid integer literal, optionally signed, such as '-4' or '+15'), append the integer to clean_list.
  5. If the conversion raises a ValueError (i.e., the string contains other characters), use a regular expression to split the string into digit and non-digit substrings.
  6. Discard the substrings that are empty or whitespace-only and extend clean_list with the rest. Digit substrings are kept as strings, so clean_list mixes int and str values.
  7. After the loop, print the original list and the cleaned list.

import re
 
original_list = ['-4', 'Rs 25', '5 kg', '+15']
 
clean_list = []
for s in original_list:
    try:
        i = int(s)
        clean_list.append(i)
    except ValueError:
        substrings = re.findall(r'\d+|\D+', s)
        clean_substrings = [ss for ss in substrings if ss.strip()]
        clean_list.extend(clean_substrings)
 
print("The original list:", original_list)
print("The cleaned list:", clean_list)

Output
The original list: ['-4', 'Rs 25', '5 kg', '+15']
The cleaned list: [-4, 'Rs ', '25', '5', ' kg', 15]
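The try block in Method #3 relies on int() accepting an optional sign and surrounding whitespace, which is why '-4' and '+15' come out as the integers -4 and 15 while 'Rs 25' falls through to the regular expression. The helper parses_as_int() below is a hypothetical name used only to illustrate which inputs take the first branch.

```python
def parses_as_int(text):
    """Return True if int() accepts text, i.e. Method #3 would take the try branch."""
    try:
        int(text)
    except ValueError:
        return False
    return True

samples = ["-4", "+15", " 7 ", "Rs 25", "5 kg", "2.5"]
results = {text: parses_as_int(text) for text in samples}

for text, ok in results.items():
    print(repr(text), ok)
```

Output
'-4' True
'+15' True
' 7 ' True
'Rs 25' False
'5 kg' False
'2.5' False

Note that int() rejects a decimal such as '2.5', so Method #3 would split it into '2', '.' and '5'.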

Time complexity: O(n*m), where n is the length of original_list and m is the average length of its strings. The loop runs once per string, and both int() and re.findall() take time linear in the length of that string.

Auxiliary space: O(n*m), since the substrings stored in clean_list together hold up to n*m characters.

Method #4: Using regular expressions

import re
 
# initialize list
test_list = ["-4", "Rs 25", "5 kg", "+15"]
 
# printing original list
print("The original list: " + str(test_list))
 
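# Extract signed integers, decimals and words from each string
# using a regular expression and a loop
res = []
for s in test_list:
    # -?\d+(?:\.\d+)? matches an optionally negative number (a '+' sign is not captured),
    # \w+ matches a word such as 'Rs' or 'kg'
    res += re.findall(r'-?\d+(?:\.\d+)?|\w+', s)
# convert integer tokens (optionally negative) to int; words and decimals stay str
res = [int(num) if num.isdigit() or (num.startswith('-') and num[1:].isdigit())
       else num for num in res]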
 
# printing result
print("The cleaned list: " + str(res))

Output
The original list: ['-4', 'Rs 25', '5 kg', '+15']
The cleaned list: [-4, 'Rs', 25, 5, 'kg', 15]

The time complexity of this approach is O(n*m), where n is the length of the input list and m is the maximum length of a string in the list.

The space complexity of this approach is O(n*m), as we are creating a new list with the extracted elements. 
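Method #4 leaves decimals such as '2.5' as strings (isdigit() is False for them), and its \w+ alternative swallows a string like 'Rs25' as a single word. If those cases matter, one possible adjustment, sketched here under the assumption that words never contain digits, is to match words with [^\W\d]+ (word characters other than digits) and convert numeric tokens with int() or float(). The helper to_number() is a name introduced for illustration, not part of the method above.

```python
import re

def to_number(token):
    """Convert an integer or decimal token to int/float; return words unchanged."""
    if re.fullmatch(r'-?\d+', token):
        return int(token)
    if re.fullmatch(r'-?\d+\.\d+', token):
        return float(token)
    return token

test_list = ["-4", "Rs 25", "5 kg", "+15", "2.5 l", "Rs25"]

# Numbers (optionally negative, optionally decimal) first, then runs of
# word characters that are not digits, so 'Rs25' splits into 'Rs' and '25'
pattern = r'-?\d+(?:\.\d+)?|[^\W\d]+'

res = [to_number(tok) for s in test_list for tok in re.findall(pattern, s)]
print(res)  # [-4, 'Rs', 25, 5, 'kg', 15, 2.5, 'l', 'Rs', 25]
```

The complexity is the same as for Method #4, since each string is still scanned once by the regular expression.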