
Python | Split strings and digits from string list

Last Updated : 01 May, 2023

Sometimes, when working with a list of strings, we need to separate the digits in each string from the stray characters around them, such as a currency prefix, a unit, or a sign. Let's discuss several ways in which this task can be performed.

Method #1 : Using list comprehension + groupby() + isdigit() + join() + strip()

The combination of the above functions can be used to perform this task. groupby() from itertools splits each string into consecutive runs of digit and non-digit characters, using str.isdigit as the key. join() turns each run back into a string, and strip() removes the whitespace around it.

Python3




# Python3 code to demonstrate working of
# Split strings and digits from string list
# using list comprehension + groupby() + isdigit() + join() + strip()
from itertools import groupby
 
# initialize list
test_list = ["-4", "Rs 25", "5 kg", "+15"]
 
# printing original list
print("The original list : " + str(test_list))
 
# Split each string into runs of digits and non-digits
# using list comprehension + groupby() + isdigit() + join() + strip()
res = [''.join(j).strip() for sub in test_list
        for k, j in groupby(sub, str.isdigit)]
 
# printing result
print("List after removing stray characters : " + str(res))


Output : 

The original list : ['-4', 'Rs 25', '5 kg', '+15']
List after removing stray characters : ['-', '4', 'Rs', '25', '5', 'kg', '+', '15']
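
To make the role of groupby() easier to see, here is a minimal sketch that applies the same idea to one string at a time. The helper name split_runs is hypothetical and is not part of the original code.

```python
from itertools import groupby


def split_runs(text):
    """Split text into alternating digit / non-digit runs, stripped of whitespace.

    split_runs is an illustrative helper, not part of the article's code.
    """
    # groupby() starts a new group every time str.isdigit changes value,
    # so each group is one unbroken run of digits or of non-digits.
    return [''.join(group).strip() for _, group in groupby(text, str.isdigit)]


print(split_runs("Rs 25"))  # ['Rs', '25']
print(split_runs("5 kg"))   # ['5', 'kg']
print(split_runs("5 6"))    # ['5', '', '6']
```

Note the last call: a run that contains only whitespace becomes an empty string after strip(), so inputs with spaces between two numbers may need an extra filter.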

Method #2: Using re.findall() in a loop

Step-by-step Algorithm:

  1. Import the re module.
  2. Create an empty list split_list to store the split string values.
  3. Loop through each string in original_list.
  4. Use the findall() method from the re module with the pattern r'\d+|\D+' to split the string into runs of digits and runs of non-digits.
  5. Extend split_list with the substrings from split_str that are not empty or whitespace-only. The kept substrings are not stripped, so 'Rs ' retains its trailing space.
  6. Print the original_list and the split_list.

Python3




import re
 
original_list = ['-4', 'Rs 25', '5 kg', '+15']
 
split_list = []
# use a name other than str so the built-in type is not shadowed
for text in original_list:
    split_str = re.findall(r'\d+|\D+', text)
    split_list.extend([s for s in split_str if s.strip()])
 
print("The original list :", original_list)
print("List after removing stray characters :", split_list)


Output

The original list : ['-4', 'Rs 25', '5 kg', '+15']
List after removing stray characters : ['-', '4', 'Rs ', '25', '5', ' kg', '+', '15']

Complexity Analysis :

Time complexity: O(nm), where n is the length of the original_list and m is the average length of the strings in the list.
Auxiliary Space: O(nm), where n is the length of the original_list and m is the average length of the strings in the list.
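
The output above keeps the spaces attached to 'Rs ' and ' kg'. If the trimmed tokens of Method #1 are wanted instead, one possible variation is to strip each token before keeping it. This is a sketch only; the function name split_digits is an assumption made for illustration.

```python
import re


def split_digits(strings):
    """Split each string into digit and non-digit runs, trimming whitespace.

    split_digits is an illustrative name, not part of the article's code.
    """
    result = []
    for text in strings:
        # \d+ matches a run of digits, \D+ a run of anything else
        for token in re.findall(r'\d+|\D+', text):
            token = token.strip()
            if token:  # skip runs that consisted only of whitespace
                result.append(token)
    return result


print(split_digits(['-4', 'Rs 25', '5 kg', '+15']))
# ['-', '4', 'Rs', '25', '5', 'kg', '+', '15']
```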

Method #3: Using a loop and a try-except block

  1. Initialize an empty list called clean_list.
  2. Loop through each string in original_list.
  3. Inside the loop, try to convert the string to an integer using the int() function.
  4. If the conversion succeeds, append the integer to clean_list. int() accepts an optional leading sign and surrounding whitespace, so "-4" and "+15" both convert.
  5. If the conversion raises ValueError (i.e., the string contains other characters), use regular expressions to split the string into digit and non-digit substrings.
  6. Filter out any empty or whitespace-only substrings and extend clean_list with the rest, which remain strings.
  7. After the loop, print the original list and the cleaned list.

Python3




import re
 
original_list = ['-4', 'Rs 25', '5 kg', '+15']
 
clean_list = []
for s in original_list:
    try:
        i = int(s)
        clean_list.append(i)
    except ValueError:
        substrings = re.findall(r'\d+|\D+', s)
        clean_substrings = [ss for ss in substrings if ss.strip()]
        clean_list.extend(clean_substrings)
 
print("The original list:", original_list)
print("The cleaned list:", clean_list)


Output

The original list: ['-4', 'Rs 25', '5 kg', '+15']
The cleaned list: [-4, 'Rs ', '25', '5', ' kg', 15]

Time complexity: O(nm), where n is the length of original_list and m is the average length of the strings in the list. The loop runs once per string, and int() and re.findall() each scan the string.

Auxiliary space: O(nm), since clean_list stores the converted integers and the substrings of every string.
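
Which strings take the try branch depends on what int() accepts. The short sketch below probes a few inputs. The helper to_int_or_none is hypothetical and exists only for this demonstration.

```python
def to_int_or_none(text):
    """Return int(text), or None when int() raises ValueError.

    to_int_or_none is an illustrative helper, not part of the article's code.
    """
    try:
        return int(text)
    except ValueError:
        return None


for sample in ['-4', '+15', ' 7 ', 'Rs 25', '5 kg']:
    print(repr(sample), '->', to_int_or_none(sample))
# '-4' -> -4
# '+15' -> 15
# ' 7 ' -> 7
# 'Rs 25' -> None
# '5 kg' -> None
```

Signed strings and strings padded with whitespace convert directly, which is why the cleaned list contains the integers -4 and 15 but the strings '25' and '5'.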

Method #4: Using regular expressions and integer conversion

  • Importing the re module for using regular expressions.
  • Initializing a list test_list with some strings that include numbers and other characters.
  • Printing the original list using print() function and string concatenation.
  • Initializing an empty list res to store the cleaned values.
  • Using a for loop to iterate over each string in the test_list.
  • Using the re.findall() method with the pattern r'-?\d+(?:\.\d+)?|\w+' to extract tokens from the current string. The pattern matches either a number (integer or decimal) with an optional minus sign, or a sequence of word characters. Characters that match neither alternative, such as spaces and the plus sign in "+15", are dropped.
  • Appending the extracted values to the res list using the += operator.
  • Using a list comprehension to convert each token to an integer if it is a run of digits with an optional leading minus sign, or leave it as a string otherwise. Decimal tokens such as "2.5" are matched by the pattern but stay strings, because isdigit() returns False for them.

Python3




import re
 
# initialize list
test_list = ["-4", "Rs 25", "5 kg", "+15"]
 
# printing original list
print("The original list: " + str(test_list))
 
# Extract digit from string list
# using regular expression and loop
res = []
for s in test_list:
    res += re.findall(r'-?\d+(?:\.\d+)?|\w+', s)
res = [int(num) if num.isdigit() or (num.startswith('-') and num[1:].isdigit()) else num for num in res]
 
# printing result
print("The cleaned list: " + str(res))


Output

The original list: ['-4', 'Rs 25', '5 kg', '+15']
The cleaned list: [-4, 'Rs', 25, 5, 'kg', 15]

The time complexity of this approach is O(n*m), where n is the length of the input list and m is the maximum length of a string in the list.

The space complexity of this approach is O(n*m), as we are creating a new list with the extracted elements. 
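
The pattern in Method #4 already matches decimals, but the conversion step only produces integers. One way to convert decimals as well is sketched below, assuming float is an acceptable type for them. The names NUMBER, TOKEN, convert and extract are hypothetical.

```python
import re

# Same pattern as Method #4, plus a number-only pattern used for conversion.
# All names in this block are illustrative, not part of the article's code.
TOKEN = re.compile(r'-?\d+(?:\.\d+)?|\w+')
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')


def convert(token):
    """Return an int or float for numeric tokens, otherwise the token itself."""
    if NUMBER.fullmatch(token):
        return float(token) if '.' in token else int(token)
    return token


def extract(strings):
    """Tokenize every string and convert the numeric tokens."""
    return [convert(tok) for text in strings for tok in TOKEN.findall(text)]


print(extract(["-4", "Rs 25", "2.5 kg", "+15"]))
# [-4, 'Rs', 25, 2.5, 'kg', 15]
```

Checking tokens against the number pattern, rather than calling float() on everything, keeps words such as "inf" or "nan" as strings.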


