Python – Split strings ignoring the space formatting characters
Last Updated: 23 Apr, 2023
Given a string, split it into words, ignoring whitespace formatting characters such as \n, \t, etc.
Input : test_str = 'geeksforgeeks\n\r\t\t\n\t\tbest\r\tfor\f\vgeeks'
Output : ['geeksforgeeks', 'best', 'for', 'geeks']
Explanation : Every run of whitespace characters acts as a separator for the split.
Input : test_str = 'geeksforgeeks\n\r\t\t\n\t\tbest'
Output : ['geeksforgeeks', 'best']
Explanation : Every run of whitespace characters acts as a separator for the split.
Method 1: Using re.split()
Here we build a regex character class containing the whitespace characters and pass it to re.split(), which splits the string on every run of one or more of those characters.
Python3
import re

# Sample string mixing several whitespace characters
test_str = 'geeksforgeeks\n\r\t\t\nis\t\tbest\r\tfor geeks'
print("The original string is : " + str(test_str))

# Split on one or more consecutive whitespace characters
res = re.split(r'[\n\t\f\v\r ]+', test_str)
print("The split string : " + str(res))
Output:
The original string is : geeksforgeeks
is best
for geeks
The split string : ['geeksforgeeks', 'is', 'best', 'for', 'geeks']
Time Complexity: O(n)
Auxiliary Space: O(n)
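One edge case worth knowing: when the string starts or ends with a separator, re.split() returns an empty string at that end. The sketch below reuses the character class from above and filters those empty tokens out. The names SPACE_RUN, split_on_spaces and padded are illustrative and not part of the original example.

```python
import re

# Character class from the example above: newline, tab, form feed,
# vertical tab, carriage return, and space.
SPACE_RUN = re.compile(r'[\n\t\f\v\r ]+')


def split_on_spaces(text):
    """Split text on runs of space characters, dropping empty edge tokens."""
    # re.split() yields '' at the start or end when the text begins or
    # ends with a separator, so those tokens are filtered out here.
    return [token for token in SPACE_RUN.split(text) if token]


padded = '\t geeks\nfor\r\ngeeks \n'
print(SPACE_RUN.split(padded))   # ['', 'geeks', 'for', 'geeks', '']
print(split_on_spaces(padded))   # ['geeks', 'for', 'geeks']
```

The sample string in the example above has no leading or trailing whitespace, which is why its output contains no empty strings.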
Method 2: Using split()
When called with no arguments, the split() method splits the string on runs of consecutive whitespace and ignores leading and trailing whitespace.
Python3
test_str = 'geeksforgeeks\n\r\t\t\nis\t\tbest\r\tfor geeks'
print("The original string is : " + str(test_str))
print("The split string : " + str(test_str.split()))
Output:
The original string is : geeksforgeeks
is best
for geeks
The split string : ['geeksforgeeks', 'is', 'best', 'for', 'geeks']
Time Complexity: O(n)
Auxiliary Space: O(n)
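The default behaviour differs from passing an explicit separator, which is a common source of confusion. A small sketch of the difference follows; the sample string and variable names are chosen only for illustration.

```python
text = 'geeks  for\tgeeks '

# No argument: any run of whitespace is one separator, and leading or
# trailing whitespace is ignored.
default_split = text.split()

# Explicit ' ': every single space is a separator, so consecutive or
# trailing spaces produce empty strings, and '\t' is not split on at all.
space_split = text.split(' ')

print(default_split)  # ['geeks', 'for', 'geeks']
print(space_split)    # ['geeks', '', 'for\tgeeks', '']
```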
Method 3: Using split() with filter()
- Use the split() method to split the input string into substrings.
- Use the filter() function to drop any empty or whitespace-only strings from the resulting list. With no separator argument, split() never returns such strings, so this step is a safeguard rather than a necessity.
- Return the filtered list of substrings.
Python3
def split_string(test_str):
    # Split on runs of whitespace
    substrings = test_str.split()
    # Drop empty or whitespace-only substrings
    substrings = list(filter(lambda s: s.strip(), substrings))
    return substrings

test_str = 'geeksforgeeks\n\r\t\t\nis\t\tbest\r\tfor geeks'
print(split_string(test_str))
Output
['geeksforgeeks', 'is', 'best', 'for', 'geeks']
Time Complexity: O(n), where n is the length of the input string. Both split() and filter() make a single linear pass.
Auxiliary Space: O(n), since the resulting list of substrings is proportional to the length of the input string.
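The filter() predicate starts to matter once an explicit separator is passed to split(). The following sketch assumes a case where only single spaces are used as separators, so other whitespace characters can survive as standalone tokens; split_on_single_spaces and sample are hypothetical names used for illustration.

```python
def split_on_single_spaces(text):
    """Split on ' ' only, then drop tokens that are empty or all whitespace."""
    pieces = text.split(' ')
    # s.strip() is '' (falsy) for empty and whitespace-only pieces,
    # so filter() keeps only pieces containing visible characters.
    return list(filter(lambda s: s.strip(), pieces))


sample = 'geeks  \t for \n geeks'
print(sample.split(' '))               # ['geeks', '', '\t', 'for', '\n', 'geeks']
print(split_on_single_spaces(sample))  # ['geeks', 'for', 'geeks']
```

Note that this keeps each surviving piece as-is: a piece such as 'for\t' would retain its tab, so it is not a full replacement for calling split() without arguments.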
Method 4: Using itertools.groupby()
Use the itertools module to group contiguous runs of characters by whether they are whitespace, then join each non-whitespace run into a substring.
Steps :
- Import the itertools module to work with iterators and grouping functions.
- Use itertools.groupby() with str.isspace() as the key to split the input string into alternating whitespace and non-whitespace runs.
- Use a list comprehension to join the characters of each non-whitespace run into a substring.
- Print the resulting list of substrings.
Python3
import itertools

test_str = 'geeksforgeeks\n\r\t\t\nis\t\tbest\r\tfor geeks'

# Group consecutive characters by whether they are whitespace,
# keeping only the non-whitespace groups
result = [''.join(group)
          for is_space, group in itertools.groupby(test_str, lambda x: x.isspace())
          if not is_space]
print("The split string : " + str(result))
Output
The split string : ['geeksforgeeks', 'is', 'best', 'for', 'geeks']
Time Complexity: O(n), where n is the length of the input string. itertools.groupby() visits each character once, and joining the groups is linear overall.
Auxiliary Space: O(n), since the list of resulting substrings is proportional to the length of the input string.
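The same grouping idea can be wrapped in a generator so that words are produced one at a time instead of being collected into a list up front. This is a minimal sketch, not part of the original approach; iter_words is a hypothetical helper name.

```python
import itertools


def iter_words(text):
    """Yield whitespace-delimited words one at a time."""
    # groupby() emits alternating runs of whitespace and non-whitespace
    # characters; only the non-whitespace runs are joined and yielded.
    for is_space, group in itertools.groupby(text, key=str.isspace):
        if not is_space:
            yield ''.join(group)


test_str = 'geeksforgeeks\n\r\t\t\nis\t\tbest\r\tfor geeks'
words = list(iter_words(test_str))
print(words)                      # ['geeksforgeeks', 'is', 'best', 'for', 'geeks']
print(words == test_str.split())  # True
```

For this sample string the result matches test_str.split(), which makes split() the simpler choice unless lazy iteration is actually needed.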