Python | Split multiple characters from string

  • Last Updated : 11 Nov, 2019

While improving your programming skills, you have probably run into situations where you wanted .split() in Python to split a string on several different characters at once rather than on a single separator. Consider this example:

"GeeksforGeeks, is an-awesome! website"

Using .split() on the above will result in

['GeeksforGeeks,', 'is', 'an-awesome!', 'website']

whereas the desired result should be



['GeeksforGeeks', 'is', 'an', 'awesome', 'website']

In this article, we will look at some ways in which we can achieve the same.
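Before turning to those approaches, here is a minimal sketch of the limitation itself. It only uses the built-in str.split() on the sample string from above; the variable name text is chosen just for illustration.

```python
text = "GeeksforGeeks, is an-awesome! website"

# With no argument, str.split() splits on runs of whitespace only,
# so punctuation stays attached to the words.
print(text.split())
# ['GeeksforGeeks,', 'is', 'an-awesome!', 'website']

# A separator argument is treated as one literal substring,
# not as a set of alternative characters.
print(text.split(", "))
# ['GeeksforGeeks', 'is an-awesome! website']
```

Either way, a single call to str.split() can apply only one splitting rule at a time.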

Using re.split()

This is the most flexible and most commonly used way to split on multiple characters at once. It uses regular expressions (regex) to do so.




# Python3 code to demonstrate working of 
# Splitting operators in String 
# Using re.split() 
  
import re
  
# initializing string
data = "GeeksforGeeks, is_an-awesome ! website"
  
# printing original string  
print("The original string is : " + data) 
  
# Using re.split() 
# Splitting characters in String 
res = re.split(', |_|-|!', data)
  
# printing result  
print("The list after performing split functionality : " + str(res)) 

Output:

The original string is : GeeksforGeeks, is_an-awesome ! website
The list after performing split functionality : ['GeeksforGeeks', 'is', 'an', 'awesome ', ' website']

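The line re.split(', |_|-|!', data) tells Python to split the variable data wherever it finds a comma followed by a space, an underscore (_), a hyphen (-), or an exclamation mark (!). The symbol "|" means "or". Because the spaces around "!" are not part of any alternative, they stay attached to the neighbouring pieces, as the output shows.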
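If the stray spaces in that output are unwanted, one possible variation (a sketch, not the only option) is to replace the alternation with a character class that also includes whitespace and to add a "+" quantifier. The pattern below is an assumption for illustration and reuses the same sample string.

```python
import re

data = "GeeksforGeeks, is_an-awesome ! website"

# Alternation, as in the example above: pieces keep stray spaces.
print(re.split(r", |_|-|!", data))
# ['GeeksforGeeks', 'is', 'an', 'awesome ', ' website']

# A character class matches any one of the listed characters; the trailing
# "+" treats a run of adjacent delimiters (such as " ! ") as one separator.
# "-" is placed last in the class so it is read literally.
print(re.split(r"[\s,_!-]+", data))
# ['GeeksforGeeks', 'is', 'an', 'awesome', 'website']
```

Note that if the string begins or ends with a delimiter, re.split() returns an empty string at that end, which may need to be filtered out.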

Some characters have a special meaning in regex. To split on one of them literally, escape it with a backslash (\). Characters such as = ! : need no escaping on their own, and a hyphen (-) is special only inside a character class such as [a-z]. The characters that must be escaped to be matched literally outside a character class are:

. ^ $ * + ? { } [ ] \ | ( )

For example:




import re
newData = "GeeksforGeeks, is_an-awesome ! app + too"
  
# To split on "+", escape it with a backslash
# (the raw string r'...' keeps the backslash intact)
print(re.split(r', |_|-|!|\+', newData))

Output:



['GeeksforGeeks', 'is', 'an', 'awesome ', ' app ', ' too']
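Escaping by hand gets error-prone as the list of delimiters grows. As a hedged alternative, the standard library's re.escape() can do the escaping. The delimiter list and the sample string below are hypothetical and chosen only to illustrate the idea.

```python
import re

# Hypothetical list of delimiters; "+" and "|" would otherwise be
# interpreted as regex operators.
delimiters = [", ", "_", "-", "!", "+", "|"]

# re.escape() backslash-escapes any special characters in each delimiter,
# and "|".join() combines them into one alternation pattern.
pattern = "|".join(re.escape(d) for d in delimiters)

sample = "a+b|c_d"
print(re.split(pattern, sample))
# ['a', 'b', 'c', 'd']
```

This keeps the delimiters readable as plain strings and leaves the regex details to the library.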

Note: To learn more about regex, see the documentation of Python's re module.

Using re.findall()

This approach is a little less obvious but can be more convenient. It also uses regex, but instead of re.split() it calls re.findall(), which finds all non-overlapping matches of a pattern and returns them in a list. Rather than describing the delimiters, you describe the pieces you want to keep, so this way of splitting is best used when you don't know the exact characters you want to split on.




# Python3 code to demonstrate working of 
# Splitting operators in String 
# Using re.findall() 
import re
  
  
# initializing string  
data = "This, is - another : example?!"
  
# printing original string  
print("The original string is : " + data) 
  
# Using re.findall() 
# Splitting characters in String 
res = re.findall(r"[\w']+", data)
  
# printing result  
print("The list after performing split functionality : " + str(res)) 

Output:

The original string is : This, is - another : example?!
The list after performing split functionality : ['This', 'is', 'another', 'example']

Here the pattern [\w']+ matches one or more consecutive characters that are letters, digits, underscores (_), or apostrophes, and re.findall() returns every such run in a list.
Note: [\w']+ won't split on an underscore (_), because \w matches underscores as well as letters and digits.
For example:




import re
testData = "This, is - underscored _ example?!"
print(re.findall(r"[\w']+", testData))

Output:

['This', 'is', 'underscored', '_', 'example']
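If underscores should act as delimiters too, one option is to list the wanted characters explicitly instead of relying on \w. The sketch below assumes that only ASCII letters, digits, and apostrophes belong to a word; the second sample string is made up for illustration.

```python
import re

test_data = "This, is - underscored _ example?!"

# [A-Za-z0-9']+ lists the wanted characters explicitly, so "_" is
# no longer part of a token and acts as a delimiter instead.
# Assumption: only ASCII letters and digits matter here.
print(re.findall(r"[A-Za-z0-9']+", test_data))
# ['This', 'is', 'underscored', 'example']

print(re.findall(r"[A-Za-z0-9']+", "snake_case isn't kept"))
# ['snake', 'case', "isn't", 'kept']
```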

Using replace() and split()

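This is the most basic way of doing the split. It does not use regex, and it needs one replace() call (one pass over the string) for every delimiter, so it scales poorly as the number of delimiters grows, but it works well for simple cases. If you know the characters you want to split on, just replace them with a space and then use .split():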




# Python code to demonstrate  
# to split strings 
  
# Initial string
data = "Let's_try, this now"
  
# printing original string  
print("The original string is : " + data) 
  
# Using replace() and split() 
# Splitting characters in String  
res = data.replace('_', ' ').replace(', ', ' ').split()
  
# Printing result
print("The list after performing split functionality : " + str(res)) 

Output:

The original string is : Let's_try, this now
The list after performing split functionality : ["Let's", 'try', 'this', 'now']
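When the delimiters are all single characters, the chain of replace() calls can be swapped for str.translate(), which applies every substitution in one pass. This is offered as a sketch of an alternative, not as part of the method above; the variable name table is illustrative.

```python
data = "Let's_try, this now"

# str.maketrans(x, y) maps each character in x to the character at the
# same position in y; here "_" and "," both map to a space.
table = str.maketrans("_,", "  ")

# translate() applies the whole mapping in a single pass, and split()
# with no argument then collapses the resulting runs of whitespace.
res = data.translate(table).split()
print(res)
# ["Let's", 'try', 'this', 'now']
```

Unlike replace(), this cannot handle multi-character delimiters such as ", " directly, which is why the comma alone is mapped to a space here.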

Character Classes

Regex cheat-sheet on character description

Shorthand character class    Represents
\d                           Any numeric digit from 0 to 9
\D                           Any character that is not a numeric digit from 0 to 9
\w                           Any letter, numeric digit, or the underscore character
\W                           Any character that is not a letter, numeric digit, or the underscore character
\s                           Any space, tab, or newline character
\S                           Any character that is not a space, tab, or newline
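The short sketch below shows how a few of these shorthand classes behave with re.findall() and re.split(). The sample string is made up for illustration and is not taken from the examples above.

```python
import re

sample = "order 66:\tship_2 boxes\nby 9am"

print(re.findall(r"\d+", sample))   # runs of digits
# ['66', '2', '9']

print(re.split(r"\s+", sample))     # split on runs of whitespace
# ['order', '66:', 'ship_2', 'boxes', 'by', '9am']

print(re.findall(r"\w+", sample))   # runs of letters, digits, or "_"
# ['order', '66', 'ship_2', 'boxes', 'by', '9am']

print(re.split(r"\W+", sample))     # split on runs of non-word characters
# ['order', '66', 'ship_2', 'boxes', 'by', '9am']
```

Here re.findall(r"\w+", ...) and re.split(r"\W+", ...) give the same list only because the sample starts and ends with a word character; otherwise re.split() would also return empty strings at the ends.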


