With the help of the NLTK tokenize.regexp module, we are able to extract tokens from a string by using a regular expression with the RegexpTokenizer() class.
Syntax : tokenize.RegexpTokenizer(pattern, gaps=False)
Return : A list of tokens extracted using the regular expression. With gaps = True, the pattern matches the separators between tokens; with the default gaps = False, it matches the tokens themselves.
Example #1 :
In this example, we are using the RegexpTokenizer() class to extract a list of tokens by splitting a string on whitespace with a regular expression.
from nltk.tokenize import RegexpTokenizer

# gaps = True: the pattern matches the separators between tokens,
# so splitting on runs of whitespace yields the words
tk = RegexpTokenizer(r'\s+', gaps=True)

gfg = "I love Python"
geek = tk.tokenize(gfg)
print(geek)
Output :
['I', 'love', 'Python']
Example #2 :
from nltk.tokenize import RegexpTokenizer

# The same whitespace-gap tokenizer applied to a different string
tk = RegexpTokenizer(r'\s+', gaps=True)

gfg = "Geeks for Geeks"
geek = tk.tokenize(gfg)
print(geek)
Output :
['Geeks', 'for', 'Geeks']
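For comparison, here is a minimal sketch of the default gaps = False mode, in which the pattern matches the tokens themselves rather than the separators between them. The input string and pattern here are illustrative, not from the original examples :

from nltk.tokenize import RegexpTokenizer

# gaps = False (the default): the pattern matches the tokens themselves,
# so punctuation that does not match \w+ is simply discarded
tk = RegexpTokenizer(r'\w+')

gfg = "Geeks, for Geeks!"
geek = tk.tokenize(gfg)
print(geek)

Output :
['Geeks', 'for', 'Geeks']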
Last Updated : 07 Jun, 2019