Skip to content
Related Articles

Related Articles

Improve Article

Python NLTK | tokenize.regexp()

  • Last Updated : 07 Jun, 2019

With the help of NLTK tokenize.regexp() module, we are able to extract the tokens from string by using regular expression with RegexpTokenizer() method.

Syntax : tokenize.RegexpTokenizer()
Return : Return array of tokens using regular expression

Example #1 :
In this example we are using RegexpTokenizer() method to extract the stream of tokens with the help of regular expressions.




# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer
    
# Create a reference variable for Class RegexpTokenizer
tk = RegexpTokenizer('\s+', gaps = True)
    
# Create a string input
gfg = "I love Python"
    
# Use tokenize method
geek = tk.tokenize(gfg)
    
print(geek)

Output :

[‘I’, ‘love’, ‘Python’]



Example #2 :




# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer
    
# Create a reference variable for Class RegexpTokenizer
tk = RegexpTokenizer('\s+', gaps = True)
    
# Create a string input
gfg = "Geeks for Geeks"
    
# Use tokenize method
geek = tk.tokenize(gfg)
    
print(geek)

Output :

[‘Geeks’, ‘for’, ‘Geeks’]

 Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.  

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course. And to begin with your Machine Learning Journey, join the Machine Learning – Basic Level Course




My Personal Notes arrow_drop_up
Recommended Articles
Page :