Python NLTK | nltk.WhitespaceTokenizer

With the help of the nltk.tokenize.WhitespaceTokenizer class, we can split a string of words or sentences into tokens. Its tokenize() method splits the string on whitespace (spaces, tabs and new lines), so the returned tokens contain no whitespace themselves.

Syntax : tokenize.WhitespaceTokenizer().tokenize(text)
Return : Return a list of the tokens (strings) found in the input string

Example #1 :
In this example, we use the tokenize() method of WhitespaceTokenizer to extract the tokens from a string that mixes spaces, a new line and a tab.

# Import the WhitespaceTokenizer class from nltk
from nltk.tokenize import WhitespaceTokenizer

# Create a WhitespaceTokenizer instance
tk = WhitespaceTokenizer()

# Create an input string containing spaces, a new line and a tab
gfg = "GeeksforGeeks \nis\t for geeks"

# Split the string into tokens
geek = tk.tokenize(gfg)

print(geek)

Output :

['GeeksforGeeks', 'is', 'for', 'geeks']
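
For plain whitespace splitting like this, Python's built-in str.split() generally gives the same result, and the NLTK documentation itself suggests using it in the general case. The sketch below uses only the standard library to show what the tokenizer is doing for this input; the variable names are illustrative, and the regular expression is an assumption about equivalent behaviour for ordinary text, not NLTK's own code.

```python
import re

text = "GeeksforGeeks \nis\t for geeks"

# str.split() with no arguments splits on runs of whitespace
# and drops the empty strings at either end.
tokens_split = text.split()

# The same idea with a regular expression: keep every run of
# non-whitespace characters.
tokens_regex = re.findall(r"\S+", text)

print(tokens_split)
print(tokens_regex)
```

Both lists match the tokenizer output shown above for this string.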

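Example #2 :
In this example the input contains punctuation and ends with a new line. The trailing new line produces no empty token, and punctuation stays attached to the word it touches.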

# Import the WhitespaceTokenizer class from nltk
from nltk.tokenize import WhitespaceTokenizer

# Create a WhitespaceTokenizer instance
tk = WhitespaceTokenizer()

# Create an input string with punctuation and a trailing new line
gfg = "The price\t of burger \nin BurgerKing is Rs.36.\n"

# Split the string into tokens
geek = tk.tokenize(gfg)

print(geek)

Output :

['The', 'price', 'of', 'burger', 'in', 'BurgerKing', 'is', 'Rs.36.']
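
The last token, 'Rs.36.', still carries its final dot, because splitting on whitespace never separates punctuation from a word. If that matters, one possible clean-up is sketched below. It is a hypothetical post-processing step, not part of NLTK, and it uses str.split() as a stand-in for the tokenizer so that it runs with the standard library alone.

```python
import string

text = "The price\t of burger \nin BurgerKing is Rs.36.\n"

# Whitespace splitting leaves punctuation attached to the token.
tokens = text.split()

# Hypothetical clean-up: strip punctuation from the right end of
# each token only, so the dot inside "Rs.36" is preserved.
cleaned = [tok.rstrip(string.punctuation) for tok in tokens]

print(tokens[-1])   # last token before clean-up
print(cleaned[-1])  # last token after clean-up
```

Stripping only the right-hand end is a deliberate choice here: it removes the sentence-final dot while leaving punctuation inside a token untouched.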
