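With the help of the MWETokenizer class from NLTK's nltk.tokenize.mwe module, we can take a stream of already tokenized text and merge each multi-word expression (MWE) into a single token, joining its words with an underscore by default. Matching is case sensitive, so an expression is merged only when the tokens match its declared form exactly.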
Return : A list of tokens in which every multi-word expression declared beforehand is merged into a single token.
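The case-sensitive behaviour described above can be sketched as follows. The sample sentence and the expression ("Geeks", "for", "Geeks") are illustrative assumptions, not taken from the original article; the sketch relies only on the MWETokenizer constructor and its tokenize() method, which expects a list of tokens rather than a raw string.

```python
from nltk.tokenize import MWETokenizer

# Declare one multi-word expression, written with capital letters (illustrative choice).
tokenizer = MWETokenizer([("Geeks", "for", "Geeks")])

# The capitalised input matches the declared expression and is merged.
matched = tokenizer.tokenize("Welcome to Geeks for Geeks".split())

# The lowercase input does not match, so the tokens are left untouched.
unmatched = tokenizer.tokenize("welcome to geeks for geeks".split())

print(matched)    # ['Welcome', 'to', 'Geeks_for_Geeks']
print(unmatched)  # ['welcome', 'to', 'geeks', 'for', 'geeks']
```

If case-insensitive matching is wanted, one option is to lowercase both the declared expressions and the input tokens before calling tokenize().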
Example #1 :
In this example we use the MWETokenizer.tokenize() method, which merges the multi-word expressions that were declared when the tokenizer was created. Further expressions can be added afterwards with the MWETokenizer.add_mwe() method.
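A minimal sketch of this usage is shown here. The expressions and the input string are assumptions chosen for illustration, since the article's own listing is not available; the input is split on whitespace first because tokenize() works on a list of tokens.

```python
from nltk.tokenize import MWETokenizer

# Declare the multi-word expressions up front (illustrative values).
tokenizer = MWETokenizer([("g", "f", "g"), ("geeks", "for", "geeks")])

# Split the text into tokens, then merge the declared expressions.
text = "geeks for geeks g f g"
tokens = tokenizer.tokenize(text.split())

print(tokens)  # ['geeks_for_geeks', 'g_f_g']
```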
Example #2 :
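One way such an example might look is sketched here, assuming the expressions are declared in lowercase so that they match the lowercase input. It adds a further expression to an existing tokenizer with add_mwe(); the specific expressions and input are assumptions for illustration.

```python
from nltk.tokenize import MWETokenizer

# Start with two declared expressions (illustrative values).
tokenizer = MWETokenizer([("g", "f", "g"), ("geeks", "for", "geeks")])

# Register one more expression after the tokenizer has been created.
tokenizer.add_mwe(("who", "are", "you"))

text = "who are you at geeks for geeks"
tokens = tokenizer.tokenize(text.split())

print(tokens)
```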
Output :
['who_are_you', 'at', 'geeks_for_geeks']