nltk.probability.FreqDist is used to find the most common words by counting word frequencies in the treebank corpus. A ConditionalFreqDist is then created for the tagged words, counting the frequency of every tag for every word. These counts are used to construct a model with the frequent words as keys and the most frequent tag for each word as the value.
Code #1 : Creating function
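The construction described above can be sketched in plain Python. This is a minimal illustration, not the article's original code: the function name word_tag_model, the limit parameter, and the sample data are assumptions. collections.Counter stands in for FreqDist (which is itself a Counter subclass), and a defaultdict of Counters stands in for ConditionalFreqDist.

```python
from collections import Counter, defaultdict


def word_tag_model(words, tagged_words, limit=200):
    """Map each of the `limit` most frequent words to its most frequent tag.

    Hypothetical helper: the name and signature are assumptions.
    """
    # Word frequencies; FreqDist plays this role in the article.
    word_counts = Counter(words)

    # Per-word tag frequencies; the stand-in for ConditionalFreqDist.
    tag_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        tag_counts[word][tag] += 1

    # Keep only the frequent words, each paired with its most common tag.
    return {
        word: tag_counts[word].most_common(1)[0][0]
        for word, _ in word_counts.most_common(limit)
        if word in tag_counts
    }


# Tiny hand-made sample (hypothetical data, not the treebank corpus).
tagged = [("the", "DT"), ("dog", "NN"), ("the", "DT"), ("run", "VB"),
          ("run", "NN"), ("run", "VB"), ("the", "DT")]
model = word_tag_model([w for w, _ in tagged], tagged, limit=2)
print(model)
```

With the real corpus, the inputs would presumably be treebank.words() and treebank.tagged_words(), and the resulting dictionary is the kind of mapping that UnigramTagger accepts through its model argument.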
Code #2 : Using the function with UnigramTagger
Accuracy : 0.559680552557738
Code #3 : Let's try a backoff chain
Accuracy : 0.8806820634578028
Note : The backoff chain increases the accuracy. We can improve this result further by using the UnigramTagger class more effectively.
Code #4 : Manual Override of Trained Taggers
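To illustrate why the position of the frequent-word model in the chain matters, here is a small plain-Python sketch of backoff lookup. It is not NLTK's implementation: LookupTagger, ConstantTagger, and the sample dictionaries are hypothetical. It assumes that the earlier chain consults the model only after the trained taggers fail, while the manual override places the model first so that it wins for every word it knows.

```python
class ConstantTagger:
    """Always return the same tag, like a default tagger ending a chain."""

    def __init__(self, tag):
        self.tag = tag

    def tag_word(self, word):
        return self.tag


class LookupTagger:
    """Tag a word from a dict; defer to `backoff` when the word is unknown."""

    def __init__(self, model, backoff=None):
        self.model = model
        self.backoff = backoff

    def tag_word(self, word):
        if word in self.model:
            return self.model[word]
        # Unknown word: hand it to the next tagger in the chain, if any.
        return self.backoff.tag_word(word) if self.backoff else None


# Hypothetical data: what a trained tagger learned vs. the frequent-word model.
trained = {"run": "NN", "dog": "NN"}
likely = {"run": "VB"}
default = ConstantTagger("NN")

# Model used as a fallback: the trained tagger decides first.
as_fallback = LookupTagger(trained, backoff=LookupTagger(likely, backoff=default))
# Model placed in front: it overrides the trained tagger for words it knows.
as_override = LookupTagger(likely, backoff=LookupTagger(trained, backoff=default))

print(as_fallback.tag_word("run"))  # NN
print(as_override.tag_word("run"))  # VB
print(as_override.tag_word("cat"))  # NN
```

In NLTK terms, the same ordering would presumably be expressed by passing the trained chain as the backoff argument of a UnigramTagger built from the model.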
Accuracy : 0.8824088063889488