nltk.probability.FreqDist is used to find the most common words by counting word frequencies in the treebank corpus. A ConditionalFreqDist is then created from the tagged words to count how often each tag occurs with each word. These counts are used to construct a model that maps each frequent word (the key) to its most frequent tag (the value).
Code #1 : Creating the function
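Below is a minimal sketch of such a function. It assumes the function takes the corpus words, the corpus tagged words, and a `limit` on how many frequent words to keep. The names `word_tag_model` and `limit` are illustrative, not an NLTK API. The sketch only uses `FreqDist.most_common()` and `ConditionalFreqDist(...)[word].max()`. A tiny hypothetical tagged list stands in for the treebank corpus so the code runs without downloading data.

```python
from nltk.probability import ConditionalFreqDist, FreqDist


def word_tag_model(words, tagged_words, limit=200):
    """Map the `limit` most frequent words to their most frequent tag.

    `word_tag_model` and `limit` are illustrative names, not an NLTK API.
    """
    # Count how often each word occurs.
    fd = FreqDist(words)
    # Condition on the word and count the tags it was given: cfd[word][tag].
    cfd = ConditionalFreqDist(tagged_words)
    # Keep only the `limit` most common words.
    most_freq = (word for word, _count in fd.most_common(limit))
    # FreqDist.max() returns the sample (here, the tag) with the highest count.
    return {word: cfd[word].max() for word in most_freq}


# Hypothetical toy data standing in for treebank.tagged_words().
tagged = [
    ("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB"),
    ("the", "DT"), ("run", "NN"), ("was", "VBD"), ("long", "JJ"),
    ("they", "PRP"), ("run", "VB"), ("the", "DT"), ("race", "NN"),
]
words = [word for word, _tag in tagged]

model = word_tag_model(words, tagged, limit=2)
print(model)
```

In this toy data, "run" maps to VB because it is tagged VB twice and NN once. With the real corpus, the call would be `word_tag_model(treebank.words(), treebank.tagged_words())` after `from nltk.corpus import treebank`. The corpus must first be fetched with `nltk.download("treebank")`.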
Code #2 : Using the function with UnigramTagger
Accuracy : 0.559680552557738
Code #3 : Let’s try a backoff chain
Accuracy : 0.8806820634578028
Note : The backoff chain has increased the accuracy. We can improve this result further by using the likely-word UnigramTagger more effectively, namely by letting it override the trained taggers.
Code #4 : Manual Override of Trained Taggers
Accuracy : 0.8824088063889488
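This is a sketch of the override idea. It assumes the override means putting the likely-word UnigramTagger in front of the trained chain, with that chain as its backoff. The model's tag then wins for every word it contains, and the trained taggers handle everything else. The data are hypothetical toy sentences, chosen so that the two orders disagree.

```python
from nltk.probability import ConditionalFreqDist, FreqDist
from nltk.tag import BigramTagger, DefaultTagger, TrigramTagger, UnigramTagger

# Hypothetical toy corpus: "run" is a verb twice and a noun once.
train_sents = [
    [("they", "PRP"), ("run", "VB"), ("fast", "RB")],
    [("we", "PRP"), ("run", "VB"), ("home", "NN")],
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
]

# Likely-tag model for the single most frequent word: {"run": "VB"}.
tagged_words = [pair for sent in train_sents for pair in sent]
fd = FreqDist(word for word, _tag in tagged_words)
cfd = ConditionalFreqDist(tagged_words)
model = {word: cfd[word].max() for word, _count in fd.most_common(1)}

# Trained chain: Trigram -> Bigram -> Unigram -> DefaultTagger("NN").
trained = DefaultTagger("NN")
for tagger_class in (UnigramTagger, BigramTagger, TrigramTagger):
    trained = tagger_class(train_sents, backoff=trained)

# Override: the model is consulted first; trained taggers only see other words.
likely_tagger = UnigramTagger(model=model, backoff=trained)

print(trained.tag(["the", "run"]))        # the bigram context picks NN for "run"
print(likely_tagger.tag(["the", "run"]))  # the model forces VB for "run"
```

On this toy data the override replaces a context-based tag that was actually correct ("the run" is a noun). This illustrates the trade-off: the model's tag wins regardless of context, so whether the override helps depends on the corpus. On the treebank it raised accuracy only slightly, as the figure above shows.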