nltk.probability.FreqDist is used to find the most common words by counting word frequencies in the treebank corpus. A ConditionalFreqDist is then built from the tagged words, counting how often each tag occurs for every word. These counts are used to construct a model that maps each frequent word (the key) to its most frequent tag (the value).
Code #1 : Creating function
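Below is a minimal sketch of such a function, assuming NLTK is installed. The name `word_tag_model`, its `limit` parameter and the tiny tagged word list are hypothetical choices made for illustration; the toy list stands in for `treebank.words()` and `treebank.tagged_words()` so the sketch runs without downloading the corpus.

```python
from nltk.probability import ConditionalFreqDist, FreqDist


def word_tag_model(words, tagged_words, limit=200):
    """Map each of the `limit` most frequent words to its most frequent tag."""
    fd = FreqDist(words)                     # word -> count
    cfd = ConditionalFreqDist(tagged_words)  # word -> FreqDist over its tags
    most_freq = (word for word, _ in fd.most_common(limit))
    return {word: cfd[word].max() for word in most_freq}


# Hypothetical toy data; with the corpus downloaded you would pass
# treebank.words() and treebank.tagged_words() instead.
tagged_words = [
    ("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB"),
    ("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ"),
    ("you", "PRP"), ("can", "MD"), ("go", "VB"),
]
words = [word for word, _ in tagged_words]

# "can" is tagged MD twice and NN once, so the model maps it to MD.
model = word_tag_model(words, tagged_words, limit=2)
print(model)
```

Only the `limit` most frequent words get an entry; every other word is absent from the model.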
Code #2 : Using the function with UnigramTagger
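A sketch of wrapping the model in `UnigramTagger(model=...)`, assuming NLTK is installed. The toy sentences are hypothetical stand-ins for a treebank train/test split, so the number printed here will not match the treebank accuracy reported for this step. `accuracy` is a hand-written helper, not an NLTK function, used so the sketch does not depend on a particular NLTK version's evaluation method.

```python
from nltk.probability import ConditionalFreqDist, FreqDist
from nltk.tag import UnigramTagger


def word_tag_model(words, tagged_words, limit=200):
    """Map each of the `limit` most frequent words to its most frequent tag."""
    fd = FreqDist(words)
    cfd = ConditionalFreqDist(tagged_words)
    return {word: cfd[word].max() for word, _ in fd.most_common(limit)}


def accuracy(tagger, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        predicted = tagger.tag([word for word, _ in sent])
        for (_, guess), (_, gold) in zip(predicted, sent):
            correct += int(guess == gold)
            total += 1
    return correct / total


# Hypothetical toy data standing in for a treebank train/test split.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB")],
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
]
test_sents = [[("the", "DT"), ("cat", "NN"), ("can", "MD"), ("sleep", "VB")]]

tagged_words = [pair for sent in train_sents for pair in sent]
model = word_tag_model([w for w, _ in tagged_words], tagged_words, limit=2)

tagger = UnigramTagger(model=model)  # pure lookup: no training, no backoff
print(tagger.tag(["the", "cat", "can", "sleep"]))
# [('the', 'DT'), ('cat', None), ('can', 'MD'), ('sleep', None)]
print("Accuracy :", accuracy(tagger, test_sents))
# Accuracy : 0.5
```

Words outside the model come back tagged `None`, which is why this tagger on its own scores low.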
Accuracy : 0.559680552557738
Code #3 : Let’s try a backoff chain
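A sketch of one way to build such a chain, assuming NLTK is installed: a `DefaultTagger('NN')` catches every remaining word, the likely-word `UnigramTagger` sits above it, and trained unigram, bigram and trigram taggers are stacked on top, each backing off to the one below. `train_chain` and `accuracy` are hypothetical helpers, and the sentences are toy data, so the printed accuracy will not match the treebank figure.

```python
from nltk.probability import ConditionalFreqDist, FreqDist
from nltk.tag import BigramTagger, DefaultTagger, TrigramTagger, UnigramTagger


def word_tag_model(words, tagged_words, limit=200):
    fd = FreqDist(words)
    cfd = ConditionalFreqDist(tagged_words)
    return {word: cfd[word].max() for word, _ in fd.most_common(limit)}


def train_chain(train_sents, backoff):
    """Stack trained unigram, bigram and trigram taggers on top of `backoff`."""
    for cls in (UnigramTagger, BigramTagger, TrigramTagger):
        backoff = cls(train_sents, backoff=backoff)
    return backoff  # the TrigramTagger, which is consulted first


def accuracy(tagger, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        predicted = tagger.tag([word for word, _ in sent])
        for (_, guess), (_, gold) in zip(predicted, sent):
            correct += int(guess == gold)
            total += 1
    return correct / total


# Hypothetical toy data standing in for a treebank train/test split.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB")],
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
]
test_sents = [[("the", "DT"), ("cat", "NN"), ("can", "MD"), ("sleep", "VB")]]

tagged_words = [pair for sent in train_sents for pair in sent]
model = word_tag_model([w for w, _ in tagged_words], tagged_words, limit=2)

default_tagger = DefaultTagger("NN")
likely_tagger = UnigramTagger(model=model, backoff=default_tagger)
tagger = train_chain(train_sents, backoff=likely_tagger)

print(tagger.tag(["the", "cat", "can", "sleep"]))
# [('the', 'DT'), ('cat', 'NN'), ('can', 'MD'), ('sleep', 'NN')]
print("Accuracy :", accuracy(tagger, test_sents))
# Accuracy : 0.75
```

Unknown words such as "cat" now fall through to the default `NN` instead of `None`.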
Accuracy : 0.8806820634578028
Note : The backoff chain has increased the accuracy. We can improve this result slightly further by placing the likely-word UnigramTagger in front of the trained taggers, so that its tags manually override theirs.
Code #4 : Manual Override of Trained Taggers
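A sketch of the override, assuming NLTK is installed: the likely-word `UnigramTagger` is placed at the front and backs off to the trained chain, so its tag wins whenever a word is in the model. For comparison the sketch also builds the chain in the other order. `train_chain` is a hypothetical helper and the sentences are toy data.

```python
from nltk.probability import ConditionalFreqDist, FreqDist
from nltk.tag import BigramTagger, DefaultTagger, TrigramTagger, UnigramTagger


def word_tag_model(words, tagged_words, limit=200):
    fd = FreqDist(words)
    cfd = ConditionalFreqDist(tagged_words)
    return {word: cfd[word].max() for word, _ in fd.most_common(limit)}


def train_chain(train_sents, backoff):
    """Stack trained unigram, bigram and trigram taggers on top of `backoff`."""
    for cls in (UnigramTagger, BigramTagger, TrigramTagger):
        backoff = cls(train_sents, backoff=backoff)
    return backoff


# Hypothetical toy training data.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB")],
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
]
tagged_words = [pair for sent in train_sents for pair in sent]
model = word_tag_model([w for w, _ in tagged_words], tagged_words, limit=2)

# Code #3 order: trained taggers first, likely-word tagger near the bottom.
chain_first = train_chain(
    train_sents, UnigramTagger(model=model, backoff=DefaultTagger("NN"))
)
# Code #4 order: likely-word tagger first, backing off to the trained chain.
override = UnigramTagger(
    model=model, backoff=train_chain(train_sents, DefaultTagger("NN"))
)

sentence = ["the", "can", "is", "empty"]
print(chain_first.tag(sentence))
# [('the', 'DT'), ('can', 'NN'), ('is', 'VBZ'), ('empty', 'JJ')]
print(override.tag(sentence))
# [('the', 'DT'), ('can', 'MD'), ('is', 'VBZ'), ('empty', 'JJ')]
```

On this toy sentence the two orders disagree on "can": the bigram tagger in the first chain uses the preceding DT to choose NN, while the override forces the model's MD. Whether the override helps overall depends on how reliable the frequent-word tags are on the data being tagged.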
Accuracy : 0.8824088063889488
Improved By : shubham_singh