
NLP | Likely Word Tags

Last Updated : 19 Dec, 2019

nltk.probability.FreqDist is used to find the most common words by counting word frequencies in the treebank corpus. A ConditionalFreqDist is then built from the tagged words, counting how often each tag occurs with every word. These counts are used to construct a model whose keys are the most frequent words and whose value for each word is that word's most frequent tag.
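To make the two counting steps concrete, here is a minimal plain-Python sketch of the same model. It swaps in collections.Counter for FreqDist and ConditionalFreqDist (an assumption for illustration, not NLTK's own code), and the toy tagged list is invented.

```python
from collections import Counter, defaultdict


def word_tag_model(words, tagged_words, limit=200):
    """Map each of the `limit` most frequent words to its most frequent tag.

    Plain-Python stand-in (assumption) for the FreqDist /
    ConditionalFreqDist version; not NLTK's implementation.
    """
    # Word frequencies: the role FreqDist plays
    fd = Counter(words)
    # Per-word tag frequencies: the role ConditionalFreqDist plays
    cfd = defaultdict(Counter)
    for word, tag in tagged_words:
        cfd[word][tag] += 1
    # Keep only the most frequent words, each paired with its top tag
    return {word: cfd[word].most_common(1)[0][0]
            for word, _ in fd.most_common(limit)}


# Invented toy corpus: "runs" is tagged VBZ twice and NNS once
tagged = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"),
          ("the", "DT"), ("runs", "NNS"), ("runs", "VBZ")]
words = [w for w, _ in tagged]
print(word_tag_model(words, tagged, limit=2))  # {'runs': 'VBZ', 'the': 'DT'}
```

With limit=2 only "runs" (3 occurrences) and "the" (2) make it into the model; "dog" is dropped, which is exactly how the real model ignores all but the 200 most frequent words.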

Code #1 : Creating the function




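# Loading Libraries
# (save this as tag_util.py; Code #2 imports it from there)
from nltk.probability import FreqDist, ConditionalFreqDist

# Making function
def word_tag_model(words, tagged_words, limit=200):
    # Count how often each word occurs
    fd = FreqDist(words)
    # For every word, count how often each tag occurs with it
    cfd = ConditionalFreqDist(tagged_words)
    # The `limit` most frequent words
    most_freq = (word for word, count in fd.most_common(limit))
    # Map each frequent word to its most frequent tag
    return {word: cfd[word].max() for word in most_freq}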

 
Code #2 : Using the function with UnigramTagger




# loading libraries
from tag_util import word_tag_model  # Code #1, saved as tag_util.py
from nltk.corpus import treebank
from nltk.tag import UnigramTagger

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Building the model: the 200 most frequent words and their most likely tags
model = word_tag_model(treebank.words(),
                       treebank.tagged_words())

# A UnigramTagger that only knows the words in the model;
# every other word is tagged None
tag = UnigramTagger(model=model)

# evaluate() is deprecated in newer NLTK releases in favour of accuracy()
print("Accuracy : ", tag.evaluate(test_data))

Output :

Accuracy : 0.559680552557738
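The accuracy is modest because a UnigramTagger built only from `model`, with no backoff, can tag just the 200 words in the model; every other token gets None and is counted as an error. A plain-Python sketch of that behaviour follows; lookup_tag and accuracy are hypothetical helpers (not NLTK functions), and the model and sentences are invented.

```python
def lookup_tag(tokens, model):
    # Like a UnigramTagger built from a fixed model with no backoff:
    # words outside the model get no tag (None)
    return [(tok, model.get(tok)) for tok in tokens]


def accuracy(gold_sents, model):
    # Fraction of tokens whose predicted tag equals the gold tag;
    # untagged (None) tokens always count as errors
    correct = total = 0
    for sent in gold_sents:
        tokens = [w for w, _ in sent]
        for (_, pred), (_, gold) in zip(lookup_tag(tokens, model), sent):
            correct += int(pred == gold)
            total += 1
    return correct / total


# Invented model and gold-standard sentences
model = {"the": "DT", "runs": "VBZ"}
gold = [[("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
        [("the", "DT"), ("dog", "NN"), ("runs", "NNS")]]
print(lookup_tag(["the", "cat"], model))  # [('the', 'DT'), ('cat', None)]
print(accuracy(gold, model))              # 0.5
```

Here "cat" and "dog" are simply missing from the model, and "runs" is wrong in one sentence because a single most-likely tag cannot fit every context.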

 
Code #3 : Let’s try a backoff chain






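# Loading libraries
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag import DefaultTagger
from tag_util import backoff_tagger  # assumed helper, kept next to word_tag_model

# Tag any still-unknown word as a noun
default_tagger = DefaultTagger('NN')

# The likely-word model from Code #2, falling back to the default tagger
likely_tagger = UnigramTagger(
        model=model, backoff=default_tagger)

# Trained trigram -> bigram -> unigram chain, ending in likely_tagger
tag = backoff_tagger(train_data, [
        UnigramTagger, BigramTagger,
        TrigramTagger], backoff=likely_tagger)

print("Accuracy : ", tag.evaluate(test_data))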

Output :

Accuracy : 0.8806820634578028

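Note : The backoff chain raises the accuracy from about 0.56 to about 0.88. We can improve this result slightly further by moving the likely-word UnigramTagger from the end of the chain to the front, so that its tags manually override those of the trained taggers.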
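Code #3 and Code #4 call a backoff_tagger helper that this article does not define. The sketch below assumes it is the usual helper kept in tag_util.py alongside word_tag_model: each tagger class is trained in turn, with the previously built tagger as its backoff. The Stub, Uni, Bi and Tri classes are hypothetical stand-ins used only to show the resulting order without NLTK.

```python
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    """Train each tagger class in turn, passing the previous tagger as backoff.

    Assumed helper: the last class in tagger_classes ends up at the
    front of the chain.
    """
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff


# Hypothetical stand-in classes that only record their backoff,
# so the chain order can be shown without NLTK
class Stub:
    def __init__(self, train_sents, backoff=None):
        self.backoff = backoff


class Uni(Stub):
    pass


class Bi(Stub):
    pass


class Tri(Stub):
    pass


chain = backoff_tagger([], [Uni, Bi, Tri], backoff="likely_tagger")

# Walk the chain from the front to the final fallback
order = []
node = chain
while isinstance(node, Stub):
    order.append(type(node).__name__)
    node = node.backoff
order.append(node)
print(order)  # ['Tri', 'Bi', 'Uni', 'likely_tagger']
```

With the real NLTK classes, backoff_tagger(train_data, [UnigramTagger, BigramTagger, TrigramTagger], backoff=likely_tagger) therefore puts a TrigramTagger at the front, and the likely-word tagger is consulted only when all three trained taggers return None.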
 
Code #4 : Manual Override of Trained Taggers




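# Loading libraries
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag import DefaultTagger
from tag_util import backoff_tagger  # assumed helper, kept next to word_tag_model

default_tagger = DefaultTagger('NN')

# Trained trigram -> bigram -> unigram chain, ending in the default tagger
tagger = backoff_tagger(train_data, [
        UnigramTagger, BigramTagger,
        TrigramTagger], backoff=default_tagger)

# The likely-word model goes first and overrides the trained taggers
# for the words it knows; every other word falls through to `tagger`
likely_tag = UnigramTagger(model=model, backoff=tagger)

print("Accuracy : ", likely_tag.evaluate(test_data))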

Output :

Accuracy : 0.8824088063889488
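The only difference between Code #3 and Code #4 is where the likely-word tagger sits in the chain. A minimal plain-Python sketch of why that order matters follows. The trained, likely and default lookups are hypothetical; real NLTK n-gram taggers use the surrounding context, not just the word, so this only models the "first tagger that answers wins" rule.

```python
def chain_tag(word, taggers):
    # Try each tagger in order; the first one that returns a tag wins
    for tagger in taggers:
        tag = tagger(word)
        if tag is not None:
            return tag
    return None


# Hypothetical taggers: each returns a tag, or None if it has no answer
trained = {"that": "DT", "dogs": "NNS"}.get  # stands in for the trained chain
likely = {"that": "IN"}.get                  # stands in for the likely-word model


def default(word):
    # Stands in for DefaultTagger('NN'): always answers NN
    return "NN"


# Code #3 order: trained taggers first, likely-word model as a fallback
print(chain_tag("that", [trained, likely, default]))  # DT
# Code #4 order: likely-word model first, overriding the trained taggers
print(chain_tag("that", [likely, trained, default]))  # IN
# A word nobody knows still reaches the default tagger
print(chain_tag("cats", [likely, trained, default]))  # NN
```

Putting the model first means its tag is used for every word it covers, even where a trained tagger would have answered; words outside the model behave exactly as in Code #3.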
