NLP | Combining NGram Taggers

NgramTagger has three subclasses:

  • UnigramTagger
  • BigramTagger
  • TrigramTagger

The BigramTagger subclass uses the previous tag as part of its context.
The TrigramTagger subclass uses the previous two tags as part of its context.

An n-gram is a contiguous subsequence of n items from a given sequence.
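For instance, the bigrams and trigrams of a short word sequence can be listed with a small helper (a minimal illustration; NLTK also provides nltk.util.ngrams for this):

```python
def ngrams(items, n):
    """Return the list of contiguous n-item subsequences of items."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = ['the', 'quick', 'brown', 'fox']
print(ngrams(words, 2))  # bigrams:  [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
print(ngrams(words, 3))  # trigrams: [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```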
The idea behind the NgramTagger subclasses:

  • The part-of-speech tag of the current word can be guessed by looking at the previous words and their POS tags.
  • Each tagger maintains a context dictionary (implemented in the ContextTagger parent class).
  • This dictionary is used to guess the tag based on the context.
  • For the NgramTagger subclasses, the context is some number of previous tagged words.
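As a rough sketch of the context-dictionary idea (plain Python, not the actual NLTK implementation), a bigram-style tagger keys its dictionary on (previous tag, current word) and returns None for unseen contexts:

```python
# Toy bigram-style context dictionary (illustrative only; NLTK's
# ContextTagger builds this automatically from tagged training sentences)
context_to_tag = {
    (None, 'the'): 'DT',    # sentence-initial 'the'
    ('DT', 'dog'): 'NN',    # 'dog' after a determiner
    ('NN', 'barks'): 'VBZ',
}

def tag_sentence(words, context):
    """Tag each word by looking up (previous tag, word) in the context dict."""
    prev_tag = None
    tags = []
    for word in words:
        tag = context.get((prev_tag, word))  # None if the context is unseen
        tags.append((word, tag))
        prev_tag = tag
    return tags

print(tag_sentence(['the', 'dog', 'barks'], context_to_tag))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
```

Note how a single unseen context makes every subsequent lookup fail as well, since the previous tag becomes None; this is one reason a BigramTagger used alone, with no backoff, scores so poorly.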

Code #1 : Working of Bigram tagger


# Loading Libraries
from nltk.tag import BigramTagger
from nltk.corpus import treebank

# Initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Training the bigram tagger
tag1 = BigramTagger(train_data)

# Evaluation
print(tag1.evaluate(test_data))



Output :

0.11318799913662854

 
Code #2 : Working of Trigram tagger


# Loading Libraries
from nltk.tag import TrigramTagger
from nltk.corpus import treebank

# Initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Training the trigram tagger
tag1 = TrigramTagger(train_data)

# Evaluation
print(tag1.evaluate(test_data))



Output :

0.06876753723289446

 
Code #3 : Collectively using Unigram, Bigram and Trigram taggers.


# Loading Libraries
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
from tag_util import backoff_tagger
from nltk.corpus import treebank

# Initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Chaining the taggers, with a DefaultTagger as the final backoff
backoff = DefaultTagger('NN')
tag = backoff_tagger(train_data,
                     [UnigramTagger, BigramTagger, TrigramTagger],
                     backoff = backoff)

# Evaluation
print(tag.evaluate(test_data))



Output :

0.8806820634578028

How it works?

  • The backoff_tagger function creates an instance of each tagger class.
  • Each class is given the training sentences and the previously trained tagger as its backoff.
  • The order of the tagger classes is important: in the code above, UnigramTagger comes first, so it is trained first and receives the initial backoff tagger (the DefaultTagger).
  • That trained tagger then becomes the backoff tagger for the next tagger class.
  • The final tagger returned is an instance of the last tagger class, TrigramTagger.
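Note that backoff_tagger comes from a helper module (tag_util) and is not part of NLTK itself; a minimal version of the chaining logic described above looks roughly like this:

```python
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    """Train each tagger class in order, passing the previously trained
    tagger as the backoff for the next one. Returns an instance of the
    last class in tagger_classes."""
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff
```

Each class in the list must accept (train_sents, backoff=...) in its constructor, which all of the NgramTagger subclasses do.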

Code #4 : Proof


print(tag._taggers[-1] == backoff)

print(isinstance(tag._taggers[0], TrigramTagger))

print(isinstance(tag._taggers[1], BigramTagger))



Output :

True

True

True

