NLP | Combining NGram Taggers

NgramTagger has three subclasses :

  • UnigramTagger
  • BigramTagger
  • TrigramTagger

The BigramTagger subclass uses the previous tag as part of its context, while the TrigramTagger subclass uses the previous two tags as part of its context.

n-gram – a contiguous subsequence of n items from a given sequence.
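For illustration only (this helper is not part of NLTK), extracting the n-grams of a token list can be sketched as:

```python
# Illustrative sketch: all contiguous n-grams of a sequence
def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

print(ngrams(['I', 'love', 'NLP'], 2))  # [('I', 'love'), ('love', 'NLP')]
print(ngrams(['I', 'love', 'NLP'], 3))  # [('I', 'love', 'NLP')]
```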
Idea behind the NgramTagger subclasses :

  • By looking at the previous words and their part-of-speech tags, the part-of-speech tag of the current word can be guessed.
  • Each tagger maintains a context dictionary (implemented in the ContextTagger parent class).
  • This dictionary is used to guess the tag for the current word based on its context.
  • For the NgramTagger subclasses, the context is some number of previous tagged words.
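The context-dictionary idea above can be sketched in plain Python. This is an illustrative toy, not NLTK's actual ContextTagger code: it maps a (previous tag, word) pair to the most frequent tag seen in training, and returns None when the context was never observed.

```python
# Illustrative sketch of a bigram-style context dictionary
# (not NLTK's implementation): (previous tag, word) -> most frequent tag
from collections import Counter, defaultdict

def train_bigram_context(tagged_sents):
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    # keep only the most common tag for each context
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def tag_with_context(context, words):
    tags, prev = [], None
    for w in words:
        t = context.get((prev, w))  # None when the context was never seen
        tags.append((w, t))
        prev = t
    return tags

model = train_bigram_context([[('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]])
print(tag_with_context(model, ['the', 'dog', 'runs']))
# [('the', 'DT'), ('dog', 'NN'), ('runs', None)] -- ('NN', 'runs') was never seen
```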

Code #1 : Working of Bigram tagger


# Loading Libraries 
from nltk.tag import BigramTagger
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# Tagging
tag1 = BigramTagger(train_data)
  
# Evaluation
print(tag1.evaluate(test_data))


Output :

0.11318799913662854

The score is low because a bigram tagger on its own can only tag a word whose exact context was seen during training; an unseen context yields None, and that None then becomes part of the context for every following word in the sentence.

 
Code #2 : Working of Trigram tagger


# Loading Libraries 
from nltk.tag import TrigramTagger
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# Tagging
tag1 = TrigramTagger(train_data)
  
# Evaluation
print(tag1.evaluate(test_data))


Output :

0.06876753723289446

The trigram tagger alone scores even lower, since exact two-tag contexts occur more rarely in the test data than single-tag contexts.

 
Code #3 : Collectively using the Unigram, Bigram and Trigram taggers.


# Loading Libraries
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tag import BigramTagger, TrigramTagger
from tag_util import backoff_tagger
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# chaining the taggers: each one backs off to the previous
backoff = DefaultTagger('NN')
tag = backoff_tagger(train_data, 
                     [UnigramTagger, BigramTagger, TrigramTagger], 
                     backoff = backoff)
  
# Evaluation
print(tag.evaluate(test_data))


Output :

0.8806820634578028

How it works ?

  • The backoff_tagger function creates an instance of each tagger class.
  • It trains each tagger on train_data and gives it the previous tagger as its backoff.
  • The order of tagger classes is important: in the code above the first class is UnigramTagger, so it is trained first and given the initial backoff tagger (the DefaultTagger).
  • Each trained tagger then becomes the backoff tagger for the next tagger class.
  • The final tagger returned is an instance of the last tagger class – TrigramTagger.
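Note that tag_util is not part of NLTK; it is a helper module from the "Python 3 Text Processing with NLTK 3 Cookbook". A plausible implementation matching the behaviour described above (each NLTK sequential tagger accepts a backoff keyword argument) would be:

```python
# A sketch of tag_util.backoff_tagger -- an assumption about the helper,
# not code from NLTK itself
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    for cls in tagger_classes:
        # train each tagger, wrapping the previous one as its backoff
        backoff = cls(train_sents, backoff=backoff)
    return backoff
```

The last class in the list becomes the outermost tagger, which consults its backoff chain whenever it cannot tag a word itself.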

Code #4 : Proof


# the last tagger in the chain is the DefaultTagger backoff
print(tag._taggers[-1] == backoff)
  
# the first (outermost) tagger is the TrigramTagger
print("\n", isinstance(tag._taggers[0], TrigramTagger))
  
# its backoff is the BigramTagger
print("\n", isinstance(tag._taggers[1], BigramTagger))


Output :

True

True

True


