NLP | Combining NGram Taggers

NgramTagger has three subclasses :

  • UnigramTagger
  • BigramTagger
  • TrigramTagger

The BigramTagger subclass uses the previous tag as part of its context, while the TrigramTagger subclass uses the previous two tags as part of its context.

n-gram – a contiguous subsequence of n items from a given sequence.
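For illustration only (this helper is not part of NLTK), extracting the n-grams of a token list can be sketched as:

```python
# Illustrative sketch: all contiguous n-grams of a sequence
def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

print(ngrams(['I', 'love', 'NLP'], 2))  # [('I', 'love'), ('love', 'NLP')]
print(ngrams(['I', 'love', 'NLP'], 3))  # [('I', 'love', 'NLP')]
```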
Idea behind the NgramTagger subclasses :

  • By looking at the previous words and their part-of-speech tags, the part-of-speech tag of the current word can be guessed.
  • Each tagger maintains a context dictionary (implemented in the ContextTagger parent class).
  • This dictionary is used to guess the tag for the current word based on its context.
  • For the NgramTagger subclasses, the context is some number of previous tagged words.
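The context-dictionary idea above can be sketched in plain Python. This is an illustrative toy, not NLTK's actual ContextTagger code: it maps a (previous tag, word) pair to the most frequent tag seen in training, and returns None when the context was never observed.

```python
# Illustrative sketch of a bigram-style context dictionary
# (not NLTK's implementation): (previous tag, word) -> most frequent tag
from collections import Counter, defaultdict

def train_bigram_context(tagged_sents):
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    # keep only the most common tag for each context
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def tag_with_context(context, words):
    tags, prev = [], None
    for w in words:
        t = context.get((prev, w))  # None when the context was never seen
        tags.append((w, t))
        prev = t
    return tags

model = train_bigram_context([[('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]])
print(tag_with_context(model, ['the', 'dog', 'runs']))
# [('the', 'DT'), ('dog', 'NN'), ('runs', None)] -- ('NN', 'runs') was never seen
```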

Code #1 : Working of Bigram tagger


# Loading Libraries 
from nltk.tag import BigramTagger
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# Tagging
tag1 = BigramTagger(train_data)
  
# Evaluation
print(tag1.evaluate(test_data))


Output :

0.11318799913662854

The score is low because a bigram tagger on its own can only tag a word whose exact context was seen during training; an unseen context yields None, and that None then becomes part of the context for every following word in the sentence.

 
Code #2 : Working of Trigram tagger


# Loading Libraries 
from nltk.tag import TrigramTagger
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# Tagging
tag1 = TrigramTagger(train_data)
  
# Evaluation
print(tag1.evaluate(test_data))


Output :

0.06876753723289446

The trigram tagger alone scores even lower, since exact two-tag contexts occur more rarely in the test data than single-tag contexts.

 
Code #3 : Collectively using the Unigram, Bigram and Trigram taggers.


# Loading Libraries
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tag import BigramTagger, TrigramTagger
from tag_util import backoff_tagger
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# chaining the taggers: each one backs off to the previous
backoff = DefaultTagger('NN')
tag = backoff_tagger(train_data, 
                     [UnigramTagger, BigramTagger, TrigramTagger], 
                     backoff = backoff)
  
# Evaluation
print(tag.evaluate(test_data))


Output :

0.8806820634578028

How it works ?

  • The backoff_tagger function creates an instance of each tagger class.
  • It trains each tagger on train_data and gives it the previous tagger as its backoff.
  • The order of tagger classes is important: in the code above the first class is UnigramTagger, so it is trained first and given the initial backoff tagger (the DefaultTagger).
  • Each trained tagger then becomes the backoff tagger for the next tagger class.
  • The final tagger returned is an instance of the last tagger class – TrigramTagger.
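Note that tag_util is not part of NLTK; it is a helper module from the "Python 3 Text Processing with NLTK 3 Cookbook". A plausible implementation matching the behaviour described above (each NLTK sequential tagger accepts a backoff keyword argument) would be:

```python
# A sketch of tag_util.backoff_tagger -- an assumption about the helper,
# not code from NLTK itself
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    for cls in tagger_classes:
        # train each tagger, wrapping the previous one as its backoff
        backoff = cls(train_sents, backoff=backoff)
    return backoff
```

The last class in the list becomes the outermost tagger, which consults its backoff chain whenever it cannot tag a word itself.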

Code #4 : Proof


# the last tagger in the chain is the DefaultTagger backoff
print(tag._taggers[-1] == backoff)
  
# the first (outermost) tagger is the TrigramTagger
print("\n", isinstance(tag._taggers[0], TrigramTagger))
  
# its backoff is the BigramTagger
print("\n", isinstance(tag._taggers[1], BigramTagger))


Output :

True

True

True


