NLP | Combining NGram Taggers
Last Updated : 16 Dec, 2019
NgramTagger has three subclasses:
- UnigramTagger
- BigramTagger
- TrigramTagger
The BigramTagger subclass uses the previous tag as part of its context; the TrigramTagger subclass uses the previous two tags.
ngram – a subsequence of n items.
Idea behind the NgramTagger subclasses :
- By looking at the previous words and their part-of-speech tags, the part-of-speech tag for the current word can be guessed.
- Each tagger maintains a context dictionary (implemented in the ContextTagger parent class).
- This dictionary is used to guess the tag based on the context.
- For the NgramTagger subclasses, the context is some number of previous tagged words.
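The context dictionary can be inspected directly. The sketch below uses a tiny hand-made corpus (not the treebank data used later) and the internal `_context_to_tag` attribute, which is an implementation detail of ContextTagger:

```python
from nltk.tag import UnigramTagger, BigramTagger

# A tiny hand-tagged corpus, just for illustration
train = [[('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]]

uni = UnigramTagger(train)
# For UnigramTagger, the context is simply the word itself
print(uni._context_to_tag)

bi = BigramTagger(train)
# For BigramTagger, the context is (previous tag,) plus the word
print(bi._context_to_tag)
```

For the unigram tagger the dictionary maps each word to its most frequent tag; for the bigram tagger each key also carries the preceding tag, e.g. `(('DT',), 'dog')`.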
Code #1 : Working of the Bigram tagger
from nltk.tag import BigramTagger
from nltk.corpus import treebank

# Split the treebank corpus into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

tag1 = BigramTagger(train_data)
tag1.evaluate(test_data)
Output :
0.11318799913662854
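The accuracy is this low because a standalone BigramTagger has no backoff: any word whose (previous tag, word) context was not seen in training is tagged None, and that None then appears in the context of every following word. A minimal sketch with a tiny hand-made corpus (not the treebank split above):

```python
from nltk.tag import BigramTagger

train = [[('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]]
bigram = BigramTagger(train)

# The training sentence itself is tagged correctly
print(bigram.tag(['the', 'cat', 'sat']))
# An unseen first word gets None; the None then poisons the context
# of every later word, so even known words cannot be tagged
print(bigram.tag(['a', 'cat', 'sat']))
```

This cascading failure is exactly what the backoff chain in Code #3 repairs.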
Code #2 : Working of the Trigram tagger
from nltk.tag import TrigramTagger
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

tag1 = TrigramTagger(train_data)
tag1.evaluate(test_data)
Output :
0.06876753723289446
Code #3 : Collectively using the Unigram, Bigram and Trigram taggers.
from nltk.tag import DefaultTagger
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from tag_util import backoff_tagger
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

backoff = DefaultTagger('NN')
tag = backoff_tagger(train_data,
                     [UnigramTagger, BigramTagger, TrigramTagger],
                     backoff=backoff)
tag.evaluate(test_data)
Output :
0.8806820634578028
How it works ?
- The backoff_tagger function creates an instance of each tagger class.
- It passes train_data to each class, along with the previously trained tagger as its backoff.
- The order of the tagger classes is important: in the code above, UnigramTagger comes first, so it is trained first and given the initial backoff tagger (the DefaultTagger).
- That trained tagger then becomes the backoff tagger for the next tagger class.
- The tagger finally returned is an instance of the last tagger class – TrigramTagger.
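Note that tag_util is not part of NLTK itself; it is a small helper module. The steps above can be sketched as the following assumed implementation of backoff_tagger, demonstrated here on a tiny hand-made corpus rather than the treebank split:

```python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger

def backoff_tagger(train_sents, tagger_classes, backoff=None):
    # Train each class in order; the tagger trained in one step
    # becomes the backoff for the next, so the returned tagger is
    # an instance of the last class in the list.
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff

# Usage with a tiny hand-tagged corpus, just for illustration
train = [[('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]]
tagger = backoff_tagger(train, [UnigramTagger, BigramTagger, TrigramTagger],
                        backoff=DefaultTagger('NN'))
# 'dog' was never seen, so the chain falls all the way back to the
# DefaultTagger, which tags it 'NN'
print(tagger.tag(['the', 'dog']))
```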
Code #4 : Proof
print(tag._taggers[-1] == backoff)
print("\n", isinstance(tag._taggers[0], TrigramTagger))
print("\n", isinstance(tag._taggers[1], BigramTagger))
Output :
True
True
True