NLP | Classifier-based tagging

Last Updated : 16 Dec, 2019

ClassifierBasedPOSTagger class:

It is a subclass of ClassifierBasedTagger that uses classification to do part-of-speech tagging.
Features are extracted from each word and passed to an internal classifier.
The classifier takes these features and returns a label, i.e. a part-of-speech tag.
The feature detector finds suffixes of several lengths, does some regular expression matching, and looks at the unigram, bigram, and trigram history to produce a fairly complete set of features for each word.
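To make the kinds of features described above concrete, here is a minimal sketch of a feature detector in plain Python. It is not NLTK's actual implementation: the function name `sketch_feature_detector`, the feature keys, and the regular expressions are assumptions chosen for illustration. Only the `(tokens, index, history)` signature follows the convention used by ClassifierBasedTagger.

```python
import re


def sketch_feature_detector(tokens, index, history):
    """Illustrative only: build a feature dict for tokens[index].

    tokens  -- list of words in the sentence
    index   -- position of the word being tagged
    history -- tags already assigned to tokens[0:index]
    """
    word = tokens[index]
    lower = word.lower()

    # Tag history: the previous one and two tags, when they exist.
    prev_tag = history[index - 1] if index >= 1 else None
    prev_prev_tag = history[index - 2] if index >= 2 else None

    # Coarse word shape found by regular expression matching.
    if re.fullmatch(r"[0-9]+(\.[0-9]+)?", word):
        shape = "number"
    elif re.fullmatch(r"\W+", word):
        shape = "punct"
    elif re.fullmatch(r"[A-Z][a-z]+", word):
        shape = "capitalized"
    else:
        shape = "other"

    return {
        "word": word,
        "word.lower": lower,
        # Suffixes of length 1, 2 and 3.
        "suffix1": lower[-1:],
        "suffix2": lower[-2:],
        "suffix3": lower[-3:],
        "prev_tag": prev_tag,
        "prev_prev_tag": prev_prev_tag,
        "shape": shape,
    }


features = sketch_feature_detector(["The", "dogs", "barked"], 2, ["DT", "NNS"])
print(features["suffix3"], features["prev_tag"], features["prev_prev_tag"])
```

Each word is turned into a dictionary of named features, and dictionaries like this are what the internal classifier is trained on.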

Code #1 : Using ClassifierBasedPOSTagger

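from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

# the treebank sample must be available locally: nltk.download('treebank')

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# trains the default Naive Bayes classifier on the training sentences
tagging = ClassifierBasedPOSTagger(train=train_data)

# recent NLTK releases deprecate evaluate() in favor of accuracy()
a = tagging.evaluate(test_data)

print("Accuracy :", a)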

Output :

Accuracy : 0.9309734513274336

The ClassifierBasedPOSTagger class inherits from ClassifierBasedTagger and implements only a feature_detector() method. All training and tagging is done in ClassifierBasedTagger.

Code #2 : Using MaxentClassifier

from nltk.classify import MaxentClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# classifier_builder replaces the default Naive Bayes trainer
tagger = ClassifierBasedPOSTagger(
        train=train_data, classifier_builder=MaxentClassifier.train)

a = tagger.evaluate(test_data)

print("Accuracy :", a)

Output :

Accuracy : 0.9258363911072739

Custom feature detector:

There are two ways to use a custom feature detector:

Subclass ClassifierBasedTagger and implement a feature_detector() method.
Pass a function as the feature_detector keyword argument into ClassifierBasedTagger at initialization.
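The first option, subclassing, could look like the sketch below. The class name `WordSuffixTagger`, its two features, and the tiny hand-made training set are all hypothetical. They are used only so that the example runs without downloading a corpus, and the tags learned from two sentences are not meaningful as an accuracy measure.

```python
from nltk.tag.sequential import ClassifierBasedTagger


class WordSuffixTagger(ClassifierBasedTagger):
    """Hypothetical subclass: features are the lowercased word and its last two letters."""

    def feature_detector(self, tokens, index, history):
        word = tokens[index].lower()
        return {"word": word, "suffix2": word[-2:]}


# Toy training data (an assumption for illustration, not a real corpus).
toy_train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Training and tagging are inherited from ClassifierBasedTagger;
# only feature extraction is overridden.
tagger = WordSuffixTagger(train=toy_train)
tagged = tagger.tag(["the", "cat", "barks"])
print(tagged)
```

Subclassing suits a detector that is reused across projects, while passing a function is usually enough for a quick experiment.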

Code #3 : Custom Feature Detector

from nltk.tag.sequential import ClassifierBasedTagger
# tag_util is a separate helper module, not part of NLTK;
# it must be on the import path
from tag_util import unigram_feature_detector
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# the function is called as feature_detector(tokens, index, history)
tagger = ClassifierBasedTagger(
        train=train_data,
        feature_detector=unigram_feature_detector)

a = tagger.evaluate(test_data)

print("Accuracy :", a)
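
Code #3 imports unigram_feature_detector from tag_util, a helper module that does not ship with NLTK. If that module is not at hand, a stand-in could be as small as the sketch below. It assumes the detector uses only the current word as a feature, and the actual tag_util implementation may differ.

```python
def unigram_feature_detector(tokens, index, history):
    """Assumed stand-in: the only feature is the word being tagged.

    history (the tags assigned so far) is accepted to match the
    expected signature but is deliberately ignored.
    """
    return {"word": tokens[index]}


features = unigram_feature_detector(["The", "dog", "barked"], 1, ["DT"])
print(features)  # {'word': 'dog'}
```

Any function with this three-argument signature that returns a feature dictionary can be passed as the feature_detector keyword argument.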