NLP | Classifier-based tagging
  • Last Updated : 16 Dec, 2019

ClassifierBasedPOSTagger class:

  • It is a subclass of ClassifierBasedTagger that treats part-of-speech tagging as a classification problem.
  • Features are extracted from each word and passed to an internal classifier.
  • The classifier maps those features to a label, i.e. a part-of-speech tag.
  • The feature detector collects suffixes of several lengths, applies some regular-expression matching, and looks at the unigram, bigram, and trigram history to produce a fairly complete set of features for each word.
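To make the last point concrete, here is a minimal sketch of the kind of feature dictionary such a detector might build for one word. It is not NLTK's actual implementation: the function name sketch_pos_features and the exact feature keys are assumptions chosen for illustration, although the (tokens, index, history) signature matches what ClassifierBasedTagger expects from a feature detector.

```python
import re


def sketch_pos_features(tokens, index, history):
    """Illustrative features for tokens[index]; history holds the tags assigned so far."""
    word = tokens[index]
    lower = word.lower()

    # Context from earlier positions (None at the start of the sentence)
    prev_word = tokens[index - 1].lower() if index > 0 else None
    prev_tag = history[index - 1] if index > 0 else None
    prev_prev_tag = history[index - 2] if index > 1 else None

    # Coarse word shape found by regular-expression matching
    if re.fullmatch(r"[0-9]+(\.[0-9]+)?", word):
        shape = "number"
    elif re.fullmatch(r"\W+", word):
        shape = "punct"
    elif re.fullmatch(r"[A-Z][a-z]+", word):
        shape = "upcase"
    elif re.fullmatch(r"[a-z]+", word):
        shape = "downcase"
    else:
        shape = "other"

    return {
        "word": lower,
        "suffix1": lower[-1:],   # suffixes of several lengths
        "suffix2": lower[-2:],
        "suffix3": lower[-3:],
        "shape": shape,
        "prevword": prev_word,   # unigram context
        "prevtag": prev_tag,
        "prevtag+word": f"{prev_tag}+{lower}",                 # bigram context
        "prevprevtag+prevtag": f"{prev_prev_tag}+{prev_tag}",  # trigram context
    }


features = sketch_pos_features(["The", "dogs", "barked"], 2, ["DT", "NNS"])
print(features)
```

The classifier never sees the raw sentence, only dictionaries like this one, so the quality of the tagger depends heavily on which features are included.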

Code #1 : Using ClassifierBasedPOSTagger




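from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank  # requires nltk.download('treebank')

# Split the tagged Treebank sentences into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Train the tagger (a Naive Bayes classifier is used by default)
tagger = ClassifierBasedPOSTagger(train=train_data)

# Note: newer NLTK releases deprecate evaluate() in favor of accuracy()
accuracy = tagger.evaluate(test_data)

print("Accuracy : ", accuracy)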

Output :

Accuracy : 0.9309734513274336

The ClassifierBasedPOSTagger class inherits from ClassifierBasedTagger and only implements a feature_detector() method. All training and tagging is done in ClassifierBasedTagger, so a different classifier can be plugged in through the classifier_builder argument.

Code #2 : Using MaxentClassifier






from nltk.classify import MaxentClassifier
from nltk.corpus import treebank
from nltk.tag.sequential import ClassifierBasedPOSTagger

# Split the tagged Treebank sentences into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Replace the default classifier with a maximum entropy classifier
tagger = ClassifierBasedPOSTagger(
        train=train_data, classifier_builder=MaxentClassifier.train)

accuracy = tagger.evaluate(test_data)

print("Accuracy : ", accuracy)

Output :

Accuracy : 0.9258363911072739

Custom feature detector:
There are two ways to supply your own feature detector:

  1. Subclass ClassifierBasedTagger and implement a feature_detector() method.
  2. Pass a function as the feature_detector keyword argument into ClassifierBasedTagger at initialization.
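The first option can be sketched as follows. The class name SuffixTagger, its two features, and the tiny hand-made training set are all assumptions made for illustration; only ClassifierBasedTagger, its train argument, and the feature_detector(tokens, index, history) method come from NLTK.

```python
from nltk.tag.sequential import ClassifierBasedTagger


class SuffixTagger(ClassifierBasedTagger):
    """Hypothetical tagger whose features are the word and its last two letters."""

    def feature_detector(self, tokens, index, history):
        # tokens: the untagged sentence, index: position of the current word,
        # history: tags already assigned to the words before it (unused here)
        word = tokens[index].lower()
        return {"word": word, "suffix2": word[-2:]}


# Tiny hand-made training set, for illustration only
toy_train = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("the", "DT"), ("cat", "NN"), ("jumped", "VBD")],
]

# Training and tagging are inherited unchanged from ClassifierBasedTagger
tagger = SuffixTagger(train=toy_train)
tagged = tagger.tag(["the", "dog", "jumped"])
print(tagged)
```

Subclassing is convenient when the detector needs helper methods or state of its own; for a simple stateless function, passing it as the feature_detector keyword argument is usually shorter.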

Code #3 : Custom Feature Detector




from nltk.tag.sequential import ClassifierBasedTagger
from nltk.corpus import treebank
# tag_util is not part of NLTK; it must be importable as a local module
from tag_util import unigram_feature_detector

# Split the tagged Treebank sentences into training and test sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Pass the custom feature detector as a keyword argument
tagger = ClassifierBasedTagger(
        train=train_data,
        feature_detector=unigram_feature_detector)

accuracy = tagger.evaluate(test_data)

print("Accuracy : ", accuracy)

Output :

Accuracy : 0.8733865745737104
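Code #3 imports unigram_feature_detector from tag_util, which is not shown in this article. Assuming it does what its name suggests and uses only the current word as a feature, a minimal stand-in could look like this:

```python
def unigram_feature_detector(tokens, index, history):
    """Use only the current word as a feature, ignoring the tag history."""
    return {"word": tokens[index]}


print(unigram_feature_detector(["the", "dog", "barked"], 1, ["DT"]))
```

A detector this simple gives the classifier no suffix, shape, or context information, which would be consistent with the lower accuracy compared with ClassifierBasedPOSTagger.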
