NLP | Classifier-based tagging
ClassifierBasedPOSTagger class:
- It is a subclass of ClassifierBasedTagger that uses a classification technique to do part-of-speech tagging.
- Features are extracted from the words and then passed to an internal classifier.
- The classifier labels the features and returns that label, i.e. a part-of-speech tag.
- The feature detector finds suffixes of several lengths, does some regular expression matching, and looks at the unigram, bigram, and trigram history to produce a fairly complete set of features for each word.
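To make the idea of a feature detector concrete, here is a minimal sketch in plain Python. It is not NLTK's actual implementation: the function name sketch_features and the feature keys are made up for illustration. It only mirrors the kinds of features described above (suffixes, a simple pattern check, and the tags already assigned to earlier words).

```python
import re

# Hypothetical pattern check; the real detector's patterns may differ.
NUMBER_PATTERN = re.compile(r"^\d+(\.\d+)?$")


def sketch_features(tokens, index, history):
    """Build a feature dict for tokens[index].

    tokens  -- list of words in the sentence
    index   -- position of the word being tagged
    history -- tags already assigned to tokens[0:index]
    """
    word = tokens[index]
    return {
        "word": word.lower(),
        # suffixes of length 1, 2 and 3
        "suffix1": word[-1:].lower(),
        "suffix2": word[-2:].lower(),
        "suffix3": word[-3:].lower(),
        # simple regular expression check
        "is_number": bool(NUMBER_PATTERN.match(word)),
        # context from earlier words and their tags
        "prev_word": tokens[index - 1].lower() if index > 0 else None,
        "prev_tag": history[index - 1] if index > 0 else None,
        "prev_prev_tag": history[index - 2] if index > 1 else None,
    }


tokens = ["The", "dogs", "barked"]
history = ["DT", "NNS"]  # tags already chosen for "The" and "dogs"
features = sketch_features(tokens, 2, history)
print(features)
```

The classifier never sees the raw word directly. It only sees a dictionary like this one, so the quality of the features largely decides the quality of the tags.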
Code #1 : Using ClassifierBasedPOSTagger
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# train the tagger on the training sentences
tagging = ClassifierBasedPOSTagger(train=train_data)

# measure accuracy on the held-out sentences
a = tagging.evaluate(test_data)
print("Accuracy : ", a)
Output :
Accuracy : 0.9309734513274336
The ClassifierBasedPOSTagger class inherits from ClassifierBasedTagger and only implements a feature_detector() method. All training and tagging is done in ClassifierBasedTagger.
Code #2 : Using MaxentClassifier
from nltk.classify import MaxentClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# use a maximum entropy classifier instead of the default one
tagger = ClassifierBasedPOSTagger(
    train=train_data, classifier_builder=MaxentClassifier.train)

a = tagger.evaluate(test_data)
print("Accuracy : ", a)
Output :
Accuracy : 0.9258363911072739
Custom feature detector
There are two ways to supply a custom feature detector:
- Subclass ClassifierBasedTagger and implement a feature_detector() method.
- Pass a function as the feature_detector keyword argument into ClassifierBasedTagger at initialization.
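The two options above can be sketched as follows. The detector word_feature_detector and the subclass WordOnlyTagger are hypothetical names used only for this illustration, and the detector is a guess at what a unigram-style detector might look like (it uses the current word as the only feature). The NLTK import is placed inside build_taggers() so that the detector itself can be tried without NLTK installed.

```python
def word_feature_detector(tokens, index, history):
    """Hypothetical detector: the current word is the only feature."""
    return {"word": tokens[index]}


def build_taggers(train_sents):
    """Return two equivalent taggers, one per approach.

    train_sents -- list of tagged sentences, e.g. [[("The", "DT"), ...], ...]
    """
    from nltk.tag.sequential import ClassifierBasedTagger

    # Way 1: subclass ClassifierBasedTagger and implement feature_detector()
    class WordOnlyTagger(ClassifierBasedTagger):
        def feature_detector(self, tokens, index, history):
            return word_feature_detector(tokens, index, history)

    subclassed = WordOnlyTagger(train=train_sents)

    # Way 2: pass the function as the feature_detector keyword argument
    passed_in = ClassifierBasedTagger(
        train=train_sents, feature_detector=word_feature_detector)

    return subclassed, passed_in


# The detector can be checked on its own, without training anything.
example = word_feature_detector(["The", "dog"], 1, ["DT"])
print(example)
```

Passing a function is usually the lighter option when the detector needs no state. Subclassing is more convenient when the detector should share helper methods or configuration with the tagger.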
Code #3 : Custom Feature Detector
from nltk.tag.sequential import ClassifierBasedTagger
# tag_util is a local helper module, not part of NLTK
from tag_util import unigram_feature_detector
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# pass the custom detector as a keyword argument
tagger = ClassifierBasedTagger(
    train=train_data, feature_detector=unigram_feature_detector)

a = tagger.evaluate(test_data)
print("Accuracy : ", a)
Output :
Accuracy : 0.8733865745737104