NLP | Classifier-based tagging
ClassifierBasedPOSTagger class:
- It is a subclass of ClassifierBasedTagger that uses classification to perform part-of-speech tagging.
- Features are extracted from each word and passed to an internal classifier.
- The classifier classifies the features and returns a label, i.e. a part-of-speech tag.
- The feature detector finds suffixes of several lengths, does some regular expression matching, and looks at the unigram, bigram, and trigram history to produce a fairly complete set of features for each word.
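To make the idea of a feature detector concrete, here is a minimal sketch of a function with the (tokens, index, history) signature that ClassifierBasedTagger expects. It is not NLTK's actual implementation: the function name, the feature names, and the regular expressions are assumptions chosen for illustration only.

```python
import re


def sketch_feature_detector(tokens, index, history):
    """Return a feature dict for tokens[index].

    history holds the tags already assigned to earlier tokens.
    Hypothetical sketch; feature names are not those used by NLTK.
    """
    word = tokens[index]
    features = {"word": word.lower()}

    # Suffixes of length 1 to 3 (skipped when the word is too short).
    for n in (1, 2, 3):
        if len(word) > n:
            features["suffix%d" % n] = word[-n:].lower()

    # Coarse word shape via regular expression matching.
    if re.fullmatch(r"[0-9]+([.,][0-9]+)*", word):
        features["shape"] = "number"
    elif re.fullmatch(r"\W+", word):
        features["shape"] = "punct"
    elif re.fullmatch(r"[A-Z][a-z]+", word):
        features["shape"] = "capitalized"
    else:
        features["shape"] = "other"

    # Tag history: the previous tag and the previous two tags.
    prev_tag = history[index - 1] if index >= 1 else None
    prev_prev_tag = history[index - 2] if index >= 2 else None
    features["prev_tag"] = prev_tag
    features["prev_two_tags"] = (prev_prev_tag, prev_tag)
    return features


tokens = ["The", "dogs", "barked", "."]
history = ["DT", "NNS"]  # tags already chosen for "The" and "dogs"
features = sketch_feature_detector(tokens, 2, history)
print(features)
```

The returned dictionary is what the internal classifier sees for the word "barked"; the real detector in ClassifierBasedPOSTagger produces a richer set along the same lines.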
Code #1 : Using ClassifierBasedPOSTagger
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

tagging = ClassifierBasedPOSTagger(train=train_data)

a = tagging.evaluate(test_data)
print("Accuracy : ", a)
Output :
Accuracy : 0.9309734513274336
The ClassifierBasedPOSTagger class inherits from ClassifierBasedTagger and only implements a feature_detector() method. All training and tagging is done in ClassifierBasedTagger.
Code #2 : Using MaxentClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.classify import MaxentClassifier
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# pass a different classifier trainer via classifier_builder
# (maxent training is iterative and can take a while)
tagger = ClassifierBasedPOSTagger(
    train=train_data, classifier_builder=MaxentClassifier.train)

a = tagger.evaluate(test_data)
print("Accuracy : ", a)
Output :
Accuracy : 0.9258363911072739
Custom feature detector:
There are two ways to supply a custom feature detector:
- Subclass ClassifierBasedTagger and implement a feature_detector() method.
- Pass a function as the feature_detector keyword argument into ClassifierBasedTagger at initialization.
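The two approaches can be sketched side by side as follows. The detector function, the subclass name, and the two-sentence training set are all hypothetical and exist only to keep the example small; the default classifier of ClassifierBasedTagger is used, so no corpus download is needed.

```python
from nltk.tag.sequential import ClassifierBasedTagger


def word_feature_detector(tokens, index, history):
    # Hypothetical detector: the lowercased word and a capitalization flag.
    word = tokens[index]
    return {"word": word.lower(), "is_capitalized": word[:1].isupper()}


# Way 1: subclass ClassifierBasedTagger and implement feature_detector().
class WordTagger(ClassifierBasedTagger):
    def feature_detector(self, tokens, index, history):
        return word_feature_detector(tokens, index, history)


# Toy training data (made up for illustration, far too small for real use).
toy_train = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

subclass_tagger = WordTagger(train=toy_train)

# Way 2: pass the function as the feature_detector keyword argument.
keyword_tagger = ClassifierBasedTagger(
    train=toy_train, feature_detector=word_feature_detector)

sentence = ["The", "dog", "sleeps"]
subclass_result = subclass_tagger.tag(sentence)
keyword_result = keyword_tagger.tag(sentence)
print(subclass_result)
print(keyword_result)
```

Because both taggers see the same features and the same training data, they should tag the sentence identically. The keyword argument is the lighter option when the detector is a plain function; subclassing is convenient when the detector needs its own state or helper methods.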
Code #3 : Custom Feature Detector
from nltk.tag.sequential import ClassifierBasedTagger
# tag_util is a local helper module, not part of NLTK;
# it must be available on the import path
from tag_util import unigram_feature_detector
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

tagger = ClassifierBasedTagger(
    train=train_data, feature_detector=unigram_feature_detector)

a = tagger.evaluate(test_data)
print("Accuracy : ", a)
Output :
Accuracy : 0.8733865745737104
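The tag_util module imported in Code #3 is not shipped with NLTK, and its contents are not shown here. As an assumption, a unigram feature detector could be as small as the sketch below, which uses only the current word as a feature; the actual definition in tag_util may differ.

```python
def unigram_feature_detector(tokens, index, history):
    # Assumed definition: the only feature is the word being tagged.
    # The history of previous tags is deliberately ignored.
    return {"word": tokens[index]}


features = unigram_feature_detector(["The", "dog", "barks"], 1, ["DT"])
print(features)
```

If the detector really is this simple, a lower accuracy than in Code #1 would be expected, since the classifier gets no suffix or history information.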