NLP | Classifier-based Chunking | Set 2

Using the data from the treebank_chunk corpus, let us evaluate the chunkers prepared in the previous article.
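The ClassifierChunker class below is imported from the chunkers module built in the previous article. For readers who do not have that module handy, here is a minimal sketch of what it contains, assuming the standard NLTK cookbook pattern (the prev_next_pos_iob() feature detector it relies on is discussed later in this article):

# chunkers.py : a minimal sketch, assuming the NLTK cookbook pattern
from nltk.chunk import ChunkParserI
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tag import ClassifierBasedTagger
  
def chunk_trees2train_chunks(chunk_sents):
    # convert chunk trees into nested ((word, pos), iob) training tuples
    tag_sents = [tree2conlltags(sent) for sent in chunk_sents]
    return [[((w, t), c) for (w, t, c) in sent] for sent in tag_sents]
  
def prev_next_pos_iob(tokens, index, history):
    # features : current, previous and next word / tag context
    word, pos = tokens[index]
    if index == 0:
        prevword, prevpos, previob = ('<START>',) * 3
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]
    if index == len(tokens) - 1:
        nextword, nextpos = ('<END>',) * 2
    else:
        nextword, nextpos = tokens[index + 1]
    return {'word': word, 'pos': pos,
            'prevword': prevword, 'prevpos': prevpos,
            'previob': previob,
            'nextword': nextword, 'nextpos': nextpos}
  
class ClassifierChunker(ChunkParserI):
    def __init__(self, train_sents,
                 feature_detector = prev_next_pos_iob, **kwargs):
        train_chunks = chunk_trees2train_chunks(train_sents)
        self.tagger = ClassifierBasedTagger(train = train_chunks,
                feature_detector = feature_detector, **kwargs)
  
    def parse(self, tagged_sent):
        if not tagged_sent:
            return None
        chunks = self.tagger.tag(tagged_sent)
        # reformat ((word, pos), iob) into (word, pos, iob) 3-tuples
        return conlltags2tree([(w, t, c) for ((w, t), c) in chunks])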

Code #1 : Evaluating ClassifierChunker on the treebank_chunk data.


# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk
  
train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]
  
# initializing
chunker = ClassifierChunker(train_data)
  
# evaluation
score = chunker.evaluate(test_data)
  
a = score.accuracy()
p = score.precision()
r = score.recall()
    
print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)



Output :

Accuracy of ClassifierChunker : 0.9721733155838022

Precision of ClassifierChunker : 0.9258838793383068

Recall of ClassifierChunker : 0.9359016393442623
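Once trained, the same chunker can parse any POS-tagged sentence into a chunk tree. A quick illustration (the sentence here is just a made-up example):

from nltk import pos_tag, word_tokenize
  
sent = pos_tag(word_tokenize("The quick brown fox jumped over the lazy dog."))
  
# returns a Tree whose NP subtrees are the detected chunks
print(chunker.parse(sent))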

 
Code #2 : Let's compare performance on the conll2000 corpus, training with conll_train and testing with conll_test.


# loading the conll2000 corpus
from nltk.corpus import conll2000
  
conll_train = conll2000.chunked_sents('train.txt')
conll_test = conll2000.chunked_sents('test.txt')
  
chunker = ClassifierChunker(conll_train)
score = chunker.evaluate(conll_test)
  
a = score.accuracy()
p = score.precision()
r = score.recall()
    
print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)



Output :

Accuracy of ClassifierChunker : 0.9264622074002153

Precision of ClassifierChunker : 0.8737924310910219

Recall of ClassifierChunker : 0.9007354620620346

By creating nested 2-tuples of the form ((word, pos), iob), each word can be passed through the tagger into the feature detector function. The chunk_trees2train_chunks() method produces these nested 2-tuples, as the example below shows.
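To see this format concretely, here is one treebank_chunk sentence converted with nltk's tree2conlltags() (which chunk_trees2train_chunks(), as sketched above, wraps) and then nested:

from nltk.corpus import treebank_chunk
from nltk.chunk.util import tree2conlltags
  
# (word, pos, iob) 3-tuples from the chunk tree
conll = tree2conlltags(treebank_chunk.chunked_sents()[0])
  
# nested ((word, pos), iob) 2-tuples for training the tagger
nested = [((w, t), c) for (w, t, c) in conll]
print(nested[:3])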
The following features are extracted (see the illustration after the list):

  • The current word and part-of-speech tag
  • The previous word and IOB tag, part-of-speech tag
  • The next word and part-of-speech tag
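For instance, calling the prev_next_pos_iob() detector sketched earlier on the middle token of a tiny tagged sentence (a made-up example) produces a feature dictionary like this:

tokens = [('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]
  
# history holds the IOB tags already assigned to earlier tokens
print(prev_next_pos_iob(tokens, 1, ['B-NP']))
  
# {'word': 'cat', 'pos': 'NN',
#  'prevword': 'the', 'prevpos': 'DT', 'previob': 'B-NP',
#  'nextword': 'sat', 'nextpos': 'VBD'}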

The ClassifierChunker class uses an internal ClassifierBasedTagger with prev_next_pos_iob() as its default feature_detector. The results from the tagger, which are in the same nested 2-tuple form, are then reformatted into 3-tuples and converted to a final Tree using conlltags2tree().
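The reformatting step is easy to see in isolation. Given tagger output in the nested 2-tuple form (made-up tokens below), a list comprehension flattens it into the 3-tuples that conlltags2tree() expects:

from nltk.chunk.util import conlltags2tree
  
tagged = [(('the', 'DT'), 'B-NP'), (('cat', 'NN'), 'I-NP'),
          (('sat', 'VBD'), 'O')]
  
# flatten ((word, pos), iob) into (word, pos, iob)
tree = conlltags2tree([(w, t, c) for ((w, t), c) in tagged])
print(tree)
# (S (NP the/DT cat/NN) sat/VBD)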
 

Code #3 : Using a different classifier builder (MaxentClassifier).


# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk
from nltk.classify import MaxentClassifier
  
train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]
  
  
builder = lambda toks: MaxentClassifier.train(
            toks, trace = 0, max_iter = 10, min_lldelta = 0.01)
  
chunker = ClassifierChunker(
        train_data, classifier_builder = builder)
  
score = chunker.evaluate(test_data)
    
a = score.accuracy()
p = score.precision()
r = score.recall()
  
print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)



Output :

Accuracy of ClassifierChunker : 0.9743204362949285

Precision of ClassifierChunker : 0.9334423548650859

Recall of ClassifierChunker : 0.9357377049180328

The ClassifierBasedTagger class defaults to using NaiveBayesClassifier.train as its classifier_builder, but any classifier can be used by overriding the classifier_builder keyword argument.
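In other words, the default is equivalent to passing NaiveBayesClassifier.train explicitly, and any function that takes a list of labeled featuresets and returns a trained classifier can be substituted:

from nltk.classify import NaiveBayesClassifier
  
# equivalent to the default behaviour
chunker = ClassifierChunker(
        train_data, classifier_builder = NaiveBayesClassifier.train)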


