NLP | Classifier-based Chunking | Set 2

Using the data from the treebank_chunk corpus, let us evaluate the ClassifierChunker (prepared in the previous article).

Code #1 :


# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk

train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]

# initializing the chunker with the training data
chunker = ClassifierChunker(train_data)

# evaluating against the held-out test data
score = chunker.evaluate(test_data)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)



Output :

Accuracy of ClassifierChunker : 0.9721733155838022

Precision of ClassifierChunker : 0.9258838793383068

Recall of ClassifierChunker : 0.9359016393442623

 
Code #2 : Let’s compare performance when training on conll_train (from the conll2000 corpus).


# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import conll2000

conll_train = conll2000.chunked_sents('train.txt')
conll_test = conll2000.chunked_sents('test.txt')

chunker = ClassifierChunker(conll_train)
score = chunker.evaluate(conll_test)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)



Output :



Accuracy of ClassifierChunker : 0.9264622074002153

Precision of ClassifierChunker : 0.8737924310910219

Recall of ClassifierChunker : 0.9007354620620346

Each word can be passed through the tagger into our feature detector function by creating nested 2-tuples of the form ((word, pos), iob). The chunk_trees2train_chunks() function produces these nested 2-tuples.
The following features are extracted:

  • The current word and part-of-speech tag
  • The previous word, part-of-speech tag, and IOB tag
  • The next word and part-of-speech tag

The ClassifierChunker class uses an internal ClassifierBasedTagger with prev_next_pos_iob() as its default feature_detector. The results from the tagger, which come back in the same nested 2-tuple form, are then reformatted into 3-tuples and converted to a final Tree using conlltags2tree().
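To make the feature set above concrete, here is a minimal sketch of what a prev_next_pos_iob()-style feature detector could look like. This is a hypothetical re-implementation for illustration only — the actual function lives in the chunkers module. A feature detector for ClassifierBasedTagger receives the (word, pos) tokens, the index of the current token, and the history of IOB tags predicted so far:

```python
def prev_next_pos_iob(tokens, index, history):
    """Sketch of a feature detector: tokens is a list of (word, pos)
    pairs, history is the list of IOB tags predicted so far."""
    word, pos = tokens[index]

    # previous word, part-of-speech tag, and IOB tag
    if index == 0:
        prevword, prevpos, previob = '<START>', '<START>', '<START>'
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]

    # next word and part-of-speech tag (no IOB tag yet)
    if index == len(tokens) - 1:
        nextword, nextpos = '<END>', '<END>'
    else:
        nextword, nextpos = tokens[index + 1]

    return {
        'word': word, 'pos': pos,
        'prevword': prevword, 'prevpos': prevpos, 'previob': previob,
        'nextword': nextword, 'nextpos': nextpos,
    }


# example: features for "cat" in "the cat", with "the" already tagged B-NP
feats = prev_next_pos_iob([('the', 'DT'), ('cat', 'NN')], 1, ['B-NP'])
print(feats['previob'])   # 'B-NP'
print(feats['nextword'])  # '<END>'
```

The classifier then predicts an IOB tag from this feature dictionary, one token at a time, with each prediction appended to history for the next token.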
 

Code #3 : Using a different classifier builder


# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk
from nltk.classify import MaxentClassifier

train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]

# building a maximum entropy classifier instead of the default
builder = lambda toks: MaxentClassifier.train(
            toks, trace=0, max_iter=10, min_lldelta=0.01)

chunker = ClassifierChunker(
        train_data, classifier_builder=builder)

score = chunker.evaluate(test_data)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)



Output :

Accuracy of ClassifierChunker : 0.9743204362949285

Precision of ClassifierChunker : 0.9334423548650859

Recall of ClassifierChunker : 0.9357377049180328

The ClassifierBasedTagger class defaults to NaiveBayesClassifier.train as its classifier_builder, but any classifier can be substituted by overriding the classifier_builder keyword argument, as Code #3 does with MaxentClassifier.
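As a further illustration (assuming NLTK's classify module is available), any callable that maps labeled featuresets to a trained classifier can serve as a classifier_builder — for example, NLTK's DecisionTreeClassifier. The builder below is a sketch, not part of the original article:

```python
from nltk.classify import DecisionTreeClassifier

# any callable taking labeled featuresets and returning a trained
# classifier works as a classifier_builder
dt_builder = lambda toks: DecisionTreeClassifier.train(
        toks, entropy_cutoff=0.05, depth_cutoff=100)

# sanity check on a toy featureset (illustrative only)
toy_train = [({'pos': 'DT'}, 'B-NP'), ({'pos': 'NN'}, 'I-NP')] * 10
clf = dt_builder(toy_train)
print(clf.classify({'pos': 'DT'}))  # 'B-NP'
```

It would then be passed the same way as in Code #3: ClassifierChunker(train_data, classifier_builder=dt_builder).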
