Training a chunker is an alternative to manually specifying regular expression (regex) chunk patterns. Writing those patterns by hand is tedious, because finding exactly the right ones is a matter of trial and error. Existing corpus data can be used to train a chunker instead.
The code below uses the treebank_chunk corpus, which provides chunked sentences in the form of trees.
-> To train a tagger-based chunker – the TagChunker class is trained on the sentences returned by the corpus's chunked_sents() method.
-> To extract a list of (pos, iob) tuples from a list of Trees – the TagChunker class uses a helper function, conll_tag_chunks().
These tuples are then used to train a tagger, which learns IOB tags for part-of-speech tags.
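As a minimal sketch of this conversion, the snippet below turns one chunk tree into (pos, iob) pairs and trains a unigram tagger on them. The tree is hand-built for illustration rather than taken from treebank_chunk, and the body of conll_tag_chunks() is an assumed reconstruction built on NLTK's tree2conlltags(), not the original listing.

```python
from nltk.chunk.util import tree2conlltags
from nltk.tag import UnigramTagger
from nltk.tree import Tree


def conll_tag_chunks(chunk_sents):
    """Convert chunk trees into sentences of (pos, iob) pairs."""
    # tree2conlltags() yields (word, pos, iob) triples; the word is dropped.
    tagged_sents = [tree2conlltags(tree) for tree in chunk_sents]
    return [[(pos, iob) for (_word, pos, iob) in sent] for sent in tagged_sents]


# Hand-built chunk tree (illustrative, not corpus data).
tree = Tree("S", [
    Tree("NP", [("the", "DT"), ("dog", "NN")]),
    ("barked", "VBD"),
])

pairs = conll_tag_chunks([tree])
print(pairs)

# The POS tags play the role of "words" and the IOB tags the role of "tags".
tagger = UnigramTagger(pairs)
learned = tagger.tag(["DT", "NN", "VBD"])
print(learned)
```

Here the tagger never sees the words themselves: it only maps a part-of-speech tag to the IOB tag it most often carried in training.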
Code #1 : Understanding the TagChunker class used for training.
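The listing for this step is not preserved here, so the following is a sketch of what such a class might look like rather than the original code. It assumes a backoff chain of UnigramTagger and BigramTagger built inline, and it trains on two hand-built trees (toy_trees is a hypothetical name) so that it runs without downloading a corpus.

```python
from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree, tree2conlltags
from nltk.tag import BigramTagger, UnigramTagger
from nltk.tree import Tree


def conll_tag_chunks(chunk_sents):
    """Convert chunk trees into sentences of (pos, iob) pairs."""
    tagged_sents = [tree2conlltags(tree) for tree in chunk_sents]
    return [[(pos, iob) for (_word, pos, iob) in sent] for sent in tagged_sents]


class TagChunker(ChunkParserI):
    """Chunker that assigns IOB tags to a sequence of POS tags."""

    def __init__(self, train_chunks, tagger_classes=(UnigramTagger, BigramTagger)):
        train_sents = conll_tag_chunks(train_chunks)
        # Assumed design: each tagger backs off to the one built before it.
        tagger = None
        for tagger_class in tagger_classes:
            tagger = tagger_class(train_sents, backoff=tagger)
        self.tagger = tagger

    def parse(self, tagged_sent):
        """Turn a list of (word, pos) tuples into a chunk tree."""
        if not tagged_sent:
            return None
        words, tags = zip(*tagged_sent)
        pos_iob = self.tagger.tag(tags)  # list of (pos, iob) pairs
        triples = [(word, pos, iob) for word, (pos, iob) in zip(words, pos_iob)]
        return conlltags2tree(triples)


# Hand-built training trees (illustrative stand-in for corpus data).
toy_trees = [
    Tree("S", [Tree("NP", [("the", "DT"), ("dog", "NN")]), ("barked", "VBD")]),
    Tree("S", [Tree("NP", [("a", "DT"), ("cat", "NN")]), ("slept", "VBD")]),
]

chunker = TagChunker(toy_trees)
result = chunker.parse([("the", "DT"), ("bird", "NN"), ("sang", "VBD")])
print(result)
```

With the corpus installed, the trees returned by treebank_chunk.chunked_sents() could be passed as train_chunks in place of toy_trees.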
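Code #2 : Using the TagChunker. Once trained, its parse() method takes a POS-tagged sentence, given as a list of (word, pos) tuples, and returns a chunk tree.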
Code #3 : Evaluating the TagChunker
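To show what accuracy, precision and recall measure for a chunker, the sketch below scores one guessed tree against one gold tree with NLTK's ChunkScore. Both trees are hand-built assumptions standing in for chunker output and corpus data, so the numbers it prints are unrelated to any corpus run.

```python
from nltk.chunk.util import ChunkScore
from nltk.tree import Tree

# Gold-standard chunking (hand-built for illustration).
gold = Tree("S", [
    Tree("NP", [("the", "DT"), ("dog", "NN")]),
    ("barked", "VBD"),
    ("at", "IN"),
    Tree("NP", [("the", "DT"), ("cat", "NN")]),
])

# Hypothetical chunker output: the second noun phrase is cut short.
guess = Tree("S", [
    Tree("NP", [("the", "DT"), ("dog", "NN")]),
    ("barked", "VBD"),
    ("at", "IN"),
    Tree("NP", [("the", "DT")]),
    ("cat", "NN"),
])

score = ChunkScore()
score.score(gold, guess)

# Accuracy is per token (IOB tag matches); precision and recall are per chunk.
print("Accuracy  :", score.accuracy())
print("Precision :", score.precision())
print("Recall    :", score.recall())
```

Accuracy counts tokens whose IOB tag matches the gold tag, while precision and recall count whole chunks, which is why a single wrong boundary costs more in the chunk-level scores than in accuracy.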
Accuracy of TagChunker : 0.9732039335251428
Precision of TagChunker : 0.9166534370535006
Recall of TagChunker : 0.9465573770491803