Training a chunker is an alternative to manually specifying regular expression (regex) chunk patterns. Writing those patterns by hand is tedious, because finding exactly the right ones is a trial-and-error process. Instead, existing corpus data can be used to train a chunker.
In the code below, we use the treebank_chunk corpus, which provides chunked sentences in the form of trees.
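-> To train a tagger-based chunker, the TagChunker class takes the chunked sentences returned by the corpus's chunked_sents() method.
-> To extract a list of (pos, iob) tuples from a list of Trees, the TagChunker class uses a helper function, conll_tag_chunks().
These tuples are then used to train a tagger, which learns IOB tags for part-of-speech tags. In an IOB tag, the B- prefix marks the beginning of a chunk, I- marks a token inside a chunk, and O marks a token outside any chunk.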
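To make the (pos, iob) format concrete, here is a small sketch that uses a hand-built tree instead of the corpus. The sample sentence is made up for illustration. tree2conlltags() is NLTK's utility for flattening a chunk tree into (word, pos, iob) triples; dropping the word leaves the pairs the tagger is trained on.

```python
from nltk.tree import Tree
from nltk.chunk.util import tree2conlltags

# Hand-built chunk tree (illustrative sentence, not taken from treebank_chunk).
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('cat', 'NN')]),
    ('sat', 'VBD'),
    ('on', 'IN'),
    Tree('NP', [('the', 'DT'), ('mat', 'NN')]),
])

# Flatten to (word, pos, iob) triples, then keep only (pos, iob).
pos_iob = [(pos, iob) for _word, pos, iob in tree2conlltags(tree)]
print(pos_iob)
# [('DT', 'B-NP'), ('NN', 'I-NP'), ('VBD', 'O'), ('IN', 'O'), ('DT', 'B-NP'), ('NN', 'I-NP')]
```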
Code #1 : Let’s understand the TagChunker class used for training.
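The following is a minimal sketch of such a class, built only from the description above. The backoff_tagger() helper and the default choice of UnigramTagger backed by BigramTagger are assumptions made for this example, not something the text specifies; the NLTK calls themselves (ChunkParserI, tree2conlltags, conlltags2tree) are standard.

```python
from nltk.chunk import ChunkParserI
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tag import UnigramTagger, BigramTagger


def conll_tag_chunks(chunk_sents):
    """Convert a list of chunk Trees into lists of (pos, iob) tuples."""
    tagged_sents = [tree2conlltags(tree) for tree in chunk_sents]
    return [[(pos, iob) for (_word, pos, iob) in sent] for sent in tagged_sents]


def backoff_tagger(train_sents, tagger_classes, backoff=None):
    """Hypothetical helper: chain taggers, each backing off to the previous one."""
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff


class TagChunker(ChunkParserI):
    def __init__(self, train_chunks,
                 tagger_classes=(UnigramTagger, BigramTagger)):  # assumed defaults
        # The tagger is trained on (pos, iob) pairs, so the POS tag
        # plays the role that a word plays in ordinary POS tagging.
        train_sents = conll_tag_chunks(train_chunks)
        self.tagger = backoff_tagger(train_sents, tagger_classes)

    def parse(self, tagged_sent):
        """Chunk a list of (word, pos) tuples and return a Tree."""
        if not tagged_sent:
            return None
        words, tags = zip(*tagged_sent)
        chunks = self.tagger.tag(tags)  # [(pos, iob), ...]
        return conlltags2tree([(w, t, c) for w, (t, c) in zip(words, chunks)])
```

Because parse() only looks at part-of-speech tags, the chunker can handle words it never saw in training, as long as their POS tags were seen.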
Code #2 : Using the TagChunker.
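A possible way to train the chunker on treebank_chunk is sketched below. The TagChunker definition is repeated in compact form so the snippet runs on its own. The function name train_on_treebank() and the split at 3000 sentences are assumptions chosen for illustration. The corpus must be installed first, for example with nltk.download('treebank').

```python
from nltk.chunk import ChunkParserI
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tag import UnigramTagger, BigramTagger


class TagChunker(ChunkParserI):
    """Tagger-based chunker as described above, repeated so this runs alone."""

    def __init__(self, train_chunks,
                 tagger_classes=(UnigramTagger, BigramTagger)):  # assumed defaults
        # (pos, iob) training pairs extracted from the chunk Trees.
        train_sents = [[(pos, iob) for _w, pos, iob in tree2conlltags(tree)]
                       for tree in train_chunks]
        self.tagger = None
        for cls in tagger_classes:  # each tagger backs off to the previous one
            self.tagger = cls(train_sents, backoff=self.tagger)

    def parse(self, tagged_sent):
        if not tagged_sent:
            return None
        words, tags = zip(*tagged_sent)
        chunks = self.tagger.tag(tags)
        return conlltags2tree([(w, t, c) for w, (t, c) in zip(words, chunks)])


def train_on_treebank(n_train=3000):
    """Train on the first n_train sentences; return (chunker, held-out data).

    The 3000-sentence split is an assumption, not a value from the article.
    """
    from nltk.corpus import treebank_chunk  # needs nltk.download('treebank')
    sents = treebank_chunk.chunked_sents()
    train_data, test_data = sents[:n_train], sents[n_train:]
    return TagChunker(train_data), test_data


# Usage, once the corpus is installed:
# chunker, test_data = train_on_treebank()
# print(chunker.parse(test_data[0].leaves()))
```

Calling leaves() on a chunked sentence returns its flat list of (word, pos) tuples, which is the input format parse() expects.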
Code #3 : Evaluating the TagChunker
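One way to compute these scores is sketched below with NLTK's ChunkScore class. The helper names evaluate_chunker() and report() are made up for this example. ChunkParserI also provides an equivalent built-in method (evaluate() in older releases; this is worth checking against your installed version), but the explicit loop shows what is being measured: each gold-standard tree is compared with the tree the chunker produces from the same (word, pos) tokens.

```python
from nltk.chunk.util import ChunkScore


def evaluate_chunker(chunker, gold_sents):
    """Score chunker.parse() output against gold-standard chunk Trees."""
    score = ChunkScore()
    for gold in gold_sents:
        # gold.leaves() drops the chunk structure, leaving (word, pos) tuples.
        score.score(gold, chunker.parse(gold.leaves()))
    return score


def report(score, name="TagChunker"):
    """Print the three headline metrics."""
    print("Accuracy of %s : %s" % (name, score.accuracy()))
    print("Precision of %s : %s" % (name, score.precision()))
    print("Recall of %s : %s" % (name, score.recall()))


# Usage, where chunker is a trained TagChunker and test_data is a list
# of held-out chunked sentences from treebank_chunk:
# report(evaluate_chunker(chunker, test_data))
```

Accuracy here is measured per token on IOB tags, while precision and recall are measured on whole chunks, which is why accuracy is usually the highest of the three.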
Output :
Accuracy of TagChunker : 0.9732039335251428
Precision of TagChunker : 0.9166534370535006
Recall of TagChunker : 0.9465573770491803