The conll2000 corpus defines chunks using IOB tags.
- IOB tags specify where each chunk begins and ends, along with its type.
- A part-of-speech tagger can be trained on these IOB tags to power a ChunkerI subclass.
- First, the chunked_sents() method of the corpus is used to obtain a Tree for each sentence, which is then transformed into a format usable by a part-of-speech tagger.
- tree2conlltags() converts a sentence Tree into a list of 3-tuples of the form (word, pos, iob).
- pos : part-of-speech tag
- iob : IOB tag, for example B-NP and I-NP, which mark that a word is at the beginning of or inside a noun phrase, respectively.
- conlltags2tree() is the reverse of tree2conlltags().
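To make the meaning of the B- and I- prefixes concrete, here is a minimal plain-Python sketch that groups (word, pos, iob) triples back into chunks. It only illustrates the idea behind rebuilding chunks from IOB tags and is not NLTK's implementation; the function name iob_to_chunks is hypothetical.

```python
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into (chunk_type, [(word, pos), ...]) spans.

    Words tagged 'O' are outside any chunk and are returned with type None.
    """
    chunks = []
    for word, pos, iob in tagged:
        if iob == "O":
            chunks.append((None, [(word, pos)]))
            continue
        prefix, chunk_type = iob.split("-", 1)
        continues_chunk = (
            prefix == "I" and bool(chunks) and chunks[-1][0] == chunk_type
        )
        if continues_chunk:
            # 'I-' extends the chunk of the same type that is currently open.
            chunks[-1][1].append((word, pos))
        else:
            # 'B-' always opens a new chunk; a stray 'I-' is treated the same way.
            chunks.append((chunk_type, [(word, pos)]))
    return chunks


sentence = [("the", "DT", "B-NP"), ("book", "NN", "I-NP"), ("fell", "VBD", "O")]
chunks = iob_to_chunks(sentence)
print(chunks)
```

Here "the" and "book" end up in one NP chunk, while "fell" stays outside any chunk.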
- The 3-tuples are then converted into 2-tuples that the tagger can recognize.
- The RegexpParser class uses part-of-speech tags for chunk patterns, so here too the part-of-speech tags are treated as if they were the words to tag.
- The conll_tag_chunks() function takes the 3-tuples (word, pos, iob) of each sentence and returns a list of 2-tuples of the form (pos, iob).
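The conversion from 3-tuples to 2-tuples can be sketched in plain Python as follows. This version assumes the sentences have already been converted to (word, pos, iob) triples, so it needs no NLTK import; the name strip_words is a hypothetical stand-in and not the article's conll_tag_chunks().

```python
def strip_words(conll_sents):
    """Map each sentence of (word, pos, iob) triples to (pos, iob) pairs."""
    return [[(pos, iob) for (_word, pos, iob) in sent] for sent in conll_sents]


conll_sents = [[("the", "DT", "B-NP"), ("book", "NN", "I-NP")]]
pairs = strip_words(conll_sents)
print(pairs)  # [[('DT', 'B-NP'), ('NN', 'I-NP')]]
```

The word is dropped, so the part-of-speech tag takes the place of the "word" and the IOB tag takes the place of the "tag" in the training data.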
Code #1 : Let’s understand
Output :
tree2conlltags : [('the', 'DT', 'B-NP'), ('book', 'NN', 'I-NP')]
conlltags2tree : Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')])])
conll_tag_chunks for tree : [[('DT', 'B-NP'), ('NN', 'I-NP')]]
Code #2 : TagChunker class using the conll2000 corpus
Accuracy of TagChunker : 0.8950545623403762
Precision of TagChunker : 0.8114841974355675
Recall of TagChunker : 0.8644191676944863
Note : The chunker scores lower on conll2000 than on treebank_chunk, but conll2000 is a much larger corpus.
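To show how accuracy, precision and recall can differ for the same prediction, here is a small plain-Python sketch. It is illustrative only and not NLTK's scoring code: it assumes accuracy is counted per IOB tag, while precision and recall are counted over exact chunk spans. The helper names chunk_spans and score are hypothetical.

```python
def chunk_spans(iob_tags):
    """Return the set of (start, end, type) spans encoded by a list of IOB tags."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(iob_tags) + ["O"]):  # sentinel closes a trailing chunk
        inside = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag != "O" and not inside:
            start, kind = i, tag[2:]
    return spans


def score(gold_sents, guess_sents):
    """Tag-level accuracy plus span-level precision and recall."""
    tag_total = tag_correct = gold_n = guess_n = hits = 0
    for gold, guess in zip(gold_sents, guess_sents):
        tag_total += len(gold)
        tag_correct += sum(g == p for g, p in zip(gold, guess))
        gold_spans, guess_spans = chunk_spans(gold), chunk_spans(guess)
        gold_n += len(gold_spans)
        guess_n += len(guess_spans)
        hits += len(gold_spans & guess_spans)
    return {
        "accuracy": tag_correct / tag_total if tag_total else 0.0,
        "precision": hits / guess_n if guess_n else 0.0,
        "recall": hits / gold_n if gold_n else 0.0,
    }


gold = [["B-NP", "I-NP", "O", "B-NP", "I-NP"]]
guess = [["B-NP", "I-NP", "O", "B-NP", "B-NP"]]
result = score(gold, guess)
print(result)
```

In this example a single wrong tag still leaves 4 of 5 tags correct, but it splits the second chunk in two, so only 1 of the 3 predicted chunks and 1 of the 2 gold chunks match.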
Code #3 : TagChunker using UnigramTagger Class
Accuracy of TagChunker : 0.9674925924335466
The tagger_classes argument is passed directly to the backoff_tagger() function, which means the classes must be subclasses of SequentialBackoffTagger. In testing, the default of tagger_classes = [UnigramTagger, BigramTagger] generally produces the best results, but this can vary between corpora.
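The effect of chaining tagger classes can be sketched without NLTK. The classes below (DefaultSketch, UnigramSketch, BigramSketch) and the helpers build_chain and tag are hypothetical, simplified stand-ins, not NLTK's UnigramTagger, BigramTagger or the backoff_tagger() helper. The sketch assumes the bigram lookup is keyed on the previous IOB tag together with the current part-of-speech tag, and that each class is wrapped around the previous one in list order.

```python
from collections import Counter, defaultdict


class DefaultSketch:
    """End of the chain: always answers with one fixed IOB tag."""

    def __init__(self, tag="O"):
        self.tag = tag

    def choose(self, pos, prev_iob):
        return self.tag


class UnigramSketch:
    """Picks the IOB tag seen most often with the current POS tag."""

    def __init__(self, train_sents, backoff=None):
        counts = defaultdict(Counter)
        for sent in train_sents:
            prev_iob = None
            for pos, iob in sent:
                counts[self.key(pos, prev_iob)][iob] += 1
                prev_iob = iob
        self.table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        self.backoff = backoff

    def key(self, pos, prev_iob):
        return pos

    def choose(self, pos, prev_iob):
        k = self.key(pos, prev_iob)
        if k in self.table:
            return self.table[k]
        # Unseen context: defer to the next tagger in the chain, if any.
        return self.backoff.choose(pos, prev_iob) if self.backoff else None


class BigramSketch(UnigramSketch):
    """Same lookup, but keyed on (previous IOB tag, current POS tag)."""

    def key(self, pos, prev_iob):
        return (prev_iob, pos)


def build_chain(train_sents, tagger_classes, default="O"):
    """Wrap each class around the previous one; the last class is consulted first."""
    tagger = DefaultSketch(default)
    for cls in tagger_classes:
        tagger = cls(train_sents, backoff=tagger)
    return tagger


def tag(tagger, pos_tags):
    out, prev_iob = [], None
    for pos in pos_tags:
        prev_iob = tagger.choose(pos, prev_iob)
        out.append((pos, prev_iob))
    return out


train = [
    [("DT", "B-NP"), ("NN", "I-NP"), ("VBD", "O"),
     ("DT", "B-NP"), ("JJ", "I-NP"), ("NN", "I-NP")],
    [("NN", "B-NP"), ("VBZ", "O"), ("DT", "B-NP"), ("NN", "I-NP")],
]
test_pos = ["NN", "VBD", "DT", "NN"]
uni_only = tag(build_chain(train, [UnigramSketch]), test_pos)
uni_bi = tag(build_chain(train, [UnigramSketch, BigramSketch]), test_pos)
print(uni_only)
print(uni_bi)
```

With the unigram lookup alone, the sentence-initial NN is tagged I-NP because that is its most frequent tag overall. With the bigram lookup on top, the sentence-start context yields B-NP, and the unseen (B-NP, VBD) context falls back to the unigram table.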
Improved By : Akanksha_Rai