A named entity chunker can be trained using the ieer corpus, whose name stands for Information Extraction: Entity Recognition. The ieer corpus has chunk trees but no part-of-speech tags for the words, so some extra work is needed to prepare the training data.
Named entity chunk trees can be created from the ieer corpus using the ieer_chunked_sents() function. These trees can then be used to train the ClassifierChunker class created in Classification-based chunking.
Code #1 : ieertree2conlltags()
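The original listing for this step is not reproduced here, so the following is only a minimal sketch of what such a conversion could look like. It assumes the input behaves like an nltk.tree.Tree (it only needs pos() and label()), and it assumes that any word whose parent label equals the root label lies outside every entity. The tag parameter is any callable that maps a list of words to (word, tag) pairs; when it is omitted, nltk.tag.pos_tag is used, which needs NLTK and its tagger model to be installed.

```python
def ieertree2conlltags(tree, tag=None):
    """Convert an ieer-style chunk tree into (word, pos, iob) triples.

    Sketch only: `tree` must offer pos() and label() like nltk.tree.Tree.
    """
    if tag is None:
        # Imported lazily so that a custom tagger can be passed in
        # without NLTK's tagger model being available.
        from nltk.tag import pos_tag as tag

    pairs = tree.pos()  # [(word, label_of_parent_node), ...]
    if not pairs:
        return []

    words, ents = zip(*pairs)
    root = tree.label()
    iobs = []
    prev = None
    for ent in ents:
        if ent == root:
            # The word hangs directly off the root: outside any entity.
            iobs.append('O')
            prev = None
        elif ent == prev:
            # Same entity label as the previous word: continue the chunk.
            iobs.append('I-' + ent)
        else:
            # A different entity label: begin a new chunk.
            iobs.append('B-' + ent)
            prev = ent

    # The corpus has no part-of-speech tags, so they come from the tagger.
    tags = [pos for _, pos in tag(list(words))]
    return list(zip(words, tags, iobs))
```

One limitation of this sketch: two adjacent entities with the same label are merged into a single chunk, because only the parent label of each word is inspected.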
Code #2 : ieer_chunked_sents()
Using 80 of the 94 chunk trees for training and the remaining 14 for testing.
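As a rough sketch of this step (the original listing is not reproduced here), the generator below yields one chunk tree per ieer document and a small helper performs the 80/14 split. The to_conlltags and docs parameters and the split_chunks helper are hypothetical names introduced for illustration; to_conlltags stands in for ieertree2conlltags(), and taking the first 80 trees for training is an assumption. Reading the corpus requires NLTK with the ieer data downloaded.

```python
def ieer_chunked_sents(to_conlltags, tag=None, docs=None):
    """Yield one named entity chunk tree per ieer document (sketch)."""
    # Imported lazily so the split helper below works without NLTK.
    from nltk.chunk.util import conlltags2tree

    if docs is None:
        # Requires the corpus data, e.g. via nltk.download('ieer').
        from nltk.corpus import ieer
        docs = ieer.parsed_docs()

    for doc in docs:
        # doc.text is the chunk tree of the whole document.
        if tag is None:
            triples = to_conlltags(doc.text)
        else:
            triples = to_conlltags(doc.text, tag)
        yield conlltags2tree(list(triples))


def split_chunks(chunks, n_train=80):
    """Split the chunk trees into (train, test) lists.

    Assumption: the first n_train trees are used for training.
    """
    chunks = list(chunks)
    return chunks[:n_train], chunks[n_train:]
```

With the corpus available, something like split_chunks(ieer_chunked_sents(ieertree2conlltags)) should give the two lists, and the training list could then be passed to the ClassifierChunker from the earlier article.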
Code #3 : How the classifier works on the first sentence of the treebank_chunk corpus.
Output :
Length of ieer_chunks : 94

parsing :
Tree('S', [Tree('LOCATION', [('Pierre', 'NNP'), ('Vinken', 'NNP')]), (',', ','), Tree('DURATION', [('61', 'CD'), ('years', 'NNS')]), Tree('MEASURE', [('old', 'JJ')]), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), Tree('DATE', [('Nov.', 'NNP'), ('29', 'CD')]), ('.', '.')])

Accuracy : 0.8829018388070625
Precision : 0.4088717454194793
Recall : 0.5053635280095352
How it works?
The ieer trees generated by ieer_chunked_sents() are not entirely accurate. There are no explicit sentence breaks, so each document is a single tree. Also, the words are not explicitly tagged, so the part-of-speech tags are guesses made by nltk.tag.pos_tag().