A single token is referred to as a unigram, for example: hello, movie, coding. This article focuses on the unigram tagger.
Unigram Tagger: It determines the part-of-speech tag of a word using only that single word as context.
UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger. UnigramTagger is therefore a single-word, context-based tagger.
Code #1 : Training UnigramTagger.
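As a minimal sketch of training, the snippet below builds a UnigramTagger from a tiny hand-tagged list. The sentences in `train_sents` are hypothetical stand-ins for a real corpus, chosen so the example runs with only NLTK installed.

```python
from nltk.tag import UnigramTagger

# Hypothetical hand-tagged sentences standing in for a real corpus.
# Each sentence is a list of (word, tag) pairs.
train_sents = [
    [("the", "DT"), ("movie", "NN"), ("was", "VBD"), ("good", "JJ")],
    [("the", "DT"), ("coding", "NN"), ("was", "VBD"), ("fun", "JJ")],
]

# Training records, for every word, the tag it was seen with most often.
tagger = UnigramTagger(train_sents)

# Words seen in training get their learned tag; unseen words get None
# because no backoff tagger was supplied.
result = tagger.tag(["the", "movie", "was", "fun", "hello"])
print(result)
```

The word "hello" never appears in the training data, so it is tagged None. In practice a backoff tagger is usually supplied to cover such words.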
Code #2 : Training using first 1000 tagged sentences of the treebank corpus as data.
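The training step described here might look like the following sketch. It assumes NLTK is installed and the treebank sample has already been downloaded (for example with `nltk.download('treebank')`). The helper name `train_on_treebank` is hypothetical and only wraps the two calls for reuse.

```python
from nltk.corpus import treebank  # loaded lazily: the corpus is read on first use
from nltk.tag import UnigramTagger


def train_on_treebank(n_sents=1000):
    """Train a UnigramTagger on the first `n_sents` tagged treebank sentences.

    `train_on_treebank` is an illustrative helper, not part of NLTK.
    """
    # tagged_sents() yields sentences as lists of (word, tag) pairs
    train_sents = treebank.tagged_sents()[:n_sents]
    return UnigramTagger(train_sents)
```

After `tagger = train_on_treebank()`, the expression `treebank.sents()[0]` returns the first sentence of the corpus as a plain list of tokens: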
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
Code #3 : Finding the tagged results after training.
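A sketch of the tagging step, under the same assumptions (NLTK installed, treebank sample downloaded). The helper name `tag_first_sentence` is hypothetical. It retrains the tagger so that the snippet is self-contained.

```python
from nltk.corpus import treebank
from nltk.tag import UnigramTagger


def tag_first_sentence(n_sents=1000):
    """Train on the first `n_sents` treebank sentences, then tag sentence 0.

    `tag_first_sentence` is an illustrative helper, not part of NLTK.
    """
    tagger = UnigramTagger(treebank.tagged_sents()[:n_sents])
    # tag() takes a list of tokens and returns a list of (token, tag) pairs
    return tagger.tag(treebank.sents()[0])
```

Calling `tag_first_sentence()` returns one (token, tag) pair per token of the sentence: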
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
How does the code work?
UnigramTagger builds a context model from the list of tagged sentences. Because UnigramTagger inherits from ContextTagger, instead of providing a choose_tag() method, it must implement a context() method, which takes the same three arguments as choose_tag() (the tokens, the current index, and the tag history). The value returned by context() is the context token, which is used to create the model and also to look up the best tag once the model is created. The diagram above illustrates this as well.
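To make the idea concrete, here is a simplified pure-Python sketch of a context model. It is not NLTK's implementation. The function names `build_model` and `tag` and the training data are hypothetical, and `context` only mirrors the three-argument signature described above.

```python
from collections import Counter, defaultdict


def context(tokens, index, history):
    """Unigram context: the current word only (history is ignored)."""
    return tokens[index]


def build_model(tagged_sents):
    """Map each context key to the tag seen with it most often in training."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        tokens = [word for word, _ in sent]
        history = []
        for index, (_, tag_) in enumerate(sent):
            counts[context(tokens, index, history)][tag_] += 1
            history.append(tag_)
    return {key: tags.most_common(1)[0][0] for key, tags in counts.items()}


def tag(model, tokens):
    """Look up the best tag for each token; unknown contexts give None."""
    history = []
    for index in range(len(tokens)):
        history.append(model.get(context(tokens, index, history)))
    return list(zip(tokens, history))


# Hypothetical training data: "bark" is a noun twice and a verb once.
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("bark", "NN")],
    [("dogs", "NNS"), ("bark", "VBP")],
    [("trees", "NNS"), ("have", "VBP"), ("bark", "NN")],
]
model = build_model(train)
print(tag(model, ["the", "bark", "falls"]))
```

The same `context` function is used both while counting tags and while tagging, which is the dual role of the context token described above. A bigram variant would only need `context` to return the previous tag together with the current word.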
Overriding the context model –
All taggers that inherit from ContextTagger can take a pre-built model instead of training their own. This model is simply a Python dictionary mapping a context key to a tag. The context keys (individual words in the case of UnigramTagger) depend on what the ContextTagger subclass returns from its context() method.
Code #4 : Overriding the context model
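A minimal sketch of passing a pre-built model, assuming only that NLTK is installed. The token list is written out inline (an assumption made so the snippet does not need the corpus). It is the first treebank sentence shown earlier.

```python
from nltk.tag import UnigramTagger

# First treebank sentence, written out so no corpus download is needed.
tokens = ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join',
          'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

# The model is a plain dict from context key (here, a word) to tag.
# No training data is passed, so this dict is the entire model.
tagger = UnigramTagger(model={'Pierre': 'NN'})

tagged = tagger.tag(tokens)
print(tagged)
```

Only 'Pierre' is in the model, so every other token is tagged None: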
[('Pierre', 'NN'), ('Vinken', None), (',', None), ('61', None), ('years', None), ('old', None), (',', None), ('will', None), ('join', None), ('the', None), ('board', None), ('as', None), ('a', None), ('nonexecutive', None), ('director', None), ('Nov.', None), ('29', None), ('.', None)]