NLP | Training Unigram Tagger

A unigram is a single token, for example: hello, movie, coding. This article focuses on the unigram tagger.

Unigram Tagger: To determine the part-of-speech tag of a word, it uses only that single word as context. UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which in turn inherits from SequentialBackoffTagger. So, UnigramTagger is a context-based tagger whose context is a single word.
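To make the idea concrete, here is a minimal pure-Python sketch of what "tagging from a single word" amounts to: for each word, remember the tag it carries most often in the training data. The tiny corpus `toy_sents` and the helper names `train_unigram_model` and `tag_with_model` are made up for illustration; this is not NLTK's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical hand-made training data: a list of tagged sentences,
# each a list of (word, tag) pairs, the same shape as treebank.tagged_sents().
toy_sents = [
    [("the", "DT"), ("board", "NN"), ("will", "MD"), ("join", "VB")],
    [("they", "PRP"), ("board", "VBP"), ("the", "DT"), ("plane", "NN")],
    [("the", "DT"), ("board", "NN"), ("met", "VBD")],
]

def train_unigram_model(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_with_model(model, tokens):
    """Tag each token on its own; words never seen in training get None."""
    return [(tok, model.get(tok)) for tok in tokens]

toy_model = train_unigram_model(toy_sents)
print(tag_with_model(toy_model, ["the", "board", "resigned"]))
# [('the', 'DT'), ('board', 'NN'), ('resigned', None)]
```

Note that "board" is tagged NN even where it is used as a verb, because a unigram model never looks at neighbouring words.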



Code #1 : Importing UnigramTagger and the treebank corpus.

# Loading Libraries
from nltk.tag import UnigramTagger
from nltk.corpus import treebank

 
Code #2 : Training on the first 1000 tagged sentences of the treebank corpus.

# Use the first 1000 tagged sentences as training data
train_sents = treebank.tagged_sents()[:1000]

# Train the tagger
tagger = UnigramTagger(train_sents)

# Let's see the first sentence
# (of the treebank corpus) as a list
treebank.sents()[0]


Output :

['Pierre',
 'Vinken',
 ',',
 '61',
 'years',
 'old',
 ',',
 'will',
 'join',
 'the',
 'board',
 'as',
 'a',
 'nonexecutive',
 'director',
 'Nov.',
 '29',
 '.']

 
Code #3 : Finding the tagged results after training.

tagger.tag(treebank.sents()[0])


Output :

[('Pierre', 'NNP'),
 ('Vinken', 'NNP'),
 (',', ','),
 ('61', 'CD'),
 ('years', 'NNS'),
 ('old', 'JJ'),
 (',', ','),
 ('will', 'MD'),
 ('join', 'VB'),
 ('the', 'DT'),
 ('board', 'NN'),
 ('as', 'IN'),
 ('a', 'DT'),
 ('nonexecutive', 'JJ'),
 ('director', 'NN'),
 ('Nov.', 'NNP'),
 ('29', 'CD'),
 ('.', '.')]

 
How does the code work?
UnigramTagger builds a context model from the list of tagged sentences. Because UnigramTagger inherits from ContextTagger, it does not provide its own choose_tag() method; instead it must implement a context() method, which takes the same three arguments as choose_tag(). The value returned by context() is the key used to build the model, and the same key is used to look up the best tag once the model has been built. The diagram above illustrates this as well.
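The relationship between context() and choose_tag() can be sketched in a few lines of plain Python. The class `ToyContextTagger` below is a hypothetical, simplified stand-in written for this article, not NLTK's code; it only assumes that the three arguments are the token list, the index of the current token, and the history of tags assigned so far.

```python
class ToyContextTagger:
    """Simplified stand-in for a context-based tagger (not NLTK's code)."""

    def __init__(self, model):
        # model: a dict mapping a context key to a tag
        self._model = model

    def context(self, tokens, index, history):
        # For a unigram tagger the context is just the current word;
        # history (the tags assigned so far) is ignored.
        return tokens[index]

    def choose_tag(self, tokens, index, history):
        # Look up the context key in the model; unknown keys give None.
        return self._model.get(self.context(tokens, index, history))

    def tag(self, tokens):
        history = []
        for index in range(len(tokens)):
            history.append(self.choose_tag(tokens, index, history))
        return list(zip(tokens, history))

toy = ToyContextTagger({"the": "DT", "board": "NN"})
result = toy.tag(["the", "board", "met"])
print(result)
# [('the', 'DT'), ('board', 'NN'), ('met', None)]
```

A tagger that needs more context would only have to change what context() returns, for example a tuple containing the previous tag and the current word.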

Overriding the context model:
Instead of training its own model, any tagger that inherits from ContextTagger can take a pre-built model. This model is simply a Python dictionary that maps a context key to a tag. The context keys (individual words in the case of UnigramTagger) depend on what the ContextTagger subclass returns from its context() method.
 
Code #4 : Overriding the context model

tagger = UnigramTagger(model={'Pierre': 'NN'})
  
tagger.tag(treebank.sents()[0])


Output :

[('Pierre', 'NN'),
 ('Vinken', None),
 (',', None),
 ('61', None),
 ('years', None),
 ('old', None),
 (',', None),
 ('will', None),
 ('join', None),
 ('the', None),
 ('board', None),
 ('as', None),
 ('a', None),
 ('nonexecutive', None),
 ('director', None),
 ('Nov.', None),
 ('29', None),
 ('.', None)]
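Every word except 'Pierre' is tagged None because the overriding model contains only that one key. Since UnigramTagger ultimately derives from SequentialBackoffTagger, one possible way to handle unknown words is to pass a backoff tagger. The sketch below assumes NLTK is installed and uses DefaultTagger with 'NN' as an arbitrary fallback tag; the model entry and the two-word input are illustrative only.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Fallback tagger: assigns 'NN' to every token it is asked about.
# The choice of 'NN' is an arbitrary assumption for this sketch.
fallback = DefaultTagger('NN')

# A pre-built model with a single entry, plus the fallback for everything else.
backoff_tagger = UnigramTagger(model={'Pierre': 'NNP'}, backoff=fallback)

tagged = backoff_tagger.tag(['Pierre', 'Vinken'])
print(tagged)
# [('Pierre', 'NNP'), ('Vinken', 'NN')]
```

'Pierre' is found in the model, while 'Vinken' is not, so the request falls through to the fallback tagger instead of returning None.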

