NLP | Backoff Tagging to combine taggers

What is part-of-speech (POS) tagging?
It is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag, and signifies whether the word is a noun, adjective, verb, and so on.

What is backoff tagging?
It is one of the most important features of SequentialBackoffTagger, as it allows taggers to be combined. The advantage of doing this is that if a tagger doesn’t know how to tag a word, it can pass the word on to the next backoff tagger. If that one can’t tag it either, it passes the word on to the next backoff tagger, and so on until there are no backoff taggers left to check.
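As a minimal sketch of such a chain (assuming NLTK is installed; the tiny training set below is made up for illustration and is not from treebank), a BigramTagger can back off to a UnigramTagger, which in turn backs off to a DefaultTagger:

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Tiny hand-made training set (illustrative only, not from treebank)
train_sents = [
    [("I", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("a", "DT"), ("can", "NN"), ("of", "IN"), ("fish", "NN")],
    [("we", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("fresh", "JJ"), ("fish", "NN")],
]

# Build the chain from the last resort upwards: bigram -> unigram -> default
default = DefaultTagger("NN")
unigram = UnigramTagger(train_sents, backoff=default)
bigram = BigramTagger(train_sents, backoff=unigram)

# On its own, the unigram tagger always gives "can" its most frequent tag (MD)
print(unigram.tag(["a", "can"]))
# [('a', 'DT'), ('can', 'MD')]

# The bigram tagger also looks at the previous tag; whatever it cannot
# decide is passed to the unigram tagger, and unseen words reach the default
print(bigram.tag(["a", "can", "of", "fish"]))
# [('a', 'DT'), ('can', 'NN'), ('of', 'IN'), ('fish', 'NN')]
print(bigram.tag(["we", "can", "fish", "quickly"]))
# [('we', 'PRP'), ('can', 'MD'), ('fish', 'VB'), ('quickly', 'NN')]
```

In this run the bigram context settles "can" after a determiner (NN) and "fish" after a modal (VB); the other words fall through to the unigram tagger, and "quickly", which no trained tagger has seen, ends with the default tag NN.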

Code #1: Performing tagging with a backoff tagger

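# Loading Libraries
from nltk.tag import DefaultTagger
from nltk.tag import UnigramTagger
from nltk.corpus import treebank
# (the corpus may first need to be fetched with nltk.download('treebank'))

# Initializing training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Defining the backoff tagger: it tags every word as a noun ('NN')
tag1 = DefaultTagger('NN')

# Training a unigram tagger that falls back on tag1 for words it can't tag
tag2 = UnigramTagger(train_data, backoff=tag1)

# Evaluation: accuracy on the test set
# (newer NLTK releases deprecate evaluate() in favour of accuracy())
print(tag2.evaluate(test_data))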


Output:

0.8752428232246924

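How it works?
The SequentialBackoffTagger class can take a backoff keyword argument whose value is another SequentialBackoffTagger instance. In the code above, a UnigramTagger is trained on the first 3000 sentences of treebank.tagged_sents() and given a DefaultTagger('NN') as its backoff, so any word the unigram tagger cannot tag is tagged as 'NN'. The remaining sentences are used as the test set, on which the combined tagger reaches an accuracy of about 87.5%.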
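A small sketch of what the backoff argument changes, again assuming NLTK is installed and using a made-up one-sentence training set instead of treebank. It also reads the backoff property of SequentialBackoffTagger, which returns the next tagger in the chain, or None when there is none:

```python
from nltk.tag import DefaultTagger, UnigramTagger

# One hand-made training sentence (illustrative only)
train_sents = [[("the", "DT"), ("cat", "NN"), ("sat", "VBD")]]

no_backoff = UnigramTagger(train_sents)
default = DefaultTagger("NN")
with_backoff = UnigramTagger(train_sents, backoff=default)

tokens = ["the", "cat", "meowed"]

# Without a backoff, the unseen word "meowed" gets no tag at all
print(no_backoff.tag(tokens))
# [('the', 'DT'), ('cat', 'NN'), ('meowed', None)]

# With DefaultTagger('NN') as backoff, the unseen word falls back to 'NN'
print(with_backoff.tag(tokens))
# [('the', 'DT'), ('cat', 'NN'), ('meowed', 'NN')]

# The backoff property exposes the next tagger in the chain
print(with_backoff.backoff is default, no_backoff.backoff is None)
# True True
```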
 
Code #2: Inspecting the internal list of backoff taggers


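# tag1 and tag2 are the taggers created in Code #1

# A tagger without a backoff holds only itself
print(tag1._taggers == [tag1])

# A tagger with a backoff holds itself, followed by its backoff
print(tag2._taggers == [tag2, tag1])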


Output:

True

True

How it works?

  • When a SequentialBackoffTagger is initialized, it creates an internal list of taggers, _taggers, whose first element is the tagger itself.
  • If a backoff tagger is given, the backoff tagger’s own list of taggers is appended to this list.
  • When the tag() method is called, SequentialBackoffTagger uses the _taggers list to tag each word.
  • For each word, it goes through its list of taggers in order, calling choose_tag() on each one.
  • As soon as one of them returns a tag, it stops and uses that tag.
  • So if the primary tagger can tag the word, its tag is returned.
  • Otherwise, the primary tagger returns None and the next tagger is tried, and so on until a tag is found; if no tagger in the list can tag the word, None is returned.
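To make these steps concrete, here is a minimal pure-Python sketch of the same mechanism. The classes BackoffTagger, LookupTagger and ConstantTagger are hypothetical and are not NLTK classes; the _taggers list and the choose_tag() and tag_one() methods mirror the names used by NLTK’s SequentialBackoffTagger, but the code is a simplified illustration, not NLTK’s source.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class BackoffTagger:
    """Hypothetical base class that mimics the backoff chain described above."""

    def __init__(self, backoff: Optional["BackoffTagger"] = None) -> None:
        # The internal list always starts with the tagger itself...
        self._taggers: List["BackoffTagger"] = [self]
        # ...and the backoff tagger's own list is appended if one is given.
        if backoff is not None:
            self._taggers += backoff._taggers

    def choose_tag(self, tokens: Sequence[str], index: int,
                   history: List[Optional[str]]) -> Optional[str]:
        """Return a tag for tokens[index], or None if this tagger can't decide."""
        raise NotImplementedError

    def tag_one(self, tokens: Sequence[str], index: int,
                history: List[Optional[str]]) -> Optional[str]:
        # Ask each tagger in turn and stop at the first one that returns a tag.
        for tagger in self._taggers:
            tag = tagger.choose_tag(tokens, index, history)
            if tag is not None:
                return tag
        return None  # no tagger in the chain could tag this word

    def tag(self, tokens: Sequence[str]) -> List[Tuple[str, Optional[str]]]:
        tags: List[Optional[str]] = []
        for index in range(len(tokens)):
            tags.append(self.tag_one(tokens, index, tags))
        return list(zip(tokens, tags))


class LookupTagger(BackoffTagger):
    """Hypothetical tagger that only knows the words in a fixed dictionary."""

    def __init__(self, table: Dict[str, str],
                 backoff: Optional[BackoffTagger] = None) -> None:
        super().__init__(backoff)
        self.table = table

    def choose_tag(self, tokens, index, history):
        return self.table.get(tokens[index])  # None for unknown words


class ConstantTagger(BackoffTagger):
    """Hypothetical tagger that gives every word the same tag."""

    def __init__(self, fixed_tag: str) -> None:
        super().__init__()
        self.fixed_tag = fixed_tag

    def choose_tag(self, tokens, index, history):
        return self.fixed_tag


default = ConstantTagger("NN")
lookup = LookupTagger({"the": "DT", "sat": "VBD"}, backoff=default)

print(lookup._taggers == [lookup, default])  # True
print(lookup.tag(["the", "dog", "sat"]))
# [('the', 'DT'), ('dog', 'NN'), ('sat', 'VBD')]
print(LookupTagger({"the": "DT"}).tag(["the", "dog"]))
# [('the', 'DT'), ('dog', None)]
```

The last line shows the final bullet in action: without a backoff at the end of the chain, a word that no tagger knows is left with None.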

Code #3: Saving and loading a trained tagger with pickle


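# Loading Libraries
import pickle

# tagger is the trained tagger from Code #1
tagger = tag2

# Opening file in binary write mode and saving the tagger
with open('tagger.pickle', 'wb') as file:
    pickle.dump(tagger, file)

# Opening file in binary read mode and loading the tagger back
with open('tagger.pickle', 'rb') as file:
    tagger = pickle.load(file)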


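The loaded tagger can be used just like the original one, without retraining it. If tagger.pickle is stored in a directory on NLTK’s data path, it can also be loaded with nltk.data.load('tagger.pickle').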