
NLP | Backoff Tagging to combine taggers

What is Part-of-speech (POS) tagging? It is the process of converting a sentence into a list of (word, tag) tuples, where each word is paired with a tag. The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.

What is Backoff Tagging? It is one of the most important features of SequentialBackoffTagger, as it allows taggers to be combined in a chain. The advantage of doing this is that if a tagger doesn't know how to tag a word, it can pass the word on to its backoff tagger. If that one can't tag it either, the word is passed on to the next backoff tagger, and so on until there are no backoff taggers left to check.

Code #1 : Performing tagging




# Loading Libraries
from nltk.tag import DefaultTagger
from nltk.tag import UnigramTagger

from nltk.corpus import treebank

# Initializing the training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Defining the fallback tagger, which tags every word as 'NN'
tag1 = DefaultTagger('NN')

# Training a unigram tagger that backs off to the default tagger
tag2 = UnigramTagger(train_data, backoff = tag1)

# Evaluating accuracy on the test set
print(tag2.evaluate(test_data))

Output : 



0.8752428232246924
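To see the (word, tag) tuples and the backoff in action, the trained tagger can be applied to a list of tokens. This is a minimal sketch, assuming tag1 and tag2 from Code #1 are already defined; the example words are illustrative:

# Words seen during training get their most frequent treebank tag
print(tag2.tag(['The', 'price', 'includes', 'two', 'companies', '.']))

# An unseen word gets no tag from the UnigramTagger, so it is passed
# to the backoff DefaultTagger and tagged 'NN'
print(tag2.tag(['flibbertigibbet']))   # [('flibbertigibbet', 'NN')]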

How it works ?
The SequentialBackoffTagger class can take a backoff keyword argument whose value is another instance of a SequentialBackoffTagger. In the code above, the unigram part-of-speech tagger is trained on the treebank.tagged_sents() dataset and uses the DefaultTagger as its backoff, so any word the unigram tagger cannot tag falls back to the default 'NN' tag.

Code #2 : Preparing internal list of backoff taggers




# tag1 and tag2 from Code #1 are assumed to be defined.
# Every SequentialBackoffTagger stores its backoff chain in _taggers.

# tag1 has no backoff, so its chain contains only itself
print(tag1._taggers == [tag1])

# tag2's chain is itself followed by its backoff tagger
print(tag2._taggers == [tag2, tag1])

Output : 



True

True

How it works ?
When a SequentialBackoffTagger is initialized, it stores an internal list of taggers in its _taggers attribute, with itself as the first element; if a backoff tagger is given, that tagger's internal list is appended. This is why tag1._taggers is [tag1] and tag2._taggers is [tag2, tag1]. When tagging, each tagger in the list is tried in order until one of them returns a tag.
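The same chaining extends to longer backoff sequences. The following is a minimal sketch, assuming train_data, tag1 and tag2 from Code #1 are still defined (the variable name tag3 is illustrative):

# Chaining a third tagger on top of tag2; tag3 is an illustrative name
from nltk.tag import BigramTagger

tag3 = BigramTagger(train_data, backoff = tag2)

# The internal list now holds the whole chain, most specific tagger first
print(tag3._taggers == [tag3, tag2, tag1])   # True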

Code #3 : Saving and loading a trained tagger with pickle. 




# Loading Libraries
import pickle

# Opening the file and writing the trained tagger (tag2) to it
file = open('tagger.pickle', 'wb')
pickle.dump(tag2, file)
file.close()

# Reading the file and loading the tagger back
file = open('tagger.pickle', 'rb')
tagger = pickle.load(file)
file.close()

Note : If the pickle file is inside an NLTK data directory, nltk.data.load('tagger.pickle') can also be used to load it.
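To confirm the round trip, the reloaded tagger can be evaluated again. A minimal sketch, assuming test_data from Code #1 and the tagger variable loaded above:

# The unpickled tagger behaves exactly like the original tag2
print(tagger.evaluate(test_data))   # 0.8752428232246924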
