What is Part-of-speech (POS) tagging? It is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.
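As a small illustration of the two forms mentioned above, the following sketch builds a list of words and a list of (word, tag) tuples by hand. The sentence and the tags are made up for this example, and the whitespace split is only a stand-in for a real tokenizer; no tagger is involved yet.

```python
# Hypothetical example sentence; the tags are Penn Treebank-style labels
# assigned by hand, not produced by a tagger.
sentence = "The dog barks"

# Form 1: a list of words (a naive whitespace split stands in for a tokenizer).
words = sentence.split()

# Form 2: a list of (word, tag) tuples.
hand_tags = ["DT", "NN", "VBZ"]
tagged = list(zip(words, hand_tags))

print(words)
print(tagged)
```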
Code #1 : Training a unigram tagger with a default tagger as backoff
# Loading Libraries
from nltk.tag import SequentialBackoffTagger
from nltk.tag import DefaultTagger
from nltk.tag import UnigramTagger
from nltk.corpus import treebank

# Initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Defining the default tagger, which tags every word as 'NN'
tag1 = DefaultTagger('NN')

# Training the unigram tagger with tag1 as its backoff
tag2 = UnigramTagger(train_data, backoff=tag1)

# Evaluation (recent NLTK releases also offer tag2.accuracy(test_data))
tag2.evaluate(test_data)
Output :
0.8752428232246924
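To make the idea behind the code above more concrete, here is a rough pure-Python sketch of a unigram model with a default fallback, plus a token-level accuracy measure. It is a simplified illustration, not NLTK's implementation: the function names (`train_unigram`, `tag_with_fallback`, `accuracy`) and the toy data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_with_fallback(words, model, default_tag="NN"):
    """Use the learned tag for known words, else fall back to default_tag."""
    return [(word, model.get(word, default_tag)) for word in words]


def accuracy(tagged_sents, model, default_tag="NN"):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in tagged_sents:
        words = [word for word, _ in sent]
        predicted = tag_with_fallback(words, model, default_tag)
        correct += sum(p == g for p, g in zip(predicted, sent))
        total += len(sent)
    return correct / total if total else 0.0


# Toy data, invented for illustration only.
toy_train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
toy_test = [[("the", "DT"), ("bird", "NN"), ("sings", "VBZ")]]

model = train_unigram(toy_train)
print(tag_with_fallback(["the", "bird", "barks"], model))
print(accuracy(toy_test, model))
```

In the toy test sentence, "bird" is unknown but the default tag happens to be right, while "sings" is unknown and receives the wrong default tag, so two of the three tokens are counted as correct.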
How it works? The SequentialBackoffTagger class accepts a backoff keyword argument whose value is another SequentialBackoffTagger instance. In the code above, a unigram part-of-speech tagger is trained on the first 3000 sentences of treebank.tagged_sents() and backs off to the default tagger for any word it cannot tag.
Code #2 : Preparing internal list of backoff taggers
from nltk.tag import SequentialBackoffTagger

print(tag1._taggers == [tag1])
print("\n", tag2._taggers == [tag2, tag1])
Output :
True True
How it works?
- When a SequentialBackoffTagger is initialized, it creates an internal list of taggers, _taggers, whose first element is the tagger itself.
- If a backoff tagger is given, that tagger's own internal list is appended to this list.
- When the tag() method is called, the tagger goes through the _taggers list in order, calling choose_tag() on each one.
- If the primary tagger can tag the word, its tag is returned immediately.
- Otherwise choose_tag() returns None and the next tagger in the list is tried.
- This continues until a tag is found; if no tagger in the list can tag the word, None is returned.
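The steps above can be sketched in plain Python. This is a simplified stand-in written for illustration, not NLTK's source: the class names `SketchBackoffTagger`, `LookupTagger`, and `ConstantTagger` are hypothetical, and `choose_tag()` here takes only a word, whereas NLTK's version also receives the token list, the index, and the tagging history.

```python
class SketchBackoffTagger:
    """Simplified stand-in for a sequential backoff tagger (hypothetical)."""

    def __init__(self, backoff=None):
        # The internal list starts with the tagger itself ...
        self._taggers = [self]
        # ... and is extended with the backoff tagger's own list, if given.
        if backoff is not None:
            self._taggers += backoff._taggers

    def choose_tag(self, word):
        raise NotImplementedError

    def tag_one(self, word):
        # Try each tagger in order and stop at the first non-None tag.
        for tagger in self._taggers:
            tag = tagger.choose_tag(word)
            if tag is not None:
                return tag
        return None

    def tag(self, words):
        return [(word, self.tag_one(word)) for word in words]


class LookupTagger(SketchBackoffTagger):
    """Tags only the words found in a fixed table; returns None otherwise."""

    def __init__(self, table, backoff=None):
        super().__init__(backoff)
        self._table = table

    def choose_tag(self, word):
        return self._table.get(word)


class ConstantTagger(SketchBackoffTagger):
    """Assigns the same tag to every word, like a default tagger."""

    def __init__(self, tag):
        super().__init__()
        self._tag = tag

    def choose_tag(self, word):
        return self._tag


default = ConstantTagger("NN")
lookup = LookupTagger({"the": "DT", "barks": "VBZ"}, backoff=default)

print(lookup._taggers == [lookup, default])
print(lookup.tag(["the", "dog", "barks"]))
```

Here "dog" is missing from the lookup table, so the lookup tagger yields None and the constant tagger supplies "NN". Without a backoff, the same word would come back tagged as None.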
Code #3 : Saving and loading a trained tagger with pickle.
# Loading Libraries
import pickle

# Opening the file and writing the trained tagger (tag2, trained above)
file = open('tagger.pickle', 'wb')
pickle.dump(tag2, file)
file.close()

# Reading the file and loading the tagger
file = open('tagger.pickle', 'rb')
tagger = pickle.load(file)
file.close()
Note : If tagger.pickle is located in an NLTK data directory, nltk.data.load('tagger.pickle') can also be used to load the tagger.
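The save-and-load round trip can also be written with `with` blocks, which close the file even if an error occurs. The sketch below uses only the standard library; the plain dictionary is a hypothetical stand-in for a trained tagger, and the temporary directory is used only so that the example leaves no files behind.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained tagger: a plain word -> tag dictionary.
model = {"the": "DT", "barks": "VBZ"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tagger.pickle")

    # Write the object in binary mode; the file is closed on leaving the block.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Read it back, again in binary mode.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal but separate copy of the original.
print(restored == model)
print(restored is model)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.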