NLP | WordNet for tagging

WordNet is a lexical database (in effect, a dictionary) of the English language, designed specifically for natural language processing.

Code #1 : Creating a class to look up words in WordNet.



from nltk.tag import SequentialBackoffTagger
from nltk.corpus import wordnet
from nltk.probability import FreqDist

class WordNetTagger(SequentialBackoffTagger):
    '''
    >>> wt = WordNetTagger()
    >>> wt.tag(['food', 'is', 'great'])
    [('food', 'NN'), ('is', 'VB'), ('great', 'JJ')]
    '''

    def __init__(self, *args, **kwargs):
        SequentialBackoffTagger.__init__(self, *args, **kwargs)
        # Map WordNet POS codes to treebank tags
        self.wordnet_tag_map = {
            'n': 'NN',
            's': 'JJ',
            'a': 'JJ',
            'r': 'RB',
            'v': 'VB'
        }

    def choose_tag(self, tokens, index, history):
        word = tokens[index]
        fd = FreqDist()

        # Count how often each POS occurs among the word's synsets
        for synset in wordnet.synsets(word):
            fd[synset.pos()] += 1

        # No synsets found: return None so the backoff tagger is consulted
        if not fd:
            return None

        return self.wordnet_tag_map.get(fd.max())


The WordNetTagger class counts how many times each POS tag occurs among the synsets of a word, then converts the most common one to a treebank tag using its internal mapping.
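The counting-and-mapping step can be tried in isolation, without NLTK or the WordNet corpus. The following is a minimal sketch: the function name `majority_treebank_tag` is hypothetical, and the POS lists passed to it are made up for illustration rather than taken from real WordNet lookups.

```python
from collections import Counter

# Same WordNet-to-treebank mapping as used by WordNetTagger.
WORDNET_TAG_MAP = {'n': 'NN', 's': 'JJ', 'a': 'JJ', 'r': 'RB', 'v': 'VB'}

def majority_treebank_tag(synset_pos_codes):
    """Return the treebank tag for the most frequent WordNet POS code, or None."""
    counts = Counter(synset_pos_codes)
    if not counts:
        # No synsets at all: nothing to vote on.
        return None
    most_common_pos, _ = counts.most_common(1)[0]
    return WORDNET_TAG_MAP.get(most_common_pos)

# Made-up POS lists for illustration, not real WordNet output.
print(majority_treebank_tag(['n', 's', 's', 's']))  # JJ
print(majority_treebank_tag(['v', 'v', 'n']))       # VB
print(majority_treebank_tag([]))                    # None
```

Returning None for a word with no synsets matters in a backoff chain, because it signals that the next tagger should be asked instead.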

Code #2 : Using a simple WordNetTagger()


from taggers import WordNetTagger
from nltk.corpus import treebank

# Initializing the testing set (WordNetTagger needs no training data)
test_data = treebank.tagged_sents()[3000:]

wn_tagger = WordNetTagger()
# Recent NLTK releases deprecate evaluate() in favor of accuracy()
a = wn_tagger.evaluate(test_data)

print("Accuracy of WordNetTagger :", a)


Output :

Accuracy of WordNetTagger : 0.17914876598160262

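On its own the WordNetTagger is not very accurate. Accuracy improves considerably when it is placed at the end of an NgramTagger backoff chain, so that it is only consulted for words the trained taggers cannot tag.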
Code #3 : WordNetTagger class at the end of an NgramTagger backoff chain


from taggers import WordNetTagger
from nltk.corpus import treebank
from tag_util import backoff_tagger
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

# Initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# WordNetTagger is the last tagger in the backoff chain
wn_tagger = WordNetTagger()

tagger = backoff_tagger(train_data,
                        [UnigramTagger, BigramTagger,
                         TrigramTagger], backoff = wn_tagger)

a = tagger.evaluate(test_data)

print("Accuracy :", a)


Output :

Accuracy : 0.8848262464925534
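The tag_util module imported in Code #3 is not shown in this article. The following is a minimal sketch of what backoff_tagger might look like, assuming it trains each tagger class in turn and passes the previously built tagger as its backoff. ToyUnigram and ToyFallback are hypothetical stand-ins (for an NLTK n-gram tagger and for WordNetTagger respectively) so that the block runs without NLTK or any corpus data.

```python
from collections import Counter, defaultdict

def backoff_tagger(train_sents, tagger_classes, backoff=None):
    """Chain taggers: each class is trained and backs off to the one built before it."""
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff


class ToyUnigram:
    """Hypothetical stand-in for an n-gram tagger: most frequent tag per word."""

    def __init__(self, train_sents, backoff=None):
        counts = defaultdict(Counter)
        for sent in train_sents:
            for word, tag in sent:
                counts[word][tag] += 1
        self.best = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.backoff = backoff

    def tag_word(self, word):
        if word in self.best:
            return self.best[word]
        # Unknown word: defer to the next tagger in the chain, if any.
        return self.backoff.tag_word(word) if self.backoff else None


class ToyFallback:
    """Hypothetical last-resort tagger, in the position WordNetTagger has in Code #3."""
    backoff = None

    def tag_word(self, word):
        return 'NN'


train = [[('food', 'NN'), ('is', 'VBZ'), ('great', 'JJ')]]
tagger = backoff_tagger(train, [ToyUnigram, ToyUnigram], backoff=ToyFallback())
print(tagger.tag_word('is'))     # seen in training: VBZ
print(tagger.tag_word('pizza'))  # unseen: falls through to the fallback, NN
```

With this ordering, the last class in the list becomes the outermost tagger and the tagger passed as `backoff` is consulted only when every trained tagger has failed, which is why the WordNetTagger lifts accuracy here instead of limiting it.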

