Related Articles

Related Articles

NLP | WordNet for tagging
  • Last Updated : 18 Dec, 2019

WordNet is the lexical database i.e. dictionary for the English language, specifically designed for natural language processing.

Code #1 : Creating class to look up words in WordNet.

filter_none

edit
close

play_arrow

link
brightness_4
code

from nltk.tag import SequentialBackoffTagger
from nltk.corpus import wordnet
from nltk.probability import FreqDist
  
class WordNetTagger(SequentialBackoffTagger):
      
    '''
    >>> wt = WordNetTagger()
    >>> wt.tag(['food', 'is', 'great'])
    [('food', 'NN'), ('is', 'VB'), ('great', 'JJ')]
    '''
      
    def __init__(self, *args, **kwargs):
          
        SequentialBackoffTagger.__init__(self, *args, **kwargs)
        self.wordnet_tag_map = {
        'n': 'NN',
        's': 'JJ',
        'a': 'JJ',
        'r': 'RB',
        'v': 'VB'
        }
      
    def choose_tag(self, tokens, index, history):
          
    word = tokens[index]
    fd = FreqDist()
      
    for synset in wordnet.synsets(word):
        fd[synset.pos()] += 1
          
          
    return self.wordnet_tag_map.get(fd.max())

chevron_right


This WordNetTagger class will count the no. of each POS tag found in the Synsets for a word and then, the most common tag is to treebank tag using internal mapping.

Code #2 : Using a simple WordNetTagger()



filter_none

edit
close

play_arrow

link
brightness_4
code

from taggers import WordNetTagger
from nltk.corpus import treebank
  
# Initializing
default_tag = DefaultTagger('NN')
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
wn_tagging = WordNetTagger()
a = wn_tagger.evaluate(test_data)
  
print ("Accuracy of WordNetTagger : ", a)

chevron_right


Output :

Accuracy of WordNetTagger : 0.17914876598160262

Using Code 3, we can improve the accuracy.
Code #3 : WordNetTagger class at the end of an NgramTagger backoff chain

filter_none

edit
close

play_arrow

link
brightness_4
code

from taggers import WordNetTagger
from nltk.corpus import treebank
from tag_util import backoff_tagger
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
  
# Initializing
default_tag = DefaultTagger('NN')
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
tagger = backoff_tagger(train_data,
                        [UnigramTagger, BigramTagger,
                         TrigramTagger], backoff = wn_tagger)
      
a = tagger.evaluate(test_data)
  
print ("Accuracy : ", a)

chevron_right


Output :

Accuracy : 0.8848262464925534

Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course.




My Personal Notes arrow_drop_up
Recommended Articles
Page :