NLP | Brill Tagger

  • The BrillTagger class is a transformation-based tagger. It is not a subclass of SequentialBackoffTagger.
  • Instead of tagging from scratch, it applies a series of rules that correct the tags produced by an initial tagger.
  • During training, each candidate rule is scored, and the highest-scoring rules are kept. A rule's score equals the number of tagging errors it corrects minus the number of new errors it introduces.
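As a worked illustration of the scoring rule above, here is a minimal pure-Python sketch. The function name `rule_score` and the tag sequences are hypothetical (not part of NLTK); the function compares the tags before and after a rule fires against the gold tags and returns fixes minus new errors.

```python
def rule_score(before, after, gold):
    """Net score of a rule: errors it fixes minus new errors it introduces.

    before: tags before the rule is applied
    after:  tags after the rule is applied
    gold:   reference (correct) tags
    """
    fixed = sum(1 for b, a, g in zip(before, after, gold) if b != g and a == g)
    broken = sum(1 for b, a, g in zip(before, after, gold) if b == g and a != g)
    return fixed - broken


# Hypothetical rule "change VB to NN when the previous tag is DT",
# which fires at positions 1, 3 and 5.
gold   = ["DT", "NN", "DT", "NN", "DT", "VB"]
before = ["DT", "VB", "DT", "VB", "DT", "VB"]
after  = ["DT", "NN", "DT", "NN", "DT", "NN"]
print(rule_score(before, after, gold))  # 1 (two fixes, one new error)
```

A rule that turns one wrong tag into a different wrong tag counts as neither a fix nor a new error in this scheme.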

Code #1 : Training a BrillTagger


# tag_util.py (Code #2 imports train_brill_tagger from this module)
# Loading libraries
from nltk.tag import brill, brill_trainer
  
def train_brill_tagger(initial_tagger, train_sents, **kwargs):
    # Each template describes the context a learned rule may test:
    # brill.Pos(...) looks at neighbouring tags, brill.Word(...) at
    # neighbouring words; offsets are relative to the token being retagged.
    templates = [
            brill.Template(brill.Pos([-1])),
            brill.Template(brill.Pos([1])),
            brill.Template(brill.Pos([-2])),
            brill.Template(brill.Pos([2])),
            brill.Template(brill.Pos([-2, -1])),
            brill.Template(brill.Pos([1, 2])),
            brill.Template(brill.Pos([-3, -2, -1])),
            brill.Template(brill.Pos([1, 2, 3])),
            brill.Template(brill.Pos([-1]), brill.Pos([1])),
            brill.Template(brill.Word([-1])),
            brill.Template(brill.Word([1])),
            brill.Template(brill.Word([-2])),
            brill.Template(brill.Word([2])),
            brill.Template(brill.Word([-2, -1])),
            brill.Template(brill.Word([1, 2])),
            brill.Template(brill.Word([-3, -2, -1])),
            brill.Template(brill.Word([1, 2, 3])),
            brill.Template(brill.Word([-1]), brill.Word([1])),
            ]
      
    # Using BrillTaggerTrainer to learn correction rules on top of
    # initial_tagger; deterministic=True breaks ties between equally
    # scored rules deterministically, so runs are reproducible.
    trainer = brill_trainer.BrillTaggerTrainer(
            initial_tagger, templates, deterministic = True)
      
    # Extra keyword arguments (e.g. max_rules, min_score) go to train()
    return trainer.train(train_sents, **kwargs)
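To make the training step more concrete, the sketch below mimics what `trainer.train()` does conceptually, under heavy simplifying assumptions: a single template family (roughly analogous to `brill.Pos([-1])`, "the previous tag is X"), one flat tag list instead of sentences, and a hypothetical `(from_tag, to_tag, prev_tag)` tuple for rules. It is not NLTK's implementation. The parameters `max_rules` and `min_score` borrow the names of options NLTK's `train()` accepts (which `train_brill_tagger()` forwards via `**kwargs`); the default values here are arbitrary choices for the toy data.

```python
from typing import List, Set, Tuple

# Hypothetical rule representation for a Pos([-1])-style template:
# (from_tag, to_tag, prev_tag) = "retag from_tag as to_tag when the
# previous token is tagged prev_tag".
Rule = Tuple[str, str, str]


def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    from_tag, to_tag, prev_tag = rule
    # Conditions are checked against the tags as they were before this
    # rule, so one change cannot trigger another within the same pass.
    return [
        to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]


def score(rule: Rule, tags: List[str], gold: List[str]) -> int:
    new = apply_rule(rule, tags)
    fixed = sum(1 for o, n, g in zip(tags, new, gold) if o != g and n == g)
    broken = sum(1 for o, n, g in zip(tags, new, gold) if o == g and n != g)
    return fixed - broken


def candidates(tags: List[str], gold: List[str]) -> Set[Rule]:
    # Instantiate the template at every current error: turn the wrong tag
    # into the gold tag, conditioned on the observed previous tag.
    return {
        (tags[i], gold[i], tags[i - 1])
        for i in range(1, len(tags))
        if tags[i] != gold[i]
    }


def learn_rules(tags: List[str], gold: List[str], max_rules: int = 10, min_score: int = 2):
    """Greedy transformation-based learning: repeatedly keep the best-scoring rule."""
    rules: List[Rule] = []
    for _ in range(max_rules):
        cands = candidates(tags, gold)
        if not cands:
            break
        # Including the rule in the sort key breaks ties reproducibly.
        best = max(cands, key=lambda r: (score(r, tags, gold), r))
        if score(best, tags, gold) < min_score:
            break
        rules.append(best)
        tags = apply_rule(best, tags)
    return rules, tags


# Toy data: the initial tagger mislabels "can" as MD after a determiner,
# but MD after a pronoun ("they can swim") is correct.
gold    = ["DT", "NN", "VBZ", ".", "PRP", "MD", "VB", ".", "DT", "NN", "VBZ"]
initial = ["DT", "MD", "VBZ", ".", "PRP", "MD", "VB", ".", "DT", "MD", "VBZ"]
rules, final = learn_rules(initial, gold)
print(rules)          # [('MD', 'NN', 'DT')]
print(final == gold)  # True
```

The learned rule fixes both determiner cases and leaves the correct `PRP MD` pair alone, so it scores 2. Training stops once no candidate reaches `min_score` or `max_rules` rules have been learned.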



 
Code #2 : Building and evaluating the initial tagger




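from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
from nltk.corpus import treebank
from tag_util import train_brill_tagger

# Helper used below (not part of NLTK): trains each tagger class in turn,
# using the previously trained tagger as its backoff
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff
  
# Initializing the last-resort tagger: every unknown word becomes a noun
default_tag = DefaultTagger('NN')
  
# Initializing training and testing sets
# (needs the corpus sample: nltk.download('treebank'))
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# Lookup order: Trigram -> Bigram -> Unigram -> Default
initial_tag = backoff_tagger(
        train_data, [UnigramTagger, BigramTagger, 
                    TrigramTagger], backoff = default_tag)
      
# evaluate() returns token-level accuracy on the test set
# (newer NLTK releases deprecate it in favour of accuracy())
a = initial_tag.evaluate(test_data)
print ("Accuracy of Initial Tag : ", a)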



Output :

Accuracy of Initial Tag : 0.8806820634578028
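Assuming `backoff_tagger()` is the usual helper that trains each class in turn with the previous tagger as its backoff, the trigram tagger is consulted first and `DefaultTagger('NN')` only answers when every n-gram level has no entry for the context. The toy sketch below illustrates that lookup order with hand-written tables; `ContextTagger`, `tag_sentence` and the tables are hypothetical stand-ins for NLTK's trained taggers, and only a bigram level is shown.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# Context function: (words, index, tags assigned so far) -> lookup key
ContextFn = Callable[[List[str], int, List[str]], Hashable]


class ContextTagger:
    """Hypothetical toy tagger: look the context up in a table, else ask the backoff."""

    def __init__(self, table: Dict[Hashable, str], context: ContextFn,
                 backoff: Optional["ContextTagger"] = None) -> None:
        self.table = table
        self.context = context
        self.backoff = backoff

    def choose_tag(self, words: List[str], i: int, history: List[str]) -> Optional[str]:
        tag = self.table.get(self.context(words, i, history))
        if tag is None and self.backoff is not None:
            return self.backoff.choose_tag(words, i, history)
        return tag


def tag_sentence(tagger: ContextTagger, words: List[str]) -> List[Tuple[str, str]]:
    history: List[str] = []  # tags assigned so far, left to right
    for i in range(len(words)):
        history.append(tagger.choose_tag(words, i, history))
    return list(zip(words, history))


# Last resort: the context is always "*", so it always answers "NN"
default = ContextTagger({"*": "NN"}, lambda w, i, h: "*")
# Unigram level: keyed on the current word only
unigram = ContextTagger({"the": "DT", "can": "MD"}, lambda w, i, h: w[i], backoff=default)
# Bigram level: keyed on (previous tag, current word)
bigram = ContextTagger({(("DT",), "can"): "NN"},
                       lambda w, i, h: (tuple(h[-1:]), w[i]), backoff=unigram)

print(tag_sentence(bigram, ["the", "can", "rusts"]))
# [('the', 'DT'), ('can', 'NN'), ('rusts', 'NN')]
print(tag_sentence(bigram, ["they", "can"]))
# [('they', 'NN'), ('can', 'MD')]
```

The more specific context wins when it has been seen ("can" after DT becomes NN), and each less specific level only fills the gaps.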

 
Code #3 : Training the BrillTagger on top of the initial tagger


# Learn correction rules on top of initial_tag, then evaluate on the same test set
brill_tag = train_brill_tagger(initial_tag, train_data)
b = brill_tag.evaluate(test_data)
  
print ("Accuracy of brill_tag : ", b)



Output :

Accuracy of brill_tag : 0.8827541549751781
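`evaluate()` reports token-level accuracy: the fraction of test tokens whose predicted tag matches the gold tag. The sketch below illustrates that metric with a hypothetical `token_accuracy` helper and made-up sentences, then puts the two accuracies reported above in perspective by converting them to error rates.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Fraction of tokens whose predicted tag equals the gold tag (hypothetical helper)."""
    pairs = [
        (gold_tag, pred_tag)
        for gold, pred in zip(gold_sents, predicted_sents)
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred)
    ]
    return sum(g == p for g, p in pairs) / len(pairs)


# Tiny made-up example: 3 of 4 tokens tagged correctly
gold = [[("the", "DT"), ("can", "NN")], [("it", "PRP"), ("rusts", "VBZ")]]
pred = [[("the", "DT"), ("can", "MD")], [("it", "PRP"), ("rusts", "VBZ")]]
print(token_accuracy(gold, pred))  # 0.75

# Accuracies reported in the outputs above
initial_acc = 0.8806820634578028
brill_acc = 0.8827541549751781
initial_err, brill_err = 1 - initial_acc, 1 - brill_acc

gain_points = 100 * (brill_acc - initial_acc)
error_reduction = 100 * (initial_err - brill_err) / initial_err
print(f"{gain_points:.2f} percentage points")  # 0.21 percentage points
print(f"{error_reduction:.2f}% fewer errors")  # 1.74% fewer errors
```

On these numbers, the Brill rules remove roughly 1.7% of the initial tagger's errors on the test set.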


Improved By : Akanksha_Rai