NLP | Classifier-based Chunking | Set 1

Unlike most part-of-speech taggers, the ClassifierBasedTagger class learns from features. A ClassifierChunker class can therefore be built on top of it so that it learns from both the words and the part-of-speech tags, instead of only from the part-of-speech tags as the TagChunker class does.

The (word, pos, iob) 3-tuples returned by tree2conlltags() are converted into ((word, pos), iob) 2-tuples by chunk_trees2train_chunks(). This keeps the data compatible with the 2-tuple format required for training a ClassifierBasedTagger class: the (word, pos) pair takes the place of the token, and the IOB tag is the label to be learned.
 
Code #1 : Converting chunk trees into training data

# Loading Libraries
from nltk.chunk import ChunkParserI
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tag import ClassifierBasedTagger
  
def chunk_trees2train_chunks(chunk_sents):
  
    # Using tree2conlltags
    tag_sents = [tree2conlltags(sent) for 
                 sent in chunk_sents]
  
    # 3-tuple is converted to 2-tuple
    return [[((w, t), c) for 
             (w, t, c) in sent] for sent in tag_sents]

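To make the reshaping concrete, here is a minimal sketch that uses hand-written IOB triples in place of the output of tree2conlltags(), so it runs without NLTK. The sample sentence and the helper name to_train_chunks are illustrative assumptions, not part of the original code.

```python
# Hand-written (word, pos, iob) triples, standing in for what
# tree2conlltags() would return for one chunked sentence (assumed data).
tag_sents = [[
    ('the', 'DT', 'B-NP'),
    ('cat', 'NN', 'I-NP'),
    ('sat', 'VBD', 'O'),
]]


def to_train_chunks(tag_sents):
    # Hypothetical helper: the same list comprehension used in
    # chunk_trees2train_chunks(), minus the tree2conlltags() call.
    return [[((w, t), c) for (w, t, c) in sent] for sent in tag_sents]


train_chunks = to_train_chunks(tag_sents)
print(train_chunks[0])
# [(('the', 'DT'), 'B-NP'), (('cat', 'NN'), 'I-NP'), (('sat', 'VBD'), 'O')]
```

Each training item is now a (token, tag) pair in which the token is itself a (word, pos) tuple and the tag is the IOB label.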


Now, a feature detector function is needed to pass into ClassifierBasedTagger. Any feature detector function used with the ClassifierChunker class (defined next) should recognize that tokens are a list of (word, pos) tuples, and have the same function signature as prev_next_pos_iob(). To give the classifier as much information as we can, this feature set contains the current, previous, and next word and part-of-speech tag, along with the previous IOB tag.
 



Code #2 : detector function

def prev_next_pos_iob(tokens, index, history):
      
    word, pos = tokens[index]
    if index == 0:
        prevword, prevpos, previob = ('<START>', )*3
    else:
        prevword, prevpos = tokens[index-1]
        previob = history[index-1]
          
    if index == len(tokens) - 1:
        nextword, nextpos = ('<END>', )*2
    else:
        nextword, nextpos = tokens[index + 1]
    # Build the feature dictionary for every index, including the last token
    feats = {'word': word,
             'pos': pos,
             'nextword': nextword,
             'nextpos': nextpos,
             'prevword': prevword,
             'prevpos': prevpos,
             'previob': previob
             }
    return feats

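As a quick check of what the detector produces, the sketch below restates prev_next_pos_iob() in a form where the feature dictionary is built for every index, then calls it on a three-token sentence. The sentence and the history lists are made-up sample data. In real use, ClassifierBasedTagger supplies history itself as it tags from left to right.

```python
def prev_next_pos_iob(tokens, index, history):
    # tokens: list of (word, pos); history: IOB tags assigned so far
    word, pos = tokens[index]
    if index == 0:
        prevword, prevpos, previob = ('<START>',) * 3
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]
    if index == len(tokens) - 1:
        nextword, nextpos = ('<END>',) * 2
    else:
        nextword, nextpos = tokens[index + 1]
    return {'word': word, 'pos': pos,
            'nextword': nextword, 'nextpos': nextpos,
            'prevword': prevword, 'prevpos': prevpos,
            'previob': previob}


# Assumed sample sentence as (word, pos) pairs
tokens = [('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]

first = prev_next_pos_iob(tokens, 0, [])
middle = prev_next_pos_iob(tokens, 1, ['B-NP'])
last = prev_next_pos_iob(tokens, 2, ['B-NP', 'I-NP'])

print(first['prevword'], first['previob'])   # <START> <START>
print(middle['previob'], middle['nextword'])  # B-NP sat
print(last['nextword'], last['nextpos'])     # <END> <END>
```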


Now, a ClassifierChunker class is needed. It uses an internal ClassifierBasedTagger trained on sentences from chunk_trees2train_chunks(), with features extracted by prev_next_pos_iob(). As a subclass of ChunkParserI, ClassifierChunker implements the parse() method, which converts the ((w, t), c) tuples produced by the internal tagger into Trees using conlltags2tree().

Code #3 :

class ClassifierChunker(ChunkParserI):
    def __init__(self, train_sents, 
                 feature_detector = prev_next_pos_iob, **kwargs):
          
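        if not feature_detector:
            # Fall back to the default detector if None was passed
            feature_detector = prev_next_pos_iob

        # Train the internal tagger regardless of which detector is used
        train_chunks = chunk_trees2train_chunks(train_sents)
        self.tagger = ClassifierBasedTagger(train = train_chunks,
            feature_detector = feature_detector, **kwargs)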
              
    def parse(self, tagged_sent):
          
        if not tagged_sent: return None
        chunks = self.tagger.tag(tagged_sent)
          
        return conlltags2tree(
                [(w, t, c) for ((w, t), c) in chunks])

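conlltags2tree() rebuilds a Tree from (word, pos, iob) triples. As a rough, NLTK-free illustration of the grouping involved, the sketch below collects consecutive B- and I- tags into chunks. The function group_iob_chunks is a hypothetical simplification, not NLTK's implementation, and the tagger output is hand-written sample data.

```python
def group_iob_chunks(tagged):
    """Group ((word, pos), iob) pairs into (label, [(word, pos), ...]) chunks.

    Simplified stand-in for the grouping step: tokens tagged 'O' are
    dropped, and an I- tag with no matching open chunk starts a new one.
    """
    chunks = []
    current = None
    for (word, pos), iob in tagged:
        if iob == 'O':
            current = None
            continue
        prefix, label = iob.split('-', 1)
        if prefix == 'I' and current is not None and current[0] == label:
            # Continue the chunk that is currently open
            current[1].append((word, pos))
        else:
            # B- tag (or orphan I- tag): open a new chunk
            current = (label, [(word, pos)])
            chunks.append(current)
    return chunks


# Hand-written tagger output in the ((w, t), c) shape described above
tagged = [
    (('the', 'DT'), 'B-NP'), (('cat', 'NN'), 'I-NP'),
    (('sat', 'VBD'), 'O'), (('on', 'IN'), 'O'),
    (('the', 'DT'), 'B-NP'), (('mat', 'NN'), 'I-NP'),
]

chunks = group_iob_chunks(tagged)
print(chunks)
# [('NP', [('the', 'DT'), ('cat', 'NN')]), ('NP', [('the', 'DT'), ('mat', 'NN')])]
```

Unlike this sketch, the Tree returned by conlltags2tree() also keeps the tokens that fall outside any chunk.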



