NLP | Classifier-based Chunking | Set 1
Unlike most part-of-speech taggers, the ClassifierBasedTagger class learns from features.
A ClassifierChunker class can be built on top of it so that it learns from both the words and the part-of-speech tags, instead of only from the part-of-speech tags as the TagChunker class does.
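Each (word, pos, iob) 3-tuple produced by tree2conlltags() is converted into a ((word, pos), iob) 2-tuple. This keeps the data compatible with the 2-tuple (word, pos) format that a ClassifierBasedTagger is normally trained on: the (word, pos) pair takes the place of the word, and the IOB tag takes the place of the part-of-speech tag.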
Code #1 : Let’s understand
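The conversion described above can be sketched as follows. This is a minimal sketch, assuming NLTK is installed; the function name chunk_trees2train_chunks() comes from the text, while its body and the small hand-built tree are illustrative assumptions rather than the original listing.

```python
from nltk.chunk import tree2conlltags
from nltk.tree import Tree


def chunk_trees2train_chunks(chunk_sents):
    """Turn chunk trees into lists of ((word, pos), iob) training pairs."""
    # tree2conlltags() flattens each tree into (word, pos, iob) 3-tuples.
    tag_sents = [tree2conlltags(sent) for sent in chunk_sents]
    # Regroup so that (word, pos) acts as the token and iob acts as the tag.
    return [[((w, t), c) for (w, t, c) in sent] for sent in tag_sents]


# A tiny hand-built chunk tree, used only for illustration (assumption).
example_tree = Tree("S", [
    Tree("NP", [("the", "DT"), ("book", "NN")]),
    ("is", "VBZ"),
])

train_chunks = chunk_trees2train_chunks([example_tree])
print(train_chunks)
```

Words inside the NP subtree receive B-NP and I-NP tags, and the word outside any chunk receives O.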
Next, a feature detector function is needed to pass into ClassifierBasedTagger. Any feature detector used with the ClassifierChunker class (defined next) must treat tokens as a list of (word, pos) tuples and have the same function signature as prev_next_pos_iob(). To give the classifier as much information as possible, this feature set contains the current, previous, and next word and part-of-speech tag, along with the previous IOB tag.
Code #2 : detector function
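A feature detector of this kind might look like the sketch below. The name prev_next_pos_iob() and the (tokens, index, history) signature follow the text and the calling convention of ClassifierBasedTagger; the feature key names and the "<START>" and "<END>" padding values are assumptions chosen for illustration.

```python
def prev_next_pos_iob(tokens, index, history):
    """Build a feature dict for the token at `index`.

    tokens  : sequence of (word, pos) tuples for one sentence
    history : IOB tags already assigned to tokens[0:index]
    """
    word, pos = tokens[index]

    if index == 0:
        # No previous token: pad with a placeholder (assumed value).
        prevword, prevpos, previob = ("<START>",) * 3
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]

    if index == len(tokens) - 1:
        # No next token: pad with a placeholder (assumed value).
        nextword, nextpos = ("<END>",) * 2
    else:
        nextword, nextpos = tokens[index + 1]

    return {
        "word": word,
        "pos": pos,
        "prevword": prevword,
        "prevpos": prevpos,
        "previob": previob,
        "nextword": nextword,
        "nextpos": nextpos,
    }


tokens = [("the", "DT"), ("book", "NN"), ("is", "VBZ")]
# Features for "book", given that "the" was already tagged B-NP.
feats = prev_next_pos_iob(tokens, 1, ["B-NP"])
print(feats)
```

Only the previous IOB tag is available as a feature, because the tagger assigns tags left to right and the next tag has not been predicted yet.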
Finally, a ClassifierChunker class is needed. It uses an internal ClassifierBasedTagger trained on sentences from chunk_trees2train_chunks(), with features extracted by prev_next_pos_iob(). As a subclass of ChunkParserI, ClassifierChunker implements the parse() method, which converts the ((w, t), c) tuples produced by the internal tagger into Trees using conlltags2tree().
Code #3 :
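One way to put the pieces together is sketched below, assuming NLTK is installed. The class and helper names follow the text; the constructor arguments, the compact helper bodies (repeated here so the block runs on its own), and the two hand-built training trees are assumptions for illustration. A real chunker would be trained on a chunked corpus instead.

```python
from nltk.chunk import ChunkParserI, conlltags2tree, tree2conlltags
from nltk.tag import ClassifierBasedTagger
from nltk.tree import Tree


def chunk_trees2train_chunks(chunk_sents):
    # (word, pos, iob) 3-tuples -> ((word, pos), iob) 2-tuples
    tag_sents = [tree2conlltags(sent) for sent in chunk_sents]
    return [[((w, t), c) for (w, t, c) in sent] for sent in tag_sents]


def prev_next_pos_iob(tokens, index, history):
    # Current, previous, and next word/tag, plus the previous IOB tag.
    word, pos = tokens[index]
    if index == 0:
        prevword, prevpos, previob = ("<START>",) * 3
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]
    if index == len(tokens) - 1:
        nextword, nextpos = ("<END>",) * 2
    else:
        nextword, nextpos = tokens[index + 1]
    return {"word": word, "pos": pos, "prevword": prevword,
            "prevpos": prevpos, "previob": previob,
            "nextword": nextword, "nextpos": nextpos}


class ClassifierChunker(ChunkParserI):
    def __init__(self, train_sents, feature_detector=prev_next_pos_iob,
                 **kwargs):
        # Train the internal tagger on ((word, pos), iob) pairs.
        train_chunks = chunk_trees2train_chunks(train_sents)
        self.tagger = ClassifierBasedTagger(
            train=train_chunks, feature_detector=feature_detector, **kwargs)

    def parse(self, tagged_sent):
        if not tagged_sent:
            return None
        # The tagger returns ((w, t), c) pairs; flatten them back to
        # (w, t, c) triples so conlltags2tree() can build a Tree.
        chunks = self.tagger.tag(tagged_sent)
        return conlltags2tree([(w, t, c) for ((w, t), c) in chunks])


# Two tiny hand-built trees stand in for a real chunked corpus (assumption).
train_trees = [
    Tree("S", [Tree("NP", [("the", "DT"), ("book", "NN")]),
               ("is", "VBZ"), ("good", "JJ")]),
    Tree("S", [Tree("NP", [("a", "DT"), ("dog", "NN")]),
               ("barks", "VBZ")]),
]

chunker = ClassifierChunker(train_trees)
tree = chunker.parse([("the", "DT"), ("dog", "NN"), ("barks", "VBZ")])
print(tree)
```

By default ClassifierBasedTagger trains a naive Bayes classifier; extra keyword arguments such as classifier_builder are passed through to it, so a different classifier can be swapped in without changing the chunker.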