NLP | Chunking and chinking with RegEx

Chunk extraction, or partial parsing, is the process of extracting short, meaningful phrases from a sentence whose words have been tagged with their part-of-speech.
Chunks are made up of words, and the kinds of words allowed in a chunk are defined using part-of-speech tags. One can also define patterns for words that must not be part of a chunk; such words are known as chinks. A ChunkRule class specifies which patterns to include in a chunk, while a ChinkRule class specifies which patterns to exclude from one.

Defining Chunk patterns :
Chunk patterns are ordinary regular expressions that have been modified to match sequences of part-of-speech tags rather than characters. Angle brackets are used to specify an individual tag, for example <NN> to match a noun tag. Multiple tags can be written one after another in the same way, for example <DT><NN> to match a determiner followed by a noun.

Code #1 : Converting a chunk pattern to a RegEx pattern.

# Loading the library
from nltk.chunk.regexp import tag_pattern2re_pattern

# Convert a chunk (tag) pattern into the equivalent RegEx pattern
print("Chunk Pattern : ", tag_pattern2re_pattern('<DT>?<NN.*>+'))

Output :

Chunk Pattern :  (<(DT)>)?(<(NN[^\{\}<>]*)>)+
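
To see what the translated pattern is for, the sketch below compiles it with Python's re module and runs it over a sentence whose tags have been written out in angle brackets. The sample sentence and the tag_string encoding are assumptions made for illustration; NLTK keeps a similar bracketed string internally, but this is not its actual code path.

```python
import re
from nltk.chunk.regexp import tag_pattern2re_pattern

# Translate the tag pattern and compile the result as an ordinary regex
tag_regex = re.compile(tag_pattern2re_pattern('<DT>?<NN.*>+'))

# Illustrative POS-tagged sentence
tagged = [('the', 'DT'), ('book', 'NN'), ('has', 'VBZ'),
          ('many', 'JJ'), ('chapters', 'NNS')]

# Write the tags as one string, one <TAG> per word (assumed encoding)
tag_string = ''.join('<%s>' % tag for _, tag in tagged)

# Each match is a run of tags that the chunk pattern would cover
matches = [m.group(0) for m in tag_regex.finditer(tag_string)]
print(tag_string)   # <DT><NN><VBZ><JJ><NNS>
print(matches)      # ['<DT><NN>', '<NNS>']
```

The optional determiner and the one-or-more nouns behave exactly like ? and + in a character-level regex, only applied to whole tags.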

Curly braces are used to specify a chunk pattern, like {<DT><NN>}, and to specify a chink pattern one just flips the braces, like }<VB>{. For a particular phrase type, these rules (a chunk pattern and a chink pattern) can be combined into a grammar.



Code #2 : Parsing the sentence with RegexpParser.

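from nltk.chunk import RegexpParser

# Grammar for noun phrases (NP): a chunk rule followed by a chink rule
chunker = RegexpParser(r'''
NP:
{<DT><NN.*><.*>*<NN.*>}
}<VB.*>{
''')

# Parse a POS-tagged sentence; the verb is chinked out, splitting the chunk in two
tree = chunker.parse([('the', 'DT'), ('book', 'NN'), (
    'has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')])
print(repr(tree))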

Output :

Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), 
Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
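
Once the sentence is parsed, the chunks usually need to be pulled back out of the tree. A minimal sketch, assuming the tree shown in the output above and a hypothetical helper named extract_phrases (not part of NLTK):

```python
from nltk.tree import Tree

# The parse result from the output above, rebuilt by hand
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('book', 'NN')]),
    ('has', 'VBZ'),
    Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')]),
])

def extract_phrases(tree, label='NP'):
    """Hypothetical helper: return the words of every chunk with the given label."""
    return [' '.join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == label)]

print(extract_phrases(tree))   # ['the book', 'many chapters']
```

The chinked verb stays in the tree as a plain (word, tag) leaf, so it never shows up in the extracted phrases.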
