NLP | Chunking and chinking with RegEx

Chunk extraction, or partial parsing, is the process of extracting short, meaningful phrases from a sentence that has been tagged with part-of-speech tags.
Chunks are made up of words, and the kinds of words are defined using part-of-speech tags. One can also define patterns for words that must not be part of a chunk; such words are known as chinks. A ChunkRule specifies what to include in a chunk, while a chink rule specifies what to exclude from it.

Defining Chunk patterns :
Chunk patterns are regular expressions that have been modified to match sequences of part-of-speech tags. Angle brackets are used to specify an individual tag, for example <NN> to match a noun tag. One can define multiple tags in the same way, as in <DT><NN> for a determiner followed by a noun.

Code #1 : Converting a chunk pattern to a RegEx pattern.

# Loading the library
from nltk.chunk.regexp import tag_pattern2re_pattern
  
# Chunk Pattern to RegEx Pattern 
print("Chunk Pattern : ", tag_pattern2re_pattern('<DT>?<NN.*>+'))

Output :

Chunk Pattern :  (<(DT)>)?(<(NN[^\{\}<>]*)>)+
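
To see what such a translated pattern accepts, here is a minimal sketch using only Python's `re` module. The regex literal is the one `tag_pattern2re_pattern` returns for '<DT>?<NN.*>+'; the helper `matches_tags` is a hypothetical name introduced for illustration, and joining tags into a string like `<DT><NN>` is an assumption about how the pattern is meant to be applied.

```python
import re

# Regex form of the tag pattern '<DT>?<NN.*>+' (an optional determiner
# followed by one or more noun tags).
translated = re.compile(r'(<(DT)>)?(<(NN[^\{\}<>]*)>)+')


def matches_tags(tags):
    """Hypothetical helper: join tags as '<DT><NN>' and test the whole string."""
    tag_string = ''.join('<{}>'.format(tag) for tag in tags)
    return translated.fullmatch(tag_string) is not None


print(matches_tags(['DT', 'NN']))    # True: determiner followed by a noun
print(matches_tags(['NN', 'NNS']))   # True: the determiner is optional
print(matches_tags(['DT', 'VBZ']))   # False: at least one noun tag is required
```

Because each tag is wrapped in angle brackets, `NN.*` can only extend within a single tag, so it matches NN, NNS, NNP and so on without spilling into the next tag.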

Curly braces are used to specify a chunk, like {<DT><NN>}, and to specify a chink pattern one can just flip the braces, like }<VB>{. For a particular phrase type, these rules (a chunk and a chink pattern) can be combined into a grammar.
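
The effect of the two kinds of rule can be mimicked on a plain string of tags. The sketch below is a simplified illustration written with Python's `re` module, not NLTK's own implementation: it assumes a sentence tagged DT NN VBZ JJ NNS, a chunk rule {<DT><NN.*><.*>*<NN.*>} and a chink rule }<VB.*>{, with both rules hand-translated into ordinary regexes.

```python
import re

# Tags of a sentence such as: the/DT book/NN has/VBZ many/JJ chapters/NNS
tag_string = "<DT><NN><VBZ><JJ><NNS>"

# Chunk rule {<DT><NN.*><.*>*<NN.*>}: wrap the matching tag sequence in braces.
# '[^{}<>]*' plays the role of '.*' but cannot cross a tag boundary.
chunk_re = re.compile(r"<DT><NN[^{}<>]*>(?:<[^{}<>]*>)*<NN[^{}<>]*>")
chunked = chunk_re.sub(lambda m: "{" + m.group(0) + "}", tag_string)

# Chink rule }<VB.*>{: flip the braces around verb tags to cut them out.
# Simplification: this assumes every verb tag sits inside a chunk.
chink_re = re.compile(r"<VB[^{}<>]*>")
chinked = chink_re.sub(lambda m: "}" + m.group(0) + "{", chunked)

print(chunked)   # {<DT><NN><VBZ><JJ><NNS>}
print(chinked)   # {<DT><NN>}<VBZ>{<JJ><NNS>}
```

The chunk rule first takes the whole sequence as one chunk, and the chink rule then removes the verb, leaving two separate chunks on either side of it.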

Code #2 : Parsing the sentence with RegexpParser.

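from nltk.chunk import RegexpParser

# Grammar for noun phrases (NP):
#   chunk rule: a determiner, a noun, any tags, then a noun
#   chink rule: remove verbs from the chunk, splitting it in two
chunker = RegexpParser(r'''
NP:
{<DT><NN.*><.*>*<NN.*>}
}<VB.*>{
''')

# Parse a sentence that is already tagged with part-of-speech tags
tree = chunker.parse([('the', 'DT'), ('book', 'NN'), (
    'has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')])
print(repr(tree))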

Output :

Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), 
Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
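
Once a sentence is parsed, the chunks usually need to be pulled out of the tree as plain phrases. The following is a minimal sketch of one way to do that with NLTK's `Tree` methods `subtrees`, `label` and `leaves`. The tree is rebuilt by hand from the output above so the example runs on its own, and `extract_phrases` is a hypothetical helper name, not part of NLTK.

```python
from nltk.tree import Tree

# The parse result shown above, rebuilt by hand so the sketch is self-contained.
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('book', 'NN')]),
    ('has', 'VBZ'),
    Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')]),
])


def extract_phrases(parsed, label='NP'):
    """Hypothetical helper: return the words of each subtree with the given label."""
    return [
        ' '.join(word for word, tag in subtree.leaves())
        for subtree in parsed.subtrees(filter=lambda t: t.label() == label)
    ]


print(extract_phrases(tree))   # ['the book', 'many chapters']
```

The chinked verb 'has' stays in the tree as a bare (word, tag) leaf of the root, so it does not appear in any NP phrase.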

