Skip to content
Related Articles

Related Articles

Improve Article
Save Article
Like Article

NLP | Chunking and chinking with RegEx

  • Last Updated : 27 Sep, 2021

Chunk extraction or partial parsing is a process of meaningful extracting short phrases from the sentence (tagged with Part-of-Speech). 
Chunks are made up of words and the kinds of words are defined using the part-of-speech tags. One can even define a pattern or words that can’t be a part of chuck and such words are known as chinks. A ChunkRule class specifies what words or patterns to include and exclude in a chunk. 

Defining Chunk patterns : 
Chuck patterns are normal regular expressions which are modified and designed to match the part-of-speech tag designed to match sequences of part-of-speech tags. Angle brackets are used to specify an individual tag for example – to match a noun tag. One can define multiple tags in the same way. 

 Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.  

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course. And to begin with your Machine Learning Journey, join the Machine Learning - Basic Level Course

Code #1 : Converting chunks to RegEx Pattern. 



Python3




# Laading Library
from nltk.chunk.regexp import tag_pattern2re_pattern
 
# Chunk Pattern to RegEx Pattern
print("Chunk Pattern : ", tag_pattern2re_pattern('<DT>?<NN.*>+'))

Output : 

Chunk Pattern :  ()?(<(NN[^\{\}]*)>)+

Curly Braces are used to specify a chunk like {} and to specify the chink pattern one can just flip the braces }{. For a particular phrase type, these rules (chunk and a chink pattern) can be combined into grammar.

Code #2 : Parsing the sentence with RegExParser.  

Python3




from nltk.chunk import RegexpParser
 
# Introducing the Pattern
chunker = RegexpParser(r'''
NP:
{<DT><NN.*><.*>*<NN.*>}
}<VB.*>{
''')
 
chunker.parse([('the', 'DT'), ('book', 'NN'), (
    'has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')])

Output : 

Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), 
Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])

 




My Personal Notes arrow_drop_up
Recommended Articles
Page :