Open In App

NLP | Expanding and Removing Chunks with RegEx

Last Updated : 29 Jan, 2019
Improve
Improve
Like Article
Like
Save
Share
Report

RegexpParser or RegexpChunkRule.fromstring() doesn’t support all the RegexpChunkRule classes. So, we need to create them manually.

This article focusses on 3 of such classes :

ExpandRightRule: It adds chink (unchunked) words to the right of a chunk.
ExpandLeftRule: It adds chink (unchunked) words to the left of a chunk.
For ExpandLeftRule and ExpandRightRule takes as parameter – the right and left chink pattern respectively that we want to add to the beginning and ending of the chunk respectively.

UnChunkRule: It unchunks any matching chunk and it becomes a chink.
 

Code #1: How the code works




# Loading Libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser
  
# Initialising ChunkRule
ur = ChunkRule('<NN>', 'single noun')
  
# Initialising ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')
  
# Initialising ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')
  
# Initialising UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')
  
chunker = RegexpChunkParser([ur, el, er, un])
  
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]
  
chunker.parse(sent)


Output:

Tree('S', [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')])

Note: Output is a flat sentence as UnChunkRule undid the chunk created by the previous rules.

How the stuff works?

  • Make a chunk with noun.
  • Expanding the left determiners to chunks that begin with noun.
  • Expanding the right plural nouns to chunks that ends with noun.
  • Finally, it unchunk every chunk that is a determiner + noun + plural noun, resulting in the original sentence tree.

Code #2: Step by Step Code Explaining the diagram.




# Loading Libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkString
from nltk.tree import Tree
  
chunk_string = ChunkString(Tree('S', sent))
print ("Chunk String : ", chunk_string)
  
# Initialising ChunkRule
ur = ChunkRule('<NN>', 'single noun')
ur.apply(chunk_string)
print ("\nstep 1 : ", chunk_string)
  
# Initialising ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')
el.apply(chunk_string)
print ("step 2 : ", chunk_string)
  
# Initialising ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')
er.apply(chunk_string)
print ("step 3 : ", chunk_string)
  
# Initialising UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')
un.apply(chunk_string)
print ("step 4 : ", chunk_string)


Output :

Chunk String :   <DT>  <NN>  <NNS> 

step 1 :   <DT> {<NN>} <NNS> 
step 2 :  {<DT>  <NN>} <NNS> 
step 3 :  {<DT>  <NN>  <NNS>}
step 4 :   <DT>  <NN>  <NNS>


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads