NLP | Splitting and Merging Chunks

Last Updated : 28 Jan, 2019

SplitRule class : It splits a chunk based on the specified split pattern for the purpose. It is specified like <NN.*>}{<.*> i.e. two opposing curly braces surrounded by a pattern on either side.

MergeRule class : It merges two chunks together based on the ending of the first chunk and the beginning of the second chunk. It is specified like <NN.*>{}<.*> i.e. curly braces facing each other.

Example of how the steps are performed

Starting with the sentence tree.

Chunking complete sentence.

Chunks are split into multiple chunks.

Chunk with a determiner is split into separate chunks.

Chunks ending with a noun are merged with the next chunk.

Code #1 – Constructing Tree

from nltk.chunk import RegexpParser 
chunker = RegexpParser(r''' 
NP: 
{<DT><.*>*<NN.*>} 
<NN.*>}{<.*> 
<.*>}{<DT> 
<NN.*>{}<NN.*> 
''') 
sent = [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN'), ('was', 'VBD'),  
        ('filled', 'VBN'), ('with', 'IN'), ('the', 'DT'), ('fish', 'NN')] 
chunker.parse(sent) 

Output:

Tree('S', [Tree('NP', [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN')]), 
Tree('NP', [('was', 'VBD'), ('filled', 'VBN'), ('with', 'IN')]), 
Tree('NP', [('the', 'DT'), ('fish', 'NN')])])

Code #2 – Splitting and Merging

# Loading Libraries 
from nltk.chunk.regexp import ChunkString, ChunkRule, ChinkRule 
from nltk.tree import Tree 
from nltk.chunk.regexp import MergeRule, SplitRule 
  
# Chunk String 
chunk_string = ChunkString(Tree('S', sent)) 
print ("Chunk String : ", chunk_string) 
  
# Applying Chunk Rule 
ur = ChunkRule('<DT><.*>*<NN.*>', 'chunk determiner to noun') 
ur.apply(chunk_string) 
print ("\nApplied ChunkRule : ", chunk_string) 
  
# Splitting 
sr1 = SplitRule('<NN.*>', '<.*>', 'split after noun') 
sr1.apply(chunk_string) 
print ("\nSplitting Chunk String : ", chunk_string) 
  
  
sr2 = SplitRule('<.*>', '<DT>', 'split before determiner') 
sr2.apply(chunk_string) 
print ("\nFurther Splitting Chunk String : ", chunk_string) 
  
# Merging 
mr = MergeRule('<NN.*>', '<NN.*>', 'merge nouns') 
mr.apply(chunk_string) 
print ("\nMerging Chunk String : ", chunk_string) 
  
# Back to Tree 
chunk_string.to_chunkstruct() 

Output:

Chunk String :   <DT>  <NN>  <NN>  <VBD>  <VBN>  <IN>  <DT>  <NN> 

Applied ChunkRule :  {<DT>  <NN>  <NN>  <VBD>  <VBN>  <IN>  <DT>  <NN>}

Splitting Chunk String :  {<DT>  <NN>}{<NN>}{<VBD>  <VBN>  <IN>  <DT>  <NN>}

Further Splitting Chunk String :  {<DT>  <NN>}{<NN>}{<VBD>  <VBN>  <IN>}{<DT>  <NN>}

Merging Chunk String :  {<DT>  <NN>  <NN>}{<VBD>  <VBN>  <IN>}{<DT>  <NN>}

Tree('S', [Tree('CHUNK', [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN')]), 
          Tree('CHUNK', [('was', 'VBD'), ('filled', 'VBN'), ('with', 'IN')]), 
          Tree('CHUNK', [('the', 'DT'), ('fish', 'NN')])])

Suggest improvement

NLP | Expanding and Removing Chunks with RegEx

Share your thoughts in the comments

NLP | Splitting and Merging Chunks

Example of how the steps are performed

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?