
NLP | Expanding and Removing Chunks with RegEx

Last Updated : 29 Jan, 2019

RegexpParser and RegexpChunkRule.fromstring() don't support every RegexpChunkRule class, so the remaining rules have to be created manually.

This article focuses on three such classes:

ExpandRightRule: It adds chink (unchunked) words to the right of a chunk.
ExpandLeftRule: It adds chink (unchunked) words to the left of a chunk.
Both rules take two tag patterns followed by a description. ExpandLeftRule takes the chink pattern to add on the left and the pattern the chunk must begin with; ExpandRightRule takes the pattern the chunk must end with and the chink pattern to add on the right.

UnChunkRule: It removes any chunk that completely matches its tag pattern, so those words become chinks again.
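Internally, the sentence is handled as a string of tags in which braces mark chunk boundaries, and each rule rewrites that string with a regular expression. The sketch below imitates that idea using only Python's re module. It is a simplified illustration, not NLTK's code: tag_re, expand_left, expand_right and unchunk are hypothetical helper names, and the tag-pattern conversion only handles simple patterns like the ones used in this article.

```python
import re

# Hypothetical helpers: a simplified, standard-library imitation of how the
# chunk rules rewrite a tag string. This is NOT NLTK's actual implementation.

# A character that may appear inside a tag name (never a brace or angle bracket).
TAG_CHAR = r"[^{}<>]"


def tag_re(tag_pattern: str) -> str:
    """Turn a simple tag pattern such as '<DT><NN.*>*' into a plain regex."""
    pattern = re.sub(r"\s", "", tag_pattern)
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # '.' must stay inside one tag, so it can never cross '<', '>', '{' or '}'.
    return pattern.replace(".", TAG_CHAR)


def expand_left(chunk_str: str, left: str, right: str) -> str:
    """Move a chink matching `left` into the following chunk that begins with `right`."""
    regex = rf"(?P<left>{tag_re(left)})\{{(?P<right>{tag_re(right)})"
    return re.sub(regex, r"{\g<left>\g<right>", chunk_str)


def expand_right(chunk_str: str, left: str, right: str) -> str:
    """Move a chink matching `right` into the preceding chunk that ends with `left`."""
    regex = rf"(?P<left>{tag_re(left)})\}}(?P<right>{tag_re(right)})"
    return re.sub(regex, r"\g<left>\g<right>}", chunk_str)


def unchunk(chunk_str: str, pattern: str) -> str:
    """Drop the braces around any whole chunk that matches `pattern`."""
    regex = rf"\{{(?P<chunk>{tag_re(pattern)})\}}"
    return re.sub(regex, r"\g<chunk>", chunk_str)


step1 = "<DT>{<NN>}<NNS>"                      # the noun has already been chunked
step2 = expand_left(step1, "<DT>", "<NN>")     # "{<DT><NN>}<NNS>"
step3 = expand_right(step2, "<NN>", "<NNS>")   # "{<DT><NN><NNS>}"
step4 = unchunk(step3, "<DT><NN.*>*")          # "<DT><NN><NNS>"
print(step2, step3, step4, sep="\n")
```

expand_left only fires when the chink sits immediately before an opening brace, and unchunk only matches a chunk whose entire content fits the pattern, which mirrors the rule descriptions above.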
 

Code #1: Combining the rules in a RegexpChunkParser




# Loading Libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser
  
# Initialising ChunkRule
ur = ChunkRule('<NN>', 'single noun')
  
# Initialising ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')
  
# Initialising ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')
  
# Initialising UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')
  
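# Rules are applied to the sentence in list order
chunker = RegexpChunkParser([ur, el, er, un])
  
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]
  
# repr() shows the resulting Tree when run as a script
print(repr(chunker.parse(sent)))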


Output:

Tree('S', [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')])

Note: The output is a flat sentence because UnChunkRule undid the chunk created by the previous rules.
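The flat output follows from how a chunk string maps back to a tree: words between an opening and a closing brace are grouped under the chunk label, and every other word stays a direct child of the root S. The sketch below illustrates that grouping with plain tuples instead of nltk.Tree. It is a simplified stand-in, not NLTK's code; to_tree is a hypothetical helper, and its default label "NP" is chosen to match RegexpChunkParser's default chunk label.

```python
import re


def to_tree(tokens, chunk_str, chunk_label="NP"):
    """Hypothetical helper: group tagged tokens according to a chunk string.

    `chunk_str` looks like '<DT>{<NN>}<NNS>'. Tokens inside braces become one
    (chunk_label, [...]) subtree; all other tokens stay direct children of 'S'.
    A simplified stand-in for how a chunk string maps back to a Tree.
    """
    children, current, i = [], None, 0
    for piece in re.findall(r"\{|\}|<[^{}<>]+>", chunk_str):
        if piece == "{":
            current = []                        # start collecting a chunk
        elif piece == "}":
            children.append((chunk_label, current))
            current = None                      # back outside any chunk
        else:
            word, tag = tokens[i]
            if piece != f"<{tag}>":
                raise ValueError(f"tag mismatch at token {i}: {piece} vs {tag}")
            (children if current is None else current).append((word, tag))
            i += 1
    return ("S", children)


sent = [("the", "DT"), ("sushi", "NN"), ("rolls", "NNS")]
print(to_tree(sent, "{<DT><NN><NNS>}"))  # before UnChunkRule: one NP subtree
print(to_tree(sent, "<DT><NN><NNS>"))    # after UnChunkRule: flat sentence
```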

How does it work?

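  • ChunkRule chunks the singular noun (NN).
  • ExpandLeftRule extends a chunk that begins with a noun leftward to take in the preceding determiner (DT).
  • ExpandRightRule extends a chunk that ends with a noun rightward to take in the following plural noun (NNS).
  • Finally, UnChunkRule removes every chunk made of a determiner followed by nouns (here determiner + noun + plural noun), which leaves the original, flat sentence tree.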
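RegexpChunkParser applies its rules one after another to the same chunk string, so each rule sees the result of the previous one. The sketch below replays the four steps with Python's re module, using hand-written regular expressions that are simplified equivalents of the four rules for this particular sentence; they are not the patterns NLTK generates, and RULES and run are hypothetical names.

```python
import re

# Hypothetical, hand-written equivalents of the four rules in Code #1.
# Each rule is (description, regex, replacement); braces mark a chunk.
RULES = [
    # ChunkRule('<NN>'): wrap a lone <NN> that is not already inside a chunk
    ("single noun", r"(<NN>)(?=[^{}]*(?:\{|$))", r"{\1}"),
    # ExpandLeftRule('<DT>', '<NN>'): move the '{' to the left of a <DT>
    ("get left determiner", r"(<DT>)\{(<NN>)", r"{\1\2"),
    # ExpandRightRule('<NN>', '<NNS>'): move the '}' to the right of a <NNS>
    ("get right plural noun", r"(<NN>)\}(<NNS>)", r"\1\2}"),
    # UnChunkRule('<DT><NN.*>*'): drop braces around a whole matching chunk
    ("unchunk everything", r"\{(<DT>(?:<NN[^{}<>]*>)*)\}", r"\1"),
]


def run(chunk_str, rules):
    """Apply each rule in order and record the chunk string after every step."""
    steps = [chunk_str]
    for _descr, regex, repl in rules:
        chunk_str = re.sub(regex, repl, chunk_str)
        steps.append(chunk_str)
    return steps


for n, step in enumerate(run("<DT><NN><NNS>", RULES)):
    print(f"step {n}: {step}")
```

Running only the first three rules leaves the whole phrase chunked as {<DT><NN><NNS>}, which is why the final UnChunkRule is what makes Code #1 return a flat tree.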

Code #2: Step-by-step code showing how each rule changes the chunk string.




# Loading Libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkString
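from nltk.tree import Tree
  
# Tagged sentence (the same one used in Code #1)
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]
  
# Wrap the sentence in a flat tree and build a ChunkString from it
chunk_string = ChunkString(Tree('S', sent))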
print ("Chunk String : ", chunk_string)
  
# Initialising ChunkRule
ur = ChunkRule('<NN>', 'single noun')
ur.apply(chunk_string)
print ("\nstep 1 : ", chunk_string)
  
# Initialising ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')
el.apply(chunk_string)
print ("step 2 : ", chunk_string)
  
# Initialising ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')
er.apply(chunk_string)
print ("step 3 : ", chunk_string)
  
# Initialising UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')
un.apply(chunk_string)
print ("step 4 : ", chunk_string)


Output:

Chunk String :   <DT>  <NN>  <NNS> 

step 1 :   <DT> {<NN>} <NNS> 
step 2 :  {<DT>  <NN>} <NNS> 
step 3 :  {<DT>  <NN>  <NNS>}
step 4 :   <DT>  <NN>  <NNS>

