
NLP | Chunking using Corpus Reader

What are Chunks? 
Chunks are groups of words, such as noun phrases, and the kinds of words in a chunk are defined using part-of-speech tags. One can also define patterns of words that must not be part of a chunk; such words are known as chinks. The ChunkRule class specifies which tag patterns to include in a chunk, and a corresponding chink rule specifies which to exclude.
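How it works : 
ChunkedCorpusReader reads plain-text files in which every token is written as word/TAG and every chunk is enclosed in square brackets, for example: 
[Earlier/JJR staff-reduction/NN moves/NNS] have/VBP trimmed/VBN about/IN [300/CD jobs/NNS] ,/, [the/DT spokesman/NN] said/VBD ./. 
By default, sentences are separated by newlines, paragraphs by blank lines, and each bracketed chunk is labelled NP.

Major methods : 
words(), sents() and paras() return plain tokens; tagged_words(), tagged_sents() and tagged_paras() return (word, tag) tuples; chunked_words(), chunked_sents() and chunked_paras() return chunks as Tree objects.
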
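Code #1 : Creating a ChunkedCorpusReader for words 

# Using ChunkedCorpusReader
from nltk.corpus.reader import ChunkedCorpusReader
 
# initializing: read every file ending in .chunk from the current directory
x = ChunkedCorpusReader('.', r'.*\.chunk')
 
# chunks are returned as Tree objects, all other tokens as (word, tag) tuples
words = x.chunked_words()
print("Words : \n", words)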

Output : 

Words : 
[Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), 
('moves', 'NNS')]), ('have', 'VBP'), ...]
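The code above assumes that a .chunk file already exists in the current directory. The following sketch makes the example self-contained by writing a sample file to a temporary directory first. The file name sample.chunk and the variable names are hypothetical, and the sentence is reconstructed from the output shown above.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader

# Sample data in the bracketed word/TAG format (file name is hypothetical).
sample = ("[Earlier/JJR staff-reduction/NN moves/NNS] have/VBP trimmed/VBN "
          "about/IN [300/CD jobs/NNS] ,/, [the/DT spokesman/NN] said/VBD ./.\n")

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "sample.chunk"), "w", encoding="utf-8") as f:
        f.write(sample)

    reader = ChunkedCorpusReader(root, r".*\.chunk")

    # The reader returns lazy views, so materialize them while the file exists.
    chunked = list(reader.chunked_words())
    plain_words = list(reader.words())

first_chunk = chunked[0]
print(first_chunk.label(), first_chunk.leaves())
print(plain_words)
```

Converting the results to lists inside the with block matters here because the temporary directory is deleted as soon as the block ends.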

Code #2 : For sentences 

# each sentence is returned as a Tree rooted at 'S'
chunked_sents = x.chunked_sents()
print("Chunked Sentence : \n", chunked_sents)

Output : 

Chunked Sentence : 
[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), 
('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), 
Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (',', ','),
Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]
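Once a chunked sentence is available as a Tree, its chunks can be pulled out directly. The sketch below rebuilds the sentence from the output above by hand, so it runs without a corpus file; the variable names are illustrative. It extracts the NP phrases and, as one possible flat representation, converts the tree to IOB tags with tree2conlltags.

```python
from nltk.chunk import tree2conlltags
from nltk.tree import Tree

# The chunked sentence from the output above, rebuilt by hand.
sent = Tree('S', [
    Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS')]),
    ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'),
    Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]),
    (',', ','),
    Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]),
    ('said', 'VBD'), ('.', '.'),
])

# Join the words of every NP subtree into a phrase.
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in sent.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)

# Flatten the tree into (word, tag, IOB-label) triples.
iob = tree2conlltags(sent)
print(iob[:4])
```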

Code #3 : For paragraphs 

# each paragraph is a list of sentence Trees
para = x.chunked_paras()
print("para : \n", para)

Output : 

para : 
[[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction',
'NN'), ('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'),
('about', 'IN'), 
Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (',', ','), 
Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]] 
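The nesting in the paragraph output (a list of paragraphs, each a list of sentence Trees) is easier to see with more than one paragraph. This is a minimal sketch with a hypothetical two-paragraph corpus, assuming the reader's default settings: one sentence per line and a blank line between paragraphs.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader

# Hypothetical corpus: two sentences in the first paragraph, one in the second.
sample = (
    "[The/DT cat/NN] sat/VBD ./.\n"
    "[It/PRP] purred/VBD ./.\n"
    "\n"
    "[The/DT dog/NN] barked/VBD ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "sample.chunk"), "w", encoding="utf-8") as f:
        f.write(sample)

    reader = ChunkedCorpusReader(root, r".*\.chunk")
    # Materialize the lazy view before the temporary directory is removed.
    paras = [list(p) for p in reader.chunked_paras()]

sents_per_para = [len(p) for p in paras]
print(sents_per_para)
print(paras[1][0])
```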
