NLP | Part of speech tagged – word corpus

What is Part-of-speech (POS) tagging?
It is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.

Example of Part-of-speech (POS) tagged corpus 

The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.

The format for a tagged corpus is word/tag: each word is followed by a slash and a tag denoting its part of speech, and tokens are separated by whitespace. For example, nn marks a noun and vb marks a verb.
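To make the word/tag format concrete, here is a minimal plain-Python sketch that splits the example line into (word, tag) tuples. The function name `split_tagged` is hypothetical and this is not NLTK's own code (NLTK has a helper of its own, `nltk.tag.str2tuple`); the tags are uppercased only to match the uppercase tags shown in the outputs of this article.

```python
def split_tagged(line, sep="/"):
    """Turn 'word/tag word/tag ...' into a list of (word, TAG) tuples."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST separator, so the token './.'
        # becomes the word '.' with the tag '.'.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator at all: keep the token as an untagged word.
            pairs.append((token, None))
        else:
            pairs.append((word, tag.upper()))
    return pairs


line = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."
tagged = split_tagged(line)
words = [word for word, _ in tagged]

print(tagged)
print(words)
```

Splitting on the last slash rather than the first is a design choice: it keeps a word that itself contains a slash intact and treats only the final segment as the tag.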

Code #1 : Creating a TaggedCorpusReader for words

Python3




# Using TaggedCorpusReader
from nltk.corpus.reader import TaggedCorpusReader

# initializing: read every file ending in .pos from the current directory
x = TaggedCorpusReader('.', r'.*\.pos')

# plain words, with the tags stripped
words = x.words()
print("Words : \n", words)

# (word, tag) tuples
tag_words = x.tagged_words()
print("\ntag_words : \n", tag_words)


Output : 

Words : 
['The', 'expense', 'and', 'time', 'involved', 'are', ...]

tag_words : 
[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ...]
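
Code #1 assumes that a file matching the pattern r'.*\.pos' already exists in the current directory, but the file itself is not shown. A minimal sketch for creating one from the example sentence follows; the file name `sample.pos` and the helper `write_sample` are assumptions made for illustration, and any name ending in .pos should work equally well.

```python
from pathlib import Path

# The sentence is the article's example. The file name 'sample.pos' is an
# assumption, chosen only because it ends in '.pos'.
SAMPLE_LINE = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."


def write_sample(directory=".", filename="sample.pos"):
    """Write the one-sentence tagged corpus and return the path of the file."""
    path = Path(directory) / filename
    path.write_text(SAMPLE_LINE + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print("wrote", write_sample())
```

Run this once before Code #1 so that the reader has a file to load.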

Code #2 : For sentences

Python3




# x is the TaggedCorpusReader created in Code #1
tagged_sent = x.tagged_sents()
print("tagged_sent : \n", tagged_sent)


Output : 

tagged_sent : 
[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]
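
Because each sentence is a list of (word, tag) tuples, the result can be processed with ordinary list operations. The sketch below copies the output above into a literal, so it runs without NLTK or a corpus file; the tag counting and noun filtering are only an illustration of how the structure can be consumed, not part of the reader's API.

```python
from collections import Counter

# Literal copy of the tagged_sents() output shown above.
tagged_sents = [[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
                 ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]

# Count how often each tag occurs across all sentences.
tag_counts = Counter(tag for sent in tagged_sents for _, tag in sent)

# Keep only the words tagged as nouns (NN).
nouns = [word for sent in tagged_sents for word, tag in sent if tag == 'NN']

print(tag_counts.most_common(1))  # [('NN', 2)]
print(nouns)                      # ['expense', 'time']
```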

Code #3 : For paragraphs  

Python3




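# paragraphs as lists of sentences, each sentence a list of words
para = x.paras()
print("para : \n", para)

# same nesting, with (word, tag) tuples instead of plain words
tagged_para = x.tagged_paras()
print("\ntagged_paras : \n", tagged_para)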


Output : 

para : 
[[['The', 'expense', 'and', 'time', 'involved', 'are', 'astronomical', '.']]]

tagged_paras : 
[[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]] 
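
The triple brackets reflect three levels of nesting: a list of paragraphs, each a list of sentences, each a list of (word, tag) tuples. As far as I know, the reader's default settings treat each line as a sentence and blocks separated by blank lines as paragraphs, which would explain why the one-line sample yields a single paragraph holding a single sentence. The sketch below walks a literal copy of the output above, so it needs neither NLTK nor a corpus file.

```python
from itertools import chain

# Literal copy of the tagged_paras() output shown above.
tagged_paras = [[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
                  ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]]

# Walk the nesting: paragraphs -> sentences -> (word, tag) tuples.
for p_idx, para in enumerate(tagged_paras):
    for s_idx, sent in enumerate(para):
        print(f"paragraph {p_idx}, sentence {s_idx}: {len(sent)} tokens")

# Flattening two levels leaves a flat list of (word, tag) tuples,
# the same shape as the tagged_words() output in Code #1.
flat = list(chain.from_iterable(chain.from_iterable(tagged_paras)))
print(flat[:3])
```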

 



Last Updated : 11 Apr, 2022