NLP | Part of speech tagged – word corpus

  • Last Updated : 11 Apr, 2022

What is Part-of-speech (POS) tagging?
It is the process of converting a sentence into two forms: a list of words and a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.

Example of Part-of-speech (POS) tagged corpus 

The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.

The format for a tagged corpus is word/tag: each word is followed by a slash and a tag denoting its POS. For example, nn refers to a noun and vb to a verb.
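
To make the word/tag format concrete, here is a minimal pure-Python sketch of how one tagged line can be split into (word, tag) tuples. The helper name split_tagged_token is hypothetical. It splits on the last slash and upper-cases the tag, an assumption chosen to match the upper-case tags seen in the reader output in this article. NLTK ships a comparable helper, nltk.tag.str2tuple.

```python
def split_tagged_token(token, sep="/"):
    """Split 'word/tag' into (word, TAG); hypothetical helper for illustration."""
    # Split on the LAST separator so that words containing '/' stay intact.
    word, found, tag = token.rpartition(sep)
    if not found:
        # No separator at all: treat the token as an untagged word.
        return (token, None)
    return (word, tag.upper())


line = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."

# Tokens are separated by whitespace in this format.
tagged = [split_tagged_token(tok) for tok in line.split()]
words = [word for word, _ in tagged]

print(tagged[:3])
print(words)
```

Splitting on the last slash rather than the first is a design choice: it keeps a token whose word part itself contains a slash in one piece.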

Code #1 : Creating a TaggedCorpusReader for words


# Using TaggedCorpusReader
from nltk.corpus.reader import TaggedCorpusReader

# Initializing: read every file ending in .pos in the current directory
x = TaggedCorpusReader('.', r'.*\.pos')

# Plain words, with the tags stripped
words = x.words()
print("Words : \n", words)

# (word, tag) tuples
tag_words = x.tagged_words()
print("\ntag_words : \n", tag_words)

Output : 

Words : 
['The', 'expense', 'and', 'time', 'involved', 'are', ...]

tag_words : 
[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ...]
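
The snippet above assumes that a file ending in .pos already exists in the current directory. As a self-contained sketch, the example sentence can be written to a temporary directory first and then read back. The file name sample.pos and the use of a temporary directory are assumptions made for illustration, not part of the original setup.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import TaggedCorpusReader

# The example sentence from this article, in word/tag format
SAMPLE = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.\n"

# Hypothetical location: a fresh temporary directory holding one .pos file
corpus_dir = Path(tempfile.mkdtemp())
(corpus_dir / "sample.pos").write_text(SAMPLE, encoding="utf-8")

# Same pattern as above, but rooted at the temporary directory
reader = TaggedCorpusReader(str(corpus_dir), r".*\.pos")

fileids = reader.fileids()
tagged = list(reader.tagged_words())

print(fileids)
print(tagged)
```

Wrapping tagged_words() in list() materializes the lazy corpus view, which is convenient for a corpus this small.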

Code #2 : For sentences


# Sentences as lists of (word, tag) tuples
tagged_sent = x.tagged_sents()
print("tagged_sent : \n", tagged_sent)

Output : 

tagged_sent : 
[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]

Code #3 : For paragraphs  


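# Paragraphs as lists of sentences, each a list of words
para = x.paras()
print("para : \n", para)

# Same nesting, with (word, tag) tuples in place of words
tagged_para = x.tagged_paras()
print("\ntagged_paras : \n", tagged_para)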

Output : 

para : 
[[['The', 'expense', 'and', 'time', 'involved', 'are', 'astronomical', '.']]]

tagged_paras : 
[[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]] 
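
The tagged views differ only in nesting: paragraphs contain sentences, and sentences contain (word, tag) tuples. As a small sketch that uses the literal output above as data (so no corpus file is needed), the paragraph structure can be flattened into the sentence and word views with itertools.chain. The variable names are illustrative.

```python
from itertools import chain

# Literal copy of the tagged_paras output: paragraphs -> sentences -> (word, tag)
tagged_paras = [[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
                  ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]]

# Remove one level of nesting: a list of sentences
tagged_sents = list(chain.from_iterable(tagged_paras))

# Remove another level: a flat list of (word, tag) tuples
tagged_words = list(chain.from_iterable(tagged_sents))

# Drop the tags to recover the plain words
words = [word for word, _ in tagged_words]

print(len(tagged_paras), len(tagged_sents), len(tagged_words))
print(words)
```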

