NLP | Part of speech tagged – word corpus
  • Last Updated : 20 Feb, 2019

What is Part-of-speech (POS) tagging?
It is the process of converting a sentence into other forms: a list of words, or a list of tuples (where each tuple has the form (word, tag)). The tag here is a part-of-speech tag, and signifies whether the word is a noun, adjective, verb, and so on.

Example of Part-of-speech (POS) tagged corpus

The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.

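The format for a tagged corpus is word/tag: each word is followed by a slash and a tag denoting its POS. For example, nn refers to a noun and vb to a verb. The tags are lower case in the file, but the reader returns them in upper case, as the outputs below show.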
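The word/tag format can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions: `parse_tagged_line` is a hypothetical helper (not part of NLTK), tokens are assumed to be separated by whitespace, and the tag is assumed to follow the last `/` in each token. The tag is upper-cased to match the reader output shown later in this article.

```python
def parse_tagged_line(line, sep="/"):
    """Split a 'word/tag word/tag ...' line into (word, tag) tuples."""
    pairs = []
    for token in line.split():
        # Split on the LAST separator so a word containing '/' stays intact.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator present: keep the token as an untagged word.
            pairs.append((token, None))
        else:
            pairs.append((word, tag.upper()))
    return pairs


line = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."
tagged = parse_tagged_line(line)

# The plain word list is just the first element of every tuple.
words = [word for word, _ in tagged]

print(tagged)
print(words)
```

The two printed lists correspond to the two forms described above: a list of (word, tag) tuples and a list of words.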

Code #1 : Creating a TaggedCorpusReader for words

# Using TaggedCorpusReader
from nltk.corpus.reader import TaggedCorpusReader

# initializing: read every file ending in .pos from the current directory
x = TaggedCorpusReader('.', r'.*\.pos')

# plain list of words
words = x.words()
print("Words : \n", words)

# list of (word, tag) tuples
tag_words = x.tagged_words()
print("\ntag_words : \n", tag_words)

Output :

Words : 
['The', 'expense', 'and', 'time', 'involved', 'are', ...]

tag_words : 
[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ...]
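The output above depends on a file ending in .pos being present in the directory passed to the reader; the article does not show how that file was created. A minimal sketch for producing one from the example sentence is given here. The file name `sample.pos`, the helper `write_sample_corpus`, and the use of a temporary directory are all assumptions made for illustration.

```python
import os
import tempfile

# The example sentence from this article, one sentence per line.
SAMPLE = (
    "The/at-tl expense/nn and/cc time/nn involved/vbn "
    "are/ber astronomical/jj ./.\n"
)


def write_sample_corpus(directory, filename="sample.pos"):
    """Write the sample sentence to directory/filename and return the path."""
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    return path


# A throwaway directory keeps the example from touching existing files.
corpus_dir = tempfile.mkdtemp()
corpus_path = write_sample_corpus(corpus_dir)
print(corpus_path)
```

Passing `corpus_dir` as the first argument to TaggedCorpusReader, with the same r'.*\.pos' pattern, should then pick up this file.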

Code #2 : For sentences

# list of sentences, each a list of (word, tag) tuples
tagged_sent = x.tagged_sents()
print("tagged_sent : \n", tagged_sent)

Output :

tagged_sent : 
[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]

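Code #3 : For paragraphs

# list of paragraphs, each a list of sentences, each a list of words
para = x.paras()
print("para : \n", para)

# same nesting, with (word, tag) tuples in place of words
tagged_para = x.tagged_paras()
print("\ntagged_paras : \n", tagged_para)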

Output :

para : 
[[['The', 'expense', 'and', 'time', 'involved', 'are', 'astronomical', '.']]]

tagged_paras : 
[[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'),
('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]] 

