What is Part-of-speech (POS) tagging?
It is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.
Default tagging is a baseline step for part-of-speech tagging. It is performed using the DefaultTagger class, which takes a single argument, the tag, and assigns that tag to every token. NN is the tag for a singular noun.
DefaultTagger is most useful when it is given the most common part-of-speech tag, which is why a noun tag is recommended.
Code #1 : How it works
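A minimal sketch of default tagging, assuming NLTK is installed. The token list ['Hello', 'Geeks'] and the variable names are illustrative choices.

```python
from nltk.tag import DefaultTagger

# Create a tagger that assigns the singular-noun tag 'NN' to every token.
tagger = DefaultTagger('NN')

# tag() expects a list of tokens, not a raw string.
tagged = tagger.tag(['Hello', 'Geeks'])
print(tagged)
```

Output :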
[('Hello', 'NN'), ('Geeks', 'NN')]
Each tagger has a tag() method that takes a list of tokens (usually a list of words produced by a word tokenizer), where each token is a single word. tag() returns a list of tagged tokens, each one a tuple of (word, tag).
How does DefaultTagger work?
It is a subclass of SequentialBackoffTagger and implements the choose_tag() method, which takes three arguments:
- list of tokens
- index of the current token, whose tag is to be chosen
- list of the previous tags
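The three arguments listed above can be illustrated with a simplified, pure-Python sketch. The class names SketchBackoffTagger and SketchDefaultTagger are hypothetical stand-ins written for this example, not NLTK's source code, and the sketch leaves out the backoff chain that a real SequentialBackoffTagger supports.

```python
class SketchBackoffTagger:
    """Hypothetical stand-in for the role SequentialBackoffTagger plays."""

    def choose_tag(self, tokens, index, history):
        # Subclasses decide the tag for tokens[index].
        raise NotImplementedError

    def tag(self, tokens):
        history = []  # tags chosen so far, one per token already processed
        for index in range(len(tokens)):
            history.append(self.choose_tag(tokens, index, history))
        return list(zip(tokens, history))


class SketchDefaultTagger(SketchBackoffTagger):
    def __init__(self, tag):
        self._tag = tag

    def choose_tag(self, tokens, index, history):
        # A default tagger ignores all three arguments
        # and always returns the same fixed tag.
        return self._tag


sketch_tagged = SketchDefaultTagger('NN').tag(['Hello', 'Geeks'])
print(sketch_tagged)  # [('Hello', 'NN'), ('Geeks', 'NN')]
```

Because choose_tag() here never looks at the tokens, the index, or the previous tags, every token receives the same tag regardless of context.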
Code #2 : Tagging Sentences
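A minimal sketch of tagging several sentences at once, assuming NLTK is installed and using the tagger's tag_sents() method. The two tokenized sentences are illustrative input.

```python
from nltk.tag import DefaultTagger

tagger = DefaultTagger('NN')

# tag_sents() takes a list of tokenized sentences and tags each one in turn.
sentences = [['welcome', 'to', '.'], ['Geeks', 'for', 'Geeks']]
tagged_sents = tagger.tag_sents(sentences)
print(tagged_sents)
```

Output :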
[[('welcome', 'NN'), ('to', 'NN'), ('.', 'NN')], [('Geeks', 'NN'), ('for', 'NN'), ('Geeks', 'NN')]]
Note: Every tag in the list of tagged sentences above is NN because we used the DefaultTagger class with the NN tag.
Code #3 : Illustrating how to untag.
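A minimal sketch of removing tags, assuming NLTK is installed and using the untag() helper from nltk.tag. The tagged list is illustrative input.

```python
from nltk.tag import untag

# A tagged sentence is a list of (word, tag) tuples.
tagged = [('Geeks', 'NN'), ('for', 'NN'), ('Geeks', 'NN')]

# untag() drops the tags and keeps only the words.
words = untag(tagged)
print(words)
```

Output :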
['Geeks', 'for', 'Geeks']