TextBlob module is used for building programs for text analysis. One of the more powerful aspects of the TextBlob module is the Part of Speech tagging.
Install TextBlob run the following commands:
$ pip install -U textblob $ python -m textblob.download_corpora
This will install TextBlob and download the necessary NLTK corpora.
The above installation will take quite some time due to the massive amount of tokenizers, chunkers, other algorithms, and all of the corpora to be downloaded.
Let’s knock out some quick vocabulary:
Corpus : Body of text, singular. Corpora is the plural of this.
Lexicon : Words and their meanings.
Token : Each “entity” that is a part of whatever was split up based on rules.
In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called Grammatical tagging or Word-category disambiguation.
Input: Everything is all about money. Output: [('Everything', 'NN'), ('is', 'VBZ'), ('all', 'DT'), ('about', 'IN'), ('money', 'NN')]
Here’s a list of the tags, what they mean, and some examples:
CC coordinating conjunction CD cardinal digit DT determiner EX existential there (like: “there is” … think of it like “there exists”) FW foreign word IN preposition/subordinating conjunction JJ adjective ‘big’ JJR adjective, comparative ‘bigger’ JJS adjective, superlative ‘biggest’ LS list marker 1) MD modal could, will NN noun, singular ‘desk’ NNS noun plural ‘desks’ NNP proper noun, singular ‘Harrison’ NNPS proper noun, plural ‘Americans’ PDT predeterminer ‘all the kids’ POS possessive ending parent‘s PRP personal pronoun I, he, she PRP$ possessive pronoun my, his, hers RB adverb very, silently, RBR adverb, comparative better RBS adverb, superlative best RP particle give up TO to go ‘to‘ the store. UH interjection errrrrrrrm VB verb, base form take VBD verb, past tense took VBG verb, gerund/present participle taking VBN verb, past participle taken VBP verb, sing. present, non-3d take VBZ verb, 3rd person sing. present takes WDT wh-determiner which WP wh-pronoun who, what WP$ possessive wh-pronoun whose WRB wh-abverb where, when
[('Sukanya', 'NNP'), ('Rajib', 'NNP'), ('and', 'CC'), ('Naba', 'NNP'), ('are', 'VBP'), ('my', 'PRP$'), ('good', 'JJ'), ('friends', 'NNS'), ('Sukanya', 'NNP'), ('is', 'VBZ'), ('getting', 'VBG'), ('married', 'VBN'), ('next', 'JJ'), ('year', 'NN'), ('Marriage', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('big', 'JJ'), ('step', 'NN'), ('in', 'IN'), ('one', 'CD'), ('’', 'NN'), ('s', 'NN'), ('life.It', 'NN'), ('is', 'VBZ'), ('both', 'DT'), ('exciting', 'VBG'), ('and', 'CC'), ('frightening', 'NN'), ('But', 'CC'), ('friendship', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('sacred', 'JJ'), ('bond', 'NN'), ('between', 'IN'), ('people.It', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('special', 'JJ'), ('kind', 'NN'), ('of', 'IN'), ('love', 'NN'), ('between', 'IN'), ('us', 'PRP'), ('Many', 'JJ'), ('of', 'IN'), ('you', 'PRP'), ('must', 'MD'), ('have', 'VB'), ('tried', 'VBN'), ('searching', 'VBG'), ('for', 'IN'), ('a', 'DT'), ('friend', 'NN'), ('but', 'CC'), ('never', 'RB'), ('found', 'VBD'), ('the', 'DT'), ('right', 'JJ'), ('one', 'NN')]
Basically, the goal of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units. Such units are called tokens and, most of the time, correspond to words and symbols (e.g. punctuation).
- Part of Speech Tagging with Stop words using NLTK in python
- NLP | Part of Speech - Default Tagging
- NLP | Part of speech tagged - word corpus
- Speech Recognition in Python using Google Speech API
- Python | Lemmatization with TextBlob
- Python | Tokenize text using TextBlob
- Python | PoS Tagging and Lemmatization using spaCy
- Convert Text to Speech in Python
- Python Text To Speech | pyttsx module
- Convert Text to Speech in Python using win32com.client
- Python | Sort given list of strings by part of string
- NLP | WordNet for tagging
- NLP | Regex and Affix tagging
- NLP | Trigrams'n'Tags (TnT) Tagging
- NLP | Classifier-based tagging
If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to email@example.com. See your article appearing on the GeeksforGeeks main page and help other Geeks.
Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.