Lemmatization is the process of grouping together the different inflected forms of a word so they can be analysed as a single item. It is similar to stemming, but it takes the context of the word into account, so it links words with the same meaning to one base form (the lemma).
Text preprocessing often includes both stemming and lemmatization. The two terms are frequently confused, and some people treat them as the same thing. Lemmatization is generally preferred over stemming because it performs a morphological analysis of the words.
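The difference can be sketched with a toy example. The suffix list, the lookup table, and the function names below are hypothetical and exist only for illustration; real stemmers and WordNet-based lemmatizers are far more elaborate. The point is that a stemmer cuts suffixes mechanically, while a lemmatizer looks up a valid base form for a given part of speech.

```python
# Toy illustration only: not the algorithm used by TextBlob or NLTK.

SUFFIXES = ("ies", "es", "s", "ing", "ed")

def toy_stem(word):
    """Strip the first matching suffix without checking the result is a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("rocks", "n"): "rock",
    ("corpora", "n"): "corpus",
    ("better", "a"): "good",
    ("studies", "n"): "study",
}

def toy_lemmatize(word, pos="n"):
    """Look up the base form for the given part of speech; fall back to the word itself."""
    return LEMMA_TABLE.get((word, pos), word)

for word, pos in [("rocks", "n"), ("corpora", "n"), ("better", "a"), ("studies", "n")]:
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word, pos))
```

In this sketch "studies" is stemmed to "stud", which is not a word, while the lookup returns "study". Irregular forms such as "corpora" and "better" are left unchanged by the suffix stripper but resolved by the lookup.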
Applications of lemmatization are:
- Used in comprehensive retrieval systems like search engines.
- Used in compact indexing.
Examples of lemmatization:
- rocks : rock
- corpora : corpus
- better : good
One major difference from stemming is that lemmatize() takes a part-of-speech parameter, “pos”. If it is not supplied, the word is treated as a noun by default.
Below is an implementation of word lemmatization using TextBlob:
Output:
rocks : rock
corpora : corpus
better : good