Lemmatization is the process of grouping together the different inflected forms of a word so that they can be analysed as a single item. It is similar to stemming, but it takes the context of the word into account, so it maps words with the same underlying meaning to one base form (the lemma).
Text preprocessing often includes both stemming and lemmatization. The two terms are frequently confused, and some people treat them as the same thing. Lemmatization is usually preferred over stemming because it performs a morphological analysis of the words, whereas stemming typically just strips affixes and may produce strings that are not real words.
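To make the contrast concrete, here is a toy sketch in plain Python. It is not how NLTK works internally: the suffix list, the mini-dictionary, and the function names `crude_stem` and `toy_lemmatize` are all hypothetical, chosen only to show rule-based suffix stripping next to a dictionary lookup.

```python
# Toy illustration only: real stemmers and lemmatizers are far more thorough.

# Hypothetical suffix list for a crude, rule-based stemmer (no vocabulary).
SUFFIXES = ("ing", "es", "s")

# Hypothetical mini-dictionary standing in for a real lexicon such as WordNet.
LEMMA_TABLE = {
    "rocks": "rock",
    "corpora": "corpus",
    "better": "good",
    "studies": "study",
}


def crude_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)


for w in ("rocks", "corpora", "better", "studies"):
    print(f"{w:8} stem={crude_stem(w):8} lemma={toy_lemmatize(w)}")
```

In this sketch the stemmer handles "rocks" but leaves "corpora" and "better" untouched and turns "studies" into "studi", while the lookup returns a real base form for each word it knows.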
Applications of lemmatization are:
- Used in comprehensive retrieval systems like search engines.
- Used in compact indexing.
Examples of lemmatization:
-> rocks : rock
-> corpora : corpus
-> better : good
One major difference from stemming is that NLTK's lemmatize() method takes a part-of-speech parameter, “pos”. If it is not supplied, the default is “noun” (pos="n").
Lemmatizing the example words with NLTK produces the following output:
rocks : rock
corpora : corpus
better : good