Python | Lemmatization with NLTK
  • Last Updated : 06 Nov, 2018

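Lemmatization is the process of grouping together the different inflected forms of a word so they can be analysed as a single item. Lemmatization is similar to stemming, but it brings context to the words: instead of simply cutting off suffixes, it maps each inflected form to its dictionary form, or lemma. So different forms of the same word, such as “rocks” and “rock”, are linked to one base word.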

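Text preprocessing can include both stemming and lemmatization. Many people find these two terms confusing, and some treat them as the same. In fact, they differ: stemming strips affixes using heuristic rules and can produce words that do not exist, whereas lemmatization performs morphological analysis of the words and returns a valid dictionary form. For this reason lemmatization is often preferred over stemming when accuracy matters, although it is usually slower.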
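To make the contrast concrete, here is a minimal, self-contained sketch in plain Python. It is not how NLTK works internally: `crude_stem`, `toy_lemmatize`, and the tiny `VOCAB` and `EXCEPTIONS` tables are hypothetical stand-ins, and the toy ignores part of speech for brevity. The stemmer blindly strips suffixes, while the lemmatizer only accepts a result that exists in its dictionary and uses a lookup table for irregular forms.

```python
# Toy comparison of stemming vs. lemmatization (not NLTK's implementation).
# VOCAB and EXCEPTIONS are tiny hypothetical stand-ins for a dictionary like WordNet.
VOCAB = {"study", "rock", "corpus", "good"}
EXCEPTIONS = {"corpora": "corpus", "better": "good"}  # irregular forms need a lookup

# Suffix-replacement rules tried by the lemmatizer, e.g. "studies" -> "study"
RULES = [("ies", "y"), ("es", ""), ("s", "")]


def crude_stem(word: str) -> str:
    """Strip a known suffix without checking that the result is a real word."""
    for suffix in ("ies", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Return a dictionary form: exception lookup first, then validated suffix rules."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    if word in VOCAB:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB:  # only accept candidates that are real words
                return candidate
    return word  # unknown words are returned unchanged


for w in ["studies", "rocks", "corpora", "better"]:
    print(f"{w:8} stem={crude_stem(w):8} lemma={toy_lemmatize(w)}")
```

With these tables, “studies” stems to the non-word “stud” but lemmatizes to “study”, and “corpora” is left alone by the stemmer but mapped to “corpus” by the lookup. A real lemmatizer replaces the hand-written tables with a full lexical database such as WordNet.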

Applications of lemmatization include:

  • Used in comprehensive retrieval systems such as search engines.
  • Used in compact indexing.
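The indexing use case can be sketched as a small inverted index keyed on lemmas rather than on surface forms, so that a query for “rock” also finds documents containing “rocks”, and the index needs one entry per lemma instead of one per inflected form. This is a toy under stated assumptions: the `LEMMAS` table, `lemma`, `build_index`, and `search` are hypothetical names, and in practice `lemma()` would call a real lemmatizer such as NLTK's WordNetLemmatizer.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical lemma table standing in for a real lemmatizer such as WordNetLemmatizer
LEMMAS = {"rocks": "rock", "corpora": "corpus", "better": "good"}


def lemma(token: str) -> str:
    """Look up a token's lemma; tokens not in the table are their own lemma."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lemma to the ids of the documents containing any of its forms."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[lemma(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query word, compared by lemma."""
    results = [index.get(lemma(t), set()) for t in query.split()]
    return set.intersection(*results) if results else set()


docs = {1: "Rocks and minerals", 2: "A rock in the river", 3: "Text corpora"}
index = build_index(docs)
print(search(index, "rock"))    # {1, 2}
print(search(index, "corpus"))  # {3}
```

Because both the documents and the query pass through the same `lemma()` function, matching happens on base forms; a stemmer could play the same role, at the cost of index keys that are not always real words.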

Examples of lemmatization:

-> rocks : rock
-> corpora : corpus
-> better : good

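One major difference from stemming is that NLTK's lemmatize() method takes a part-of-speech parameter, “pos”. If it is not supplied, the default is “n” (noun), so verbs and adjectives may be returned unchanged unless the correct part of speech is passed.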
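The effect of the “pos” parameter can be illustrated with a toy lemmatizer whose dictionary and rules are organised by part of speech, loosely modeled on the exception-list-then-suffix-rules approach of WordNet's morphy function, which NLTK's WordNetLemmatizer relies on. The function `toy_lemmatize` and its tables are hypothetical samples, not WordNet data; the point is only that the same word can have different lemmas depending on the part of speech, and that the default is noun, as in NLTK.

```python
# Toy POS-aware lemmatizer: exceptions for the given POS first, then that POS's
# suffix rules, accepting only candidates found in that POS's dictionary.
# All tables are tiny hypothetical samples, not WordNet's real data.
VOCAB = {
    "n": {"rock", "better", "meeting"},
    "v": {"rock", "meet"},
    "a": {"good"},
}
EXCEPTIONS = {"a": {"better": "good"}}  # irregular forms, keyed by POS
RULES = {
    "n": [("s", "")],
    "v": [("ing", ""), ("s", "")],
    "a": [("er", ""), ("est", "")],
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Lemmatize `word` as part of speech `pos`; like NLTK, default to noun."""
    if pos not in VOCAB:
        raise ValueError(f"unsupported pos: {pos!r}")
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    if word in VOCAB[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB[pos]:
                return candidate
    return word


print(toy_lemmatize("better"))            # treated as a noun -> better
print(toy_lemmatize("better", pos="a"))   # adjective -> good
print(toy_lemmatize("meeting"))           # noun -> meeting
print(toy_lemmatize("meeting", pos="v"))  # verb -> meet
```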

Below is an implementation of lemmatizing words with NLTK's WordNetLemmatizer:

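# Import NLTK and its WordNet-based lemmatizer
import nltk
from nltk.stem import WordNetLemmatizer

# The lemmatizer looks words up in WordNet, so the data must be downloaded once;
# some NLTK releases also need the "omw-1.4" package
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

lemmatizer = WordNetLemmatizer()

# With no pos argument, each word is treated as a noun
print("rocks :", lemmatizer.lemmatize("rocks"))
print("corpora :", lemmatizer.lemmatize("corpora"))

# pos="a" marks the word as an adjective
print("better :", lemmatizer.lemmatize("better", pos="a"))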

Output:

rocks : rock
corpora : corpus
better : good
