Python | Stemming words with NLTK

Prerequisite: Introduction to Stemming

Stemming is the process of reducing the morphological variants of a word to a common root/base form. Stemming programs are commonly referred to as stemming algorithms or stemmers. For example, a stemming algorithm aims to reduce the words “chocolates”, “chocolatey”, and “choco” to the root word “chocolate”, and the words “retrieval”, “retrieved”, and “retrieves” to the stem “retrieve”.

Some more examples of words that stem to the root word "like" include:

-> "likes"
-> "liked"
-> "likely"
-> "liking"

Errors in Stemming:
There are mainly two kinds of error in stemming: overstemming and understemming. Overstemming occurs when two words that have different stems are reduced to the same root (a false positive). Understemming occurs when two words that share a stem are not reduced to the same root (a false negative).
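
The two error types can be illustrated with a small sketch. The `toy_stem` function and its suffix list below are hypothetical and deliberately crude; they are not the Porter algorithm or any NLTK API, and serve only to show how each error can arise.

```python
# Hypothetical toy stemmer for illustration only; this is NOT the Porter algorithm.
TOY_SUFFIXES = ("ity", "al", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def share_stem(words):
    """Return True if every word is reduced to the same stem."""
    return len({toy_stem(w) for w in words}) == 1


# Overstemming: three words with different meanings collapse to one stem.
over = ["universe", "university", "universal"]
print([toy_stem(w) for w in over])   # ['univers', 'univers', 'univers']
print(share_stem(over))              # True

# Understemming: two forms of the same word keep different stems.
under = ["alumnus", "alumni"]
print([toy_stem(w) for w in under])  # ['alumnu', 'alumni']
print(share_stem(under))             # False
```

Real stemmers use far more careful rules, but the trade-off is the same: more aggressive suffix stripping tends to increase overstemming, while more conservative stripping tends to increase understemming.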

Applications of stemming are:

  • Stemming is used in information retrieval systems like search engines.
  • It is used to determine domain vocabularies in domain analysis.

Stemming is desirable because it can reduce redundancy: most of the time, a word stem and its inflected or derived forms mean the same thing.
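
To make the search-engine use case more concrete, here is a minimal sketch of a stem-based index. The names `build_index`, `search`, and `toy_stem` are hypothetical, and `toy_stem` is only a crude stand-in; any callable that maps a word to its stem (for example `PorterStemmer().stem` from NLTK) could be passed in its place.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical placeholder stemmer: strips one common suffix."""
    for suffix in ("ing", "es", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def build_index(documents, stem):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem):
    """Return the sorted ids of documents whose stems match the query word."""
    return sorted(index.get(stem(query.lower()), set()))


docs = ["She likes tea", "He liked the film", "Cooking is fun"]
index = build_index(docs, toy_stem)

print(search(index, "like", toy_stem))    # [0, 1]
print(search(index, "liking", toy_stem))  # [0, 1]
print(search(index, "cook", toy_stem))    # [2]
```

Because both the documents and the query are stemmed with the same function, a search for "like" also finds documents that contain "likes" or "liked", which is the redundancy reduction described above.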

Below is the implementation of stemming words using NLTK:

Code #1:

# import the Porter stemmer
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# choose some words to be stemmed
words = ["program", "programs", "programer", "programing", "programers"]

# print each word next to its stem
for w in words:
    print(w, " : ", ps.stem(w))


Output:

program  :  program
programs  :  program
programer  :  program
programing  :  program
programers  :  program

 
Code #2: Stemming words from sentences

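# import the stemmer and the tokenizer
# (word_tokenize needs NLTK's Punkt tokenizer data, e.g. nltk.download('punkt');
#  newer NLTK releases may ask for 'punkt_tab' instead)
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

sentence = "Programers program with programing languages"
words = word_tokenize(sentence)

# stem each token of the sentence
for w in words:
    print(w, " : ", ps.stem(w))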


Output:

Programers  :  program
program  :  program
with  :  with
programing  :  program
languages  :  languag

Note that a stem need not be a dictionary word: "languages" is reduced to "languag". The Porter stemmer also lowercases its input by default, which is why "Programers" is stemmed to "program".

