Skip to content
Related Articles

Related Articles

Improve Article

Python | Stemming words with NLTK

  • Last Updated : 05 Jul, 2021

Prerequisite: Introduction to Stemming
Stemming is the process of producing morphological variants of a root/base word. Stemming programs are commonly referred to as stemming algorithms or stemmers. A stemming algorithm reduces the words “chocolates”, “chocolatey”, “choco” to the root word, “chocolate” and “retrieval”, “retrieved”, “retrieves” reduce to the stem “retrieve”.

Some more example of stemming for root word "like" include:

-> "likes"
-> "liked"
-> "likely"
-> "liking"

Errors in Stemming: 
There are mainly two errors in stemming – Overstemming and Understemming. Overstemming occurs when two words are stemmed to same root that are of different stems. Under-stemming occurs when two words are stemmed to same root that are not of different stems.

Applications of stemming are:  

  • Stemming is used in information retrieval systems like search engines.
  • It is used to determine domain vocabularies in domain analysis.

Stemming is desirable as it may reduce redundancy as most of the time the word stem and their inflected/derived words mean the same.

Below is the implementation of stemming words using NLTK:



Code #1:  

Python3




# import these modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
  
ps = PorterStemmer()
 
# choose some words to be stemmed
words = ["program", "programs", "programmer", "programming", "programmers"]
 
for w in words:
    print(w, " : ", ps.stem(w))

Output: 

program  :  program
programs  :  program
programmer  :  program
programming  :  program
programmers  :  program

  Code #2: Stemming words from sentences

Python3




# importing modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
  
ps = PorterStemmer()
  
sentence = "Programmers program with programming languages"
words = word_tokenize(sentence)
  
for w in words:
    print(w, " : ", ps.stem(w))

Output : 

Programmers  :  program
program  :  program
with  :  with
programming  :  program
languages  :  languag

 

Attention reader! Don’t stop learning now. Get hold of all the important Machine Learning Concepts with the Machine Learning Foundation Course at a student-friendly price and become industry ready.




My Personal Notes arrow_drop_up
Recommended Articles
Page :