Python | Stemming words with NLTK

  • Difficulty Level : Basic
  • Last Updated : 18 May, 2022

Prerequisite: Introduction to Stemming
Stemming is the process of reducing the morphological variants of a word to a common root or base form, called the stem. Stemming programs are commonly referred to as stemming algorithms or stemmers. For example, a stemming algorithm might reduce the words “chocolates”, “chocolatey” and “choco” to the root word “chocolate”, and the words “retrieval”, “retrieved” and “retrieves” to the stem “retrieve”.
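To see what an actual stemmer does with the "retrieve" family, here is a minimal sketch using NLTK's PorterStemmer, the same class used in the code examples in this article. It assumes NLTK is installed (for example with pip install nltk); PorterStemmer itself needs no extra data downloads. The Porter algorithm maps all three words to one shared stem, but that stem is the truncated string "retriev" rather than the dictionary word "retrieve": a stem only has to be consistent across related words, not a real word.

```python
# Assumes NLTK is installed; PorterStemmer needs no corpus downloads.
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Three related forms that should share one stem
words = ["retrieval", "retrieved", "retrieves"]
stems = {w: ps.stem(w) for w in words}

for word, stem in stems.items():
    print(word, "->", stem)
```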


Some more examples of words that share the root word "like":

-> "likes"
-> "liked"
-> "likely"
-> "liking"
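A short sketch, assuming NLTK is installed, that runs these four variants through NLTK's PorterStemmer to check that they really collapse to a single stem:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# The four variants listed above
variants = ["likes", "liked", "likely", "liking"]
stems = [ps.stem(w) for w in variants]

for word, stem in zip(variants, stems):
    print(word, "->", stem)
```

With the Porter algorithm all four come out as "like", so a system that compares stems treats them as the same term.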

Errors in Stemming:
There are two main kinds of stemming errors: over-stemming and under-stemming. Over-stemming occurs when words with different meanings are reduced to the same stem. Under-stemming occurs when words that should share a stem are reduced to different stems.
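Both error types can be reproduced with NLTK's PorterStemmer. The word groups below are common textbook illustrations rather than examples from this article, and the sketch assumes NLTK is installed: "universal", "university" and "universe" have different meanings yet share one Porter stem (over-stemming), while "data" and "datum", two forms of the same word, keep different stems (under-stemming).

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Over-stemming: words with different meanings collapse to one stem
over = {w: ps.stem(w) for w in ["universal", "university", "universe"]}

# Under-stemming: forms of the same word end up with different stems
under = {w: ps.stem(w) for w in ["data", "datum"]}

print("over-stemming: ", over)
print("under-stemming:", under)
```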

Applications of stemming include:

  • Stemming is used in information retrieval systems like search engines.
  • It is used to determine domain vocabularies in domain analysis.
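The information-retrieval use can be illustrated with a toy inverted index. This is a minimal sketch, assuming NLTK is installed: the three documents are made up, and the helpers normalize and search are hypothetical names written for this example, not NLTK functions. Tokens are split on whitespace (so no tokenizer data is needed) and indexed by their Porter stem, which lets a query for "programs" match a document that only contains "program", "programmers" and "programming".

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Hypothetical mini-corpus for illustration only
documents = {
    1: "Programmers program with programming languages",
    2: "The library retrieves books for its readers",
    3: "A quick note about cooking",
}


def normalize(text):
    """Lowercase, split on whitespace and stem every token (illustrative helper)."""
    return [ps.stem(token) for token in text.lower().split()]


# Inverted index: stem -> ids of documents containing a word with that stem
index = defaultdict(set)
for doc_id, text in documents.items():
    for stem in normalize(text):
        index[stem].add(doc_id)


def search(query):
    """Return ids of documents containing every stemmed query term (illustrative helper)."""
    terms = normalize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


print(search("programs"))
print(search("retrieved book"))
```

A real search engine would also strip punctuation, remove stop words and rank results; the point here is only that stemming both the documents and the query lets different forms of a word match.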

Stemming is desirable because it can reduce redundancy: in most cases a word's stem and its inflected or derived forms carry the same core meaning, so they can be treated as a single term.
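One way to see this reduction is to compare the number of distinct words with the number of distinct stems. The sketch below assumes NLTK is installed and uses a made-up word list containing several forms of two base words:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Made-up word list: several surface forms of two base words
words = [
    "connect", "connected", "connecting", "connection", "connections",
    "program", "programs", "programming",
]

vocabulary = set(words)                        # distinct surface forms
stem_vocabulary = {ps.stem(w) for w in words}  # distinct stems

print(len(vocabulary), "distinct words ->", len(stem_vocabulary), "distinct stems")
print(sorted(stem_vocabulary))
```

Eight distinct surface forms shrink to two stems, which is the kind of redundancy reduction that makes stemming useful for indexing and vocabulary analysis.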

Below are examples of stemming words with NLTK's PorterStemmer:

Code #1: Stemming a list of words

# import the Porter stemmer from NLTK
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# choose some words to be stemmed
words = ["program", "programs", "programmer", "programming", "programmers"]

# print each word alongside its stem
for w in words:
    print(w, " : ", ps.stem(w))

Output: 

program  :  program
programs  :  program
programmer  :  program
programming  :  program
programmers  :  program

Code #2: Stemming words from a sentence

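# importing modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Note: word_tokenize needs NLTK's Punkt tokenizer data. If it is missing,
# run nltk.download("punkt") once (newer NLTK releases use "punkt_tab").
ps = PorterStemmer()

sentence = "Programmers program with programming languages"
# split the sentence into word tokens
words = word_tokenize(sentence)

# print each token alongside its stem
for w in words:
    print(w, " : ", ps.stem(w))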

Output:

Programmers  :  program
program  :  program
with  :  with
programming  :  program
languages  :  languag

 

