Prerequisite: Introduction to Stemming
Stemming is the process of reducing the morphological variants of a word to a common root/base form. Stemming programs are commonly referred to as stemming algorithms or stemmers. A stemming algorithm reduces the words “chocolates”, “chocolatey”, and “choco” to the root word “chocolate”, and reduces “retrieval”, “retrieved”, and “retrieves” to the stem “retrieve”.
Further variants that stem to the root word "like" include:
- "likes"
- "liked"
- "likely"
- "liking"
Errors in Stemming:
There are mainly two errors in stemming: overstemming and understemming. Overstemming occurs when two words with different stems are reduced to the same root (a false positive). Understemming occurs when two words that should be reduced to the same root are reduced to different ones (a false negative).
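Both kinds of error can be illustrated with a deliberately crude suffix stripper. This is only a sketch: `strip_suffix` and the two suffix lists are hypothetical names made up for the illustration, not part of NLTK or of any real stemming algorithm.

```python
def strip_suffix(word, suffixes):
    """Remove the longest matching suffix, keeping at least 3 characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical rule sets: one strips a lot, the other strips very little.
AGGRESSIVE = ["ity", "ing", "al", "es", "ed", "e", "s"]
LIGHT = ["s"]

# Overstemming: words with different meanings collapse to one stem.
over = {w: strip_suffix(w, AGGRESSIVE)
        for w in ["universal", "university", "universe"]}

# Understemming: related forms of one word end up with different stems.
under = {w: strip_suffix(w, LIGHT)
         for w in ["alumnus", "alumni", "alumnae"]}

print(over)
print(under)
```

In this toy setup the aggressive rules map all three "univers-" words to the same stem, while the light rules leave the three "alumn-" forms apart.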
Applications of stemming are:
- Stemming is used in information retrieval systems like search engines.
- It is used to determine domain vocabularies in domain analysis.
Stemming is desirable because it can reduce redundancy: most of the time, a word stem and its inflected or derived forms carry the same meaning.
Below is the implementation of stemming words using NLTK:
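A minimal sketch of such a program, assuming NLTK's `PorterStemmer` (NLTK ships other stemmers as well, and which one to use is a choice made here for illustration). The word list is chosen to match the output shown in this article.

```python
from nltk.stem import PorterStemmer

# Assumption: the Porter stemmer is used; other NLTK stemmers may
# produce different stems for the same words.
ps = PorterStemmer()

# Word list chosen to match the output shown in this article.
words = ["program", "programs", "programer", "programing", "programers"]

# Map each word to its stem, then print one "word : stem" pair per line.
stems = {w: ps.stem(w) for w in words}
for w, stem in stems.items():
    print(w, ":", stem)
```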
Output:
program : program
programs : program
programer : program
programing : program
programers : program
Code #2: Stemming words from sentences
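A sentence has to be split into tokens before each token can be stemmed. The sketch below again assumes NLTK's `PorterStemmer` and, to stay free of extra data downloads, uses plain whitespace splitting as the tokenizer; the sentence is chosen to match the output shown in this article.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

sentence = "Programers program with programing languages"

# Assumption: whitespace splitting is enough for this sentence. Text with
# punctuation would need a real tokenizer, e.g. nltk.tokenize.word_tokenize.
tokens = sentence.split()

# Pair each original token with its stem.
stemmed = [(token, ps.stem(token)) for token in tokens]
for token, stem in stemmed:
    print(token, ":", stem)
```

Note that the stemmer lowercases its input by default, and that a stem such as "languag" need not be a dictionary word.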
Output:
Programers : program
program : program
with : with
programing : program
languages : languag