Many of the words in a phrase carry little meaning on their own. For example, take "English is a subject". Here, 'English' and 'subject' are the most significant words, while 'is' and 'a' contribute almost nothing. "English subject" and "subject English" hold the same meaning even after we remove the insignificant words ('is', 'a'). Using NLTK, we can remove the insignificant words by looking at their part-of-speech tags. To do that, we first have to decide which part-of-speech tags are significant.
Code #1 : filter_insignificant() function to filter out the insignificant words
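A minimal sketch of such a function. It assumes a chunk is a list of (word, tag) tuples and that the default suffixes to drop are DT (determiners) and CC (coordinating conjunctions); both assumptions are inferred from the outputs shown in this article rather than stated in it.

```python
def filter_insignificant(chunk, tag_suffixes=('DT', 'CC')):
    """Return the tagged words whose tag does not end with any given suffix.

    chunk        -- list of (word, tag) tuples (assumed input shape)
    tag_suffixes -- tag endings treated as insignificant (assumed defaults)
    """
    good = []
    for word, tag in chunk:
        ok = True
        # Check the tag against every suffix; one match is enough to skip it.
        for suffix in tag_suffixes:
            if tag.endswith(suffix):
                ok = False
                break
        if ok:
            good.append((word, tag))
    return good
```

The function only inspects the tags, so it works on any tagged input, whether it comes from an NLTK tagger or is written by hand.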
filter_insignificant() iterates over the tagged words in the chunk and checks whether each tag ends with any of the tag_suffixes. If it does, the tagged word is skipped. Otherwise the tag is ok, and the tagged word is appended to a new, good chunk, which is returned.
Code #2 : Using filter_insignificant() on a phrase
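A short usage sketch. The input phrase ("the terrible movie", already tagged) is an assumption chosen to be consistent with the output shown below, and the function is restated in compact form with the assumed DT and CC defaults so the snippet runs on its own.

```python
def filter_insignificant(chunk, tag_suffixes=('DT', 'CC')):
    # str.endswith accepts a tuple of suffixes, so one call covers them all.
    suffixes = tuple(tag_suffixes)
    return [(word, tag) for word, tag in chunk if not tag.endswith(suffixes)]

# Hypothetical tagged phrase; 'the' carries the determiner tag DT.
phrase = [('the', 'DT'), ('terrible', 'JJ'), ('movie', 'NN')]

significant = filter_insignificant(phrase)
print("Significant words :", significant)
```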
Significant words : [('terrible', 'JJ'), ('movie', 'NN')]
We can pass different tag suffixes to filter_insignificant(). In the code below, we decide that pronouns and possessive words such as your, you, their and theirs are no good, but DT and CC words are ok. The tag suffixes would then be PRP and PRP$:
Code #3 : Passing in our own tag suffixes using filter_insignificant()
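A sketch of passing custom suffixes. The tagged input ("your book is great") is an assumption chosen to be consistent with the output shown below, and the function is restated compactly so the snippet runs on its own.

```python
def filter_insignificant(chunk, tag_suffixes=('DT', 'CC')):
    # str.endswith accepts a tuple of suffixes, so one call covers them all.
    suffixes = tuple(tag_suffixes)
    return [(word, tag) for word, tag in chunk if not tag.endswith(suffixes)]

# Hypothetical tagged phrase; 'your' carries the possessive pronoun tag PRP$.
phrase = [('your', 'PRP$'), ('book', 'NN'), ('is', 'VBZ'), ('great', 'JJ')]

# Drop personal (PRP) and possessive (PRP$) pronouns instead of DT and CC.
significant = filter_insignificant(phrase, tag_suffixes=['PRP', 'PRP$'])
print("Significant words :", significant)
```

Both suffixes are needed because the match is on the end of the tag: PRP$ does not end with PRP, so PRP alone would leave possessive pronouns in place.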
Significant words : [('book', 'NN'), ('is', 'VBZ'), ('great', 'JJ')]