Problem Statement – Given any input word and text file, predict the next n words that can occur after the input word in the text file.
Input : is
Output : is it simply makes sure that there are never
Input : is
Output : is split, all the maximum amount of objects, it
Input : the
Output : the exact same position. There will be some.
Note – To illustrate the example, I have assigned some text to the variable corpus. If you want to test on real-world text data, you can find the data here.
Solution – We can approach this problem using the concepts of probability. First, we calculate the frequency of every word that occurs immediately after the input word in the text file. This is an n-gram model; because each count covers the input word together with the one word that follows it, it is a bigram (2-gram) model. Then, using those frequencies, we calculate the CDF over these candidate words and choose a random word from it. To choose this random word, we draw a random number between 0 and 1 and find the smallest CDF value greater than or equal to that number. This way each word is picked with probability proportional to how often it follows the input word, so frequent successors are chosen most often without the output always being the same. The CDF makes this possible because it gives the cumulative probability for each word in the list.
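The frequency and CDF steps can be sketched as follows. This is a minimal illustration, not the article's own listing: the names `successor_cdf` and `pick` and the toy corpus are assumptions made for the example, and the random number `r` is passed in explicitly so the lookup is easy to follow.

```python
from collections import Counter
from itertools import accumulate


def successor_cdf(words, target):
    """Return (candidates, cdf) for the words that directly follow `target`."""
    # Count every word that appears immediately after `target`.
    counts = Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == target)
    total = sum(counts.values())
    candidates = list(counts)
    # A running sum of the counts, divided by the total, gives the CDF.
    cdf = [c / total for c in accumulate(counts[w] for w in candidates)]
    return candidates, cdf


def pick(candidates, cdf, r):
    """Return the first candidate whose CDF value is >= r, for 0 <= r <= 1.

    Assumes `candidates` is non-empty, i.e. `target` has at least one successor.
    """
    for word, p in zip(candidates, cdf):
        if r <= p:
            return word
    return candidates[-1]  # guard against floating-point rounding


# Toy text, made up for this illustration.
corpus = "the cat sat on the mat and the cat ran"
words = corpus.split()

# "the" is followed by "cat" twice and "mat" once.
candidates, cdf = successor_cdf(words, "the")
print(candidates, cdf)
print(pick(candidates, cdf, 0.5), pick(candidates, cdf, 0.9))
```

In practice `r` would come from `random.random()`, so "cat" would be returned about two times out of three here.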
After drawing from the CDF, we take the corresponding word and append it to the output string. Now, if you wish, you can append the word to the input string and send the whole string to repeat the process to find the next word, or you can just send the word that you found using the CDF. I have used the former approach.
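Note – You may get a different output if you enter the same word multiple times, because each next word is chosen at random. How likely that is depends on your data file: the larger the file, the more distinct words tend to follow a given word, and so the higher the probability of a different output.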
Code for above algorithm
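A minimal sketch of the whole procedure, under some assumptions: the function names `build_model` and `predict` and the sample `corpus` are hypothetical, the text is tokenized by plain whitespace splitting, and each step looks up only the most recently generated word. It works with cumulative counts rather than normalized probabilities, which is the same CDF lookup scaled by the total count.

```python
import random
from bisect import bisect_left
from collections import Counter, defaultdict
from itertools import accumulate


def build_model(text):
    """Map each word to (successors, cumulative counts of those successors)."""
    words = text.split()
    follow = defaultdict(Counter)
    for cur, nxt in zip(words, words[1:]):
        follow[cur][nxt] += 1
    return {
        w: (list(counts), list(accumulate(counts.values())))
        for w, counts in follow.items()
    }


def predict(model, word, n, rng=random):
    """Return `word` followed by up to `n` randomly sampled next words."""
    out = [word]
    for _ in range(n):
        entry = model.get(out[-1])
        if entry is None:  # the last word never has a successor in the text
            break
        successors, cum = entry
        r = rng.random() * cum[-1]  # uniform in [0, total count)
        # Index of the smallest cumulative count that is >= r.
        out.append(successors[bisect_left(cum, r)])
    return " ".join(out)


# Hypothetical sample text; replace it with the contents of your own file.
corpus = "it is a fact that it is never the same and it is what it is"
model = build_model(corpus)
print(predict(model, "is", 6))
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes a run repeatable; with the default generator the output can differ on every call.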
The concept shown above is used in fields like Natural Language Processing. This is a naive approach, meant just to illustrate the concept. There are many more algorithms out there for word prediction. You can find one of them here.
Improved By : shubham_singh