Problem Statement – Given any input word and text file, predict the next n words that can occur after the input word in the text file.
Input : is
Output : is it simply makes sure that there are never
Input : is
Output : is split, all the maximum amount of objects, it
Input : the
Output : the exact same position. There will be some.
Note – To illustrate the example, I have assigned some text to the variable corpus. If you want to test the program on real-world text data, you can find the data here.
Solution – We can approach this problem using the concepts of probability. First, we calculate the frequency of every word that occurs immediately after the input word in the text file. This is an n-gram approach; because each prediction looks only one word ahead and depends on the single preceding word, it is usually called a bigram model. Then, using those frequencies, we calculate the CDF (cumulative distribution function) over these candidate words and choose a random word from it. To choose this word, we draw a random number between 0 and 1 and find the smallest CDF value greater than or equal to that number. Sampling this way picks each candidate with probability proportional to how often it follows the input word, so frequent successors are chosen most often while rarer ones still appear occasionally.
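The counting and sampling steps described above can be sketched as follows. This is a minimal illustration, not the article's original code: the function names `successor_counts`, `build_cdf` and `sample_next` are hypothetical, and splitting the corpus on whitespace is an assumption about tokenization.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate


def successor_counts(corpus, word):
    """Count every word that appears immediately after `word` in `corpus`."""
    tokens = corpus.split()  # assumption: simple whitespace tokenization
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)


def build_cdf(counts):
    """Return the candidate words and their cumulative probabilities."""
    words = list(counts)
    total = sum(counts.values())
    # Running totals divided by the grand total; the last entry is exactly 1.0.
    cdf = [running / total for running in accumulate(counts.values())]
    return words, cdf


def sample_next(words, cdf, rng=random):
    """Pick the word whose CDF value is the smallest one >= a random number."""
    r = rng.random()                   # uniform in [0, 1)
    idx = bisect.bisect_left(cdf, r)   # first index with cdf[idx] >= r
    return words[min(idx, len(words) - 1)]


corpus = "the cat sat on the mat the cat ran"
words, cdf = build_cdf(successor_counts(corpus, "the"))
print(sample_next(words, cdf))  # "cat" about two times out of three, else "mat"
```

Here "cat" follows "the" twice and "mat" once, so the CDF is roughly [0.67, 1.0] and a random number below 0.67 selects "cat".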
After finding the CDF value, we look up the corresponding word and append it to the output string. To predict further words, you can either append the new word to the input string and send the whole string through the process again, or send only the word you just found. I have used the former approach.
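The repeat-until-n-words loop can be sketched as below. This is a hedged illustration rather than the author's implementation: `build_successors` and `predict` are hypothetical names, the `seed` parameter is added only to make runs repeatable, and the sketch feeds just the most recent word back in (the simpler of the two options). Instead of an explicit CDF, it draws uniformly from a list that keeps duplicate successors, which gives the same frequency-weighted choice.

```python
import random
from collections import defaultdict


def build_successors(corpus):
    """Map each word to the list of words that follow it (duplicates kept)."""
    tokens = corpus.split()  # assumption: simple whitespace tokenization
    successors = defaultdict(list)
    for cur, nxt in zip(tokens, tokens[1:]):
        successors[cur].append(nxt)
    return successors


def predict(corpus, word, n, seed=None):
    """Return `word` followed by up to `n` predicted words."""
    rng = random.Random(seed)
    successors = build_successors(corpus)
    output = [word]
    current = word
    for _ in range(n):
        candidates = successors.get(current)
        if not candidates:  # word never seen, or only seen at the very end
            break
        # Uniform choice over a list with duplicates is frequency-weighted.
        current = rng.choice(candidates)
        output.append(current)
    return " ".join(output)


corpus = "the cat sat on the mat the cat ran"
print(predict(corpus, "the", 4))
```

If the current word has no recorded successor, the loop stops early, so the output may contain fewer than n predicted words.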
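Note – Because the next word is drawn at random, you may get a different output when you enter the same word multiple times. How often this happens depends on the size of your data file: the larger the file, the more distinct words are likely to follow a given word, and so the higher the chance of a different output.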
Code for above algorithm
The concept shown above is used in fields like Natural Language Processing. This is a naive approach, meant only to illustrate the concept. In practice, there are many more algorithms for word prediction. You can find one of them here.
Improved By : shubham_singh