Autocorrector Feature Using NLP In Python
Last Updated: 21 Dec, 2022
Autocorrect predicts and corrects misspelled words, which makes tasks like writing paragraphs, reports, and articles easier. Today, many websites and social media platforms use this concept to make their apps more user-friendly.
Here, we will use Machine Learning and NLP to build an autocorrect generator that suggests the correct spelling for an input word. We will use the Python programming language for this.
Let’s move ahead with the project.
We will be using the NLTK library for the NLP-related tasks.
To import NLTK and download its data, use the commands below:
import nltk
nltk.download('all')
The first task is to read the text file that we will use to build the word list of correct words. Any large English text corpus works; here it is assumed to be saved as final.txt.
Python3
import re

w = []
with open('final.txt', 'r', encoding="utf8") as f:
    file_name_data = f.read()
    file_name_data = file_name_data.lower()
    # Use a raw string so \w is not treated as an escape sequence
    w = re.findall(r'\w+', file_name_data)
main_set = set(w)
Now we have to count the words and store their frequency. For that, we will use a dictionary.
Python3
def counting_words(words):
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count
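As a quick sanity check, here is the counter above run on a short sentence (the function is redefined so the snippet is self-contained):

```python
# Count word frequencies with a plain dictionary, as in the article.
def counting_words(words):
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count

print(counting_words("the cat sat on the mat".split()))
# {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
```

The standard library's collections.Counter does the same job in one line, but the explicit loop makes the logic clear.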
Then, the prob_cal function calculates the probability of each word, i.e., its relative frequency in the corpus.
Python3
def prob_cal(word_count_dict):
    probs = {}
    m = sum(word_count_dict.values())
    for key in word_count_dict.keys():
        probs[key] = word_count_dict[key] / m
    return probs
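Since each probability is a count divided by the total count, the values returned by prob_cal should sum to 1. A small check with made-up counts:

```python
# Relative frequencies: each count divided by the total count.
def prob_cal(word_count_dict):
    m = sum(word_count_dict.values())
    return {key: count / m for key, count in word_count_dict.items()}

probs = prob_cal({"the": 2, "cat": 1, "mat": 1})
print(probs["the"])  # 0.5
```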
The rest of the code is divided into 5 main parts, which cover the generation of all the different candidate words that are possible.
To do this, we can use:
- Lemmatization
- Deletion of letter
- Switching Letter
- Replace Letter
- Insert new Letter
Let’s see the code implementation of each point.
To do lemmatization, we will be using the pattern module. You can install it using the command below:
pip install pattern
Then you can use the code below:
Python3
import pattern
from pattern.en import lemma, lexeme

def LemmWord(word):
    # Return the list of inflected forms (lexeme) of the first word
    return [lexeme(wd) for wd in word.split()][0]
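The pattern module can be hard to install on recent Python versions. As a dependency-free illustration of the idea only (this is *not* the article's pattern-based approach, and real lemmatizers are far more accurate), a naive suffix-stripping sketch looks like this:

```python
# Naive, dependency-free lemmatization sketch: strip a few common
# English suffixes. Purely illustrative; it mishandles many words.
def naive_lemma(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print(naive_lemma("walked"))   # walk
print(naive_lemma("cats"))     # cat
```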
DeleteLetter: a function that removes one letter at a time from a given word.
Python3
def DeleteLetter(word):
    delete_list = []
    split_list = []
    for i in range(len(word)):
        split_list.append((word[0:i], word[i:]))
    for a, b in split_list:
        delete_list.append(a + b[1:])
    return delete_list
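Removing one letter at each position means a word of length n yields exactly n candidates. A compact, equivalent one-liner makes this easy to verify:

```python
# Every word obtainable by deleting a single letter.
def DeleteLetter(word):
    return [word[:i] + word[i + 1:] for i in range(len(word))]

print(DeleteLetter("cat"))  # ['at', 'ct', 'ca']
```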
Switch_: this function swaps two adjacent letters of the word.
Python3
def Switch_(word):
    split_list = []
    switch_l = []
    for i in range(len(word)):
        split_list.append((word[0:i], word[i:]))
    switch_l = [a + b[1] + b[0] + b[2:] for a, b in split_list if len(b) >= 2]
    return switch_l
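Swapping adjacent letters gives n-1 candidates for a word of length n. An equivalent compact version:

```python
# Every word obtainable by swapping two adjacent letters.
def Switch_(word):
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)]

print(Switch_("the"))  # ['hte', 'teh']
```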
Replace_: this function replaces each letter of the word, in turn, with every letter of the alphabet.
Python3
def Replace_(word):
    split_l = []
    replace_list = []
    for i in range(len(word)):
        split_l.append((word[0:i], word[i:]))
    alphs = 'abcdefghijklmnopqrstuvwxyz'
    replace_list = [a + l + (b[1:] if len(b) > 1 else '')
                    for a, b in split_l if b for l in alphs]
    return replace_list
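Substituting all 26 letters at each of the n positions produces 26*n candidates (the original word itself appears among them, once per position where the replacement equals the original letter):

```python
# Every word obtainable by replacing one letter with any letter a-z.
def Replace_(word):
    alphs = 'abcdefghijklmnopqrstuvwxyz'
    return [word[:i] + l + word[i + 1:]
            for i in range(len(word)) for l in alphs]

cands = Replace_("cat")
print(len(cands))      # 78  (26 * 3)
print("bat" in cands)  # True
```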
insert_: this function inserts each letter of the alphabet, one by one, at every position in the word.
Python3
def insert_(word):
    split_l = []
    insert_list = []
    for i in range(len(word) + 1):
        split_l.append((word[0:i], word[i:]))
    alphs = 'abcdefghijklmnopqrstuvwxyz'
    insert_list = [a + l + b for a, b in split_l for l in alphs]
    return insert_list
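There are len(word) + 1 gaps where a letter can be inserted, so this yields 26 * (len(word) + 1) candidates:

```python
# Every word obtainable by inserting one letter a-z at any position.
def insert_(word):
    alphs = 'abcdefghijklmnopqrstuvwxyz'
    return [word[:i] + l + word[i:]
            for i in range(len(word) + 1) for l in alphs]

cands = insert_("at")
print(len(cands))      # 78  (26 * 3)
print("cat" in cands)  # True
```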
Now we have implemented all five steps. It’s time to merge all the words formed by those steps (i.e., combine all the functions).
To do that, we will use two functions:
Python3
def colab_1(word, allow_switches=True):
    colab_1 = set()
    colab_1.update(DeleteLetter(word))
    if allow_switches:
        colab_1.update(Switch_(word))
    colab_1.update(Replace_(word))
    colab_1.update(insert_(word))
    return colab_1


def colab_2(word, allow_switches=True):
    colab_2 = set()
    edit_one = colab_1(word, allow_switches=allow_switches)
    for w in edit_one:
        if w:
            edit_two = colab_1(w, allow_switches=allow_switches)
            colab_2.update(edit_two)
    return colab_2
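colab_1 gathers all candidates one edit away, and colab_2 applies it a second time to reach edit distance 2. The sketch below mirrors that logic with compact, self-contained redefinitions (named edits_one/edits_two here for clarity):

```python
# All single-edit candidates: one delete, adjacent switch,
# replacement, or insertion.
ALPHS = 'abcdefghijklmnopqrstuvwxyz'

def edits_one(word):
    deletes = [word[:i] + word[i + 1:] for i in range(len(word))]
    switches = [word[:i] + word[i + 1] + word[i] + word[i + 2:]
                for i in range(len(word) - 1)]
    replaces = [word[:i] + l + word[i + 1:]
                for i in range(len(word)) for l in ALPHS]
    inserts = [word[:i] + l + word[i:]
               for i in range(len(word) + 1) for l in ALPHS]
    return set(deletes + switches + replaces + inserts)

def edits_two(word):
    # Apply single edits twice to reach edit distance 2
    return {e2 for e1 in edits_one(word) for e2 in edits_one(e1)}

print("cat" in edits_one("cta"))   # True (one switch away)
print("cat" in edits_two("ctaa"))  # True (switch + delete)
```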
Now, the main task is to extract the correct words from all the candidates. To do so, we will use the get_corrections function.
Python3
def get_corrections(word, probs, vocab, n=2):
    # If the word is already known, keep it; otherwise fall back to
    # edit-distance-1 candidates, then edit-distance-2 candidates
    suggested_word = list(
        ([word] if word in vocab else [])
        or colab_1(word).intersection(vocab)
        or colab_2(word).intersection(vocab))
    # Rank suggestions by probability, highest first
    best_suggestion = sorted([[s, probs[s]] for s in suggested_word],
                             key=lambda x: x[1], reverse=True)
    return best_suggestion
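The lookup logic can be demonstrated end to end on a toy vocabulary. This is a stand-alone sketch with hypothetical names (edits_one, correct) that mirrors get_corrections: return the word itself if it is known, otherwise known words within one edit, ranked by probability:

```python
# Candidate generation restricted to deletes, replaces, and inserts
# for brevity.
ALPHS = 'abcdefghijklmnopqrstuvwxyz'

def edits_one(word):
    deletes = [word[:i] + word[i + 1:] for i in range(len(word))]
    replaces = [word[:i] + l + word[i + 1:]
                for i in range(len(word)) for l in ALPHS]
    inserts = [word[:i] + l + word[i:]
               for i in range(len(word) + 1) for l in ALPHS]
    return set(deletes + replaces + inserts)

def correct(word, probs, vocab):
    # Known word wins outright; otherwise use in-vocabulary edits
    candidates = [word] if word in vocab else edits_one(word) & vocab
    return sorted(candidates, key=lambda w: probs.get(w, 0), reverse=True)

vocab = {"the", "cat", "hat"}
probs = {"the": 0.5, "cat": 0.3, "hat": 0.2}
print(correct("cap", probs, vocab))  # ['cat']
```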
Now the code is ready; we can test it on any user input with the code below.
Let’s print the top 3 suggestions made by the autocorrect.
Python3
my_word = input("Enter any word: ")
# Count frequencies over the full word list, not the set
# (counting the set would give every word a frequency of 1)
word_count = counting_words(w)
probs = prob_cal(word_count)
tmp_corrections = get_corrections(my_word, probs, main_set, 2)
for i, word_prob in enumerate(tmp_corrections):
    if i < 3:
        print(word_prob[0])
    else:
        break
Output :
Enter any word:daedd
dared
daned
died
Conclusion
So, we have implemented a basic autocorrector using the NLTK library and Python. As a further step, we could build a higher-level autocorrect system that uses a larger dataset and works more efficiently.
To improve accuracy, we could also use transformers and other NLP techniques such as n-grams and TF-IDF.