Top K Nearest Words using Edit Distances in NLP

Last Updated : 08 Jun, 2023

We can find the Top K nearest matching words to the given query input word by using the concept of Edit/Levenstein distance. If any word is the same word as the query input(word) then their Edit distance would be zero and is a perfect match and so on. So we can find the Edit distances between the query word and the words present in our vocabulary pairwise and then sort them according to their edit distances. This will help us arrange the words in the vocabulary in the order of their nearest match to the query. This is very useful to solve Information Retrieval tasks.

Code implementations

If we have a vocabulary of words, then we can find the K nearest matching words to the query with the help of Edit distance. Here Vocabulary can be a List of Strings.

Steps:

Create a function to find the edit distance between two words.
Create a second function to find the k nearest word. It uses the edit distance function to measure the distance between two words.
For the vocabulary, we are using some text and uses text.split() command to store as the list of words.
In the k nearest word-defined function input the query or word, vocabulary, and the number of the nearest word we want to find.

Python3

# code for Edit Distance 
def edit_Distance(s1, s2): 
    r, c = len(s1), len(s2) 
    dp = [[-1 for j in range(0, c+1)]for i in range(0, r+1)] 
    for i in range(0, r+1): 
        dp[i][0] = i 
    for j in range(0, c+1): 
        dp[0][j] = j 
    for i in range(1, r+1): 
        for j in range(1, c+1): 
            if(s1[i-1] == s2[j-1]): 
                dp[i][j] = dp[i-1][j-1] 
            else: 
                dp[i][j] = 1+min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) 
    return dp[r] 
  
  
# code for finding the Top k matching words 
# here s1 is the query input 
def k_nearest_word(s1, vocabulary, k): 
    k_nearest_word = [] 
    word_dist = {} 
    for word in vocabulary: 
        word_dist[word] = edit_Distance(s1, word) 
    word_dist = dict(sorted(word_dist.items(), key=lambda item: item[1])) 
    ans = [] 
    li = list(word_dist.keys()) 
    for i in range(0, k): 
        ans.append(li[i]) 
    return ans 
  
  
# for Testing purposes 
text = ''' 
Pytorch is an open-source deep learning framework available with a Python and C++ interface.  
it resides inside the torch module.  
the data that has to be processed is input in the form of a tensor. 
that allows you to develop and train deep learning models.  
However, as the size and complexity of your models grow,  
the time it takes to train them can become prohibitive. 
  
'''
# vocabulary=['This','is','salt','saw','Bat','salty','salts'] 
vocabulary = text.split() 
query = 'Pytorch'
k = 3  # top k nearest matching 
li = k_nearest_word(query, vocabulary, k) 
print(li) 

Output:

['Pytorch', 'torch', 'Python']

Using Levenshtein library

Steps:

Import the Levenshtein library. We will use it to find the distance between the two strings.
Create a second function to find the k nearest word. It uses the Levenshtein.distance function to measure the distance between two words.
For the vocabulary, we are using some text and uses text.split() command to store as the list of words.
In the k nearest word-defined function input the query or word, vocabulary, and the number of the nearest word we want to find.

Python3

import Levenshtein 
  
def find_top_k_nearest_words(query_input, words_list, k): 
    distances = [] 
    for word in words_list: 
        distance = Levenshtein.distance(query_input, word) 
        distances.append((word, distance)) 
    sorted_distances = sorted(distances, key=lambda x: x[1]) 
    return [d[0] for d in sorted_distances[:k]] 
  
nearest_words = find_top_k_nearest_words(query, vocabulary, k) 
print(nearest_words)

Output:

['Pytorch', 'torch', 'Python']

Suggest improvement

Word Movers Distance (WMD) in NLP

Share your thoughts in the comments

Top K Nearest Words using Edit Distances in NLP

Code implementations

Steps:

Python3

Using Levenshtein library

Steps:

Python3

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?