Finding the Odd Word amongst given words using Word2Vec embeddings
The odd-one-out problem is one of the most popular ways to test a person's logical reasoning. It appears in many competitive exams and placement rounds because it checks analytical skills and decision-making ability. In this article, we will write Python code that finds the odd word in a given set of words.
Suppose we are given the words Apple, Mango, Orange, Party and Guava, and we have to find the odd word. A human can easily see that Party is the odd word because all the others are names of fruits, but this is much harder for a program. Here, we will use Word2Vec embeddings loaded from a pre-trained model named ‘GoogleNews-vectors-negative300.bin’, which Google trained on roughly 100 billion words of Google News text. Each word in the pre-trained model is embedded in a 300-dimensional space, and words that are similar in context or meaning lie closer to each other in that space and have a high cosine similarity.
Methodology to find out the odd word:
We compute the average of all the given word vectors, then compare each word vector with this average vector using cosine similarity. The word with the lowest cosine similarity is our odd word.
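To make the method concrete, here is a minimal sketch that uses hand-made 3-dimensional vectors in place of the real 300-dimensional embeddings. The vectors and the helper name `cosine` are invented for illustration; only NumPy is assumed.

```python
import numpy as np

# Toy 3-d "embeddings" (invented for illustration, not real Word2Vec values)
toy_vectors = {
    "apple": np.array([0.9, 0.1, 0.0]),
    "mango": np.array([0.8, 0.2, 0.1]),
    "guava": np.array([0.85, 0.15, 0.05]),
    "party": np.array([0.0, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1: average all word vectors
mean_vector = np.mean(list(toy_vectors.values()), axis=0)

# Step 2: compare each word vector with the average
scores = {word: cosine(vec, mean_vector) for word, vec in toy_vectors.items()}

# Step 3: the word with the lowest similarity is the odd one
odd_word = min(scores, key=scores.get)
print(odd_word)
```

The three fruit vectors point in nearly the same direction, so they dominate the average, and the vector that points elsewhere ends up with the lowest score.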
Importing the required libraries:
We need to install the gensim library to use the Word2Vec model. To install it, run ‘pip install gensim’ in your terminal/command prompt. The code also uses NumPy and scikit-learn.
Python3
import numpy as np
from gensim.models import KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
Loading the word vectors using the pre-trained model:
Python3
vector_word_notations = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
Defining a function to predict the odd word:
Python3
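def odd_word_out(input_words):
    # Look up the 300-dimensional vector for each input word
    whole_word_vectors = [vector_word_notations[i] for i in input_words]
    # Average the vectors to get a single "centre" vector for the group
    mean_vector = np.mean(whole_word_vectors, axis=0)
    odd_word = None
    minimum_similarity = float("inf")
    for i in input_words:
        # cosine_similarity returns a 1x1 array, so extract the scalar value
        similarity = cosine_similarity([vector_word_notations[i]], [mean_vector])[0][0]
        # Keep track of the word that is least similar to the mean vector
        if similarity < minimum_similarity:
            minimum_similarity = similarity
            odd_word = i
        print("cosine similarity score between %s and mean_vector is %.3f" % (i, similarity))
    print("\nThe odd word is: " + odd_word)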
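The function above raises a `KeyError` if any input word is missing from the model's vocabulary. The sketch below is one possible variant, not part of the original method: the hypothetical helper `odd_word_out_safe` skips unknown words and returns the result instead of printing it. It accepts any mapping that supports `word in vectors` and `vectors[word]`, which `KeyedVectors` does. A plain dict of made-up vectors stands in for the model here so that the example runs without the pre-trained file.

```python
import numpy as np

def odd_word_out_safe(input_words, vectors):
    """Return (odd_word, scores), ignoring words missing from `vectors`."""
    known = [w for w in input_words if w in vectors]
    if len(known) < 3:
        raise ValueError("need at least three known words to pick an odd one")
    matrix = np.array([vectors[w] for w in known], dtype=float)
    mean_vector = matrix.mean(axis=0)
    # Cosine similarity of every row with the mean vector, computed at once
    sims = matrix @ mean_vector / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(mean_vector)
    )
    scores = dict(zip(known, sims.tolist()))
    return min(scores, key=scores.get), scores

# Made-up 2-d vectors standing in for the real model (assumption for the demo)
toy = {
    "apple": [1.0, 0.1],
    "mango": [0.9, 0.2],
    "guava": [0.95, 0.15],
    "party": [0.1, 1.0],
}
odd, scores = odd_word_out_safe(
    ["apple", "mango", "guava", "party", "qwertyuiop"], toy
)
print(odd)
```

With the loaded model, passing `vector_word_notations` as the second argument should work the same way, since the helper only relies on membership tests and item lookup.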
Testing our model:
Python3
input_1 = ['apple', 'mango', 'juice', 'party', 'orange', 'guava']
odd_word_out(input_1)
Output:
cosine similarity score between apple and mean_vector is 0.765
cosine similarity score between mango and mean_vector is 0.808
cosine similarity score between juice and mean_vector is 0.688
cosine similarity score between party and mean_vector is 0.289
cosine similarity score between orange and mean_vector is 0.611
cosine similarity score between guava and mean_vector is 0.790
The odd word is: party
Similarly, consider a second example in which every word is the name of a country except one, which is a city:
Python3
input_2 = ['India', 'paris', 'Russia', 'France', 'Germany', 'USA']
odd_word_out(input_2)
Output:
cosine similarity score between India and mean_vector is 0.660
cosine similarity score between paris and mean_vector is 0.518
cosine similarity score between Russia and mean_vector is 0.691
cosine similarity score between France and mean_vector is 0.758
cosine similarity score between Germany and mean_vector is 0.763
cosine similarity score between USA and mean_vector is 0.564
The odd word is: paris
Last Updated: 27 Jan, 2021