The Odd One Out problem is one of the most interesting go-to problems for testing an individual's logical reasoning skills. It is often used in competitive exams and placement rounds because it checks a person's analytical skills and decision-making ability. In this article, we are going to write Python code that finds the odd word among a given set of words.
Suppose we are given a set of words like Apple, Mango, Orange, Party, and Guava, and we have to find the odd word. As humans, we can easily see that Party is the odd word because all the other words are names of fruits. It is much harder for a model to work this out. Here, we will use a Word2Vec model: the pre-trained 'GoogleNews-vectors-negative300.bin' released by Google, which was trained on about 100 billion words from the Google News dataset. Each word in the pre-trained vocabulary is embedded in a 300-dimensional space. Words that are similar in context or meaning are placed close to each other in that space, so they have a high cosine similarity value.
Methodology to find out the odd word:
We find the average (mean) vector of all the given word vectors. Then we compute the cosine similarity between each word vector and this average vector. The word with the lowest cosine similarity is our odd word.
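The methodology can be illustrated on a tiny, made-up example before we load the real model. The sketch below uses hypothetical 3-dimensional toy vectors instead of real 300-dimensional Word2Vec embeddings. It computes cosine similarity as cos(a, b) = (a · b) / (‖a‖ ‖b‖) with NumPy, and it is only meant to show the three steps: average, compare, and pick the minimum.

```python
import numpy as np

# Hypothetical 3-dimensional toy embeddings (real GoogleNews vectors have 300 dimensions).
toy_vectors = {
    "apple": np.array([0.9, 0.1, 0.0]),
    "mango": np.array([0.8, 0.2, 0.1]),
    "guava": np.array([0.85, 0.15, 0.05]),
    "party": np.array([0.0, 0.2, 0.9]),
}

# Step 1: average all the word vectors element-wise.
mean_vector = np.mean(list(toy_vectors.values()), axis=0)

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 2: compare every word vector with the mean vector.
scores = {word: cosine(vec, mean_vector) for word, vec in toy_vectors.items()}

# Step 3: the word least similar to the mean is the odd one out.
odd_word = min(scores, key=scores.get)
print(odd_word)  # → party
```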
Importing the required libraries:
We need to install the additional gensim library to use the Word2Vec model. To install gensim, run the command 'pip install gensim' in your terminal/command prompt.
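A minimal set of imports for this approach might look like the following. This assumes we do the vector arithmetic with NumPy and read the model with gensim's `KeyedVectors`. The try/except is an illustrative convenience that reports a missing gensim install. It is not part of the original article's code.

```python
# NumPy handles the vector arithmetic (averaging and cosine similarity).
import numpy as np

# gensim's KeyedVectors can read the pre-trained GoogleNews binary file.
# The try/except only reports a missing install; it does not install anything.
try:
    from gensim.models import KeyedVectors
except ImportError:
    KeyedVectors = None
    print("gensim is not installed; run: pip install gensim")
```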
Loading the word vectors using the pre-trained model:
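One way to load the pre-trained vectors is gensim's `KeyedVectors.load_word2vec_format` with `binary=True`, because the GoogleNews file is in the binary word2vec format. In the sketch below, the `MODEL_PATH` location and the helper name `load_word_vectors` are hypothetical. Point the path at wherever you saved the downloaded file. The optional `limit` argument reads only the first `limit` vectors, which can reduce memory use at the cost of a smaller vocabulary.

```python
import os

# Hypothetical location of the downloaded file; change this to your own path.
MODEL_PATH = "GoogleNews-vectors-negative300.bin"

def load_word_vectors(path=MODEL_PATH, limit=None):
    """Load the pre-trained GoogleNews vectors (hypothetical helper)."""
    # Fail early with a clear message if the file has not been downloaded yet.
    if not os.path.exists(path):
        raise FileNotFoundError(f"Pre-trained model not found at {path!r}")
    # Imported here so this helper can be defined even before gensim is installed.
    from gensim.models import KeyedVectors
    # binary=True because the GoogleNews file uses the binary word2vec format.
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)

# model = load_word_vectors()  # the full 3-million-word vocabulary needs several GB of RAM
```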
Defining a function to predict the odd word:
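A possible version of this function follows the methodology directly. It looks up each word's vector, averages them, prints each word's cosine similarity to `mean_vector`, and returns the word with the lowest score. This is a sketch, not necessarily the original code. Here, `vectors` can be any mapping from word to vector, including a loaded gensim `KeyedVectors` object (indexing it with a word returns that word's NumPy array). The helper names `cosine_sim` and `odd_word_out` are illustrative. Cosine similarity is computed with NumPy rather than with any particular library function.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def odd_word_out(words, vectors):
    """Return the word whose vector is least similar to the mean of all word vectors."""
    # Look up each embedding; a word missing from the vocabulary raises KeyError.
    word_vectors = [np.asarray(vectors[w], dtype=float) for w in words]
    mean_vector = np.mean(word_vectors, axis=0)

    odd_word, min_similarity = None, float("inf")
    for word, vec in zip(words, word_vectors):
        sim = cosine_sim(vec, mean_vector)
        print(f"cosine similarity score between {word} and mean_vector is {sim:.3f}")
        # Keep track of the word with the lowest similarity seen so far.
        if sim < min_similarity:
            odd_word, min_similarity = word, sim
    return odd_word
```

With a model loaded, a call such as `print('The odd word is: ' + odd_word_out(['apple', 'mango', 'juice', 'party', 'orange', 'guava'], model))` prints one score line per word, followed by the chosen word. The GoogleNews vocabulary is case-sensitive, so 'Paris' and 'paris' may map to different vectors.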
Testing our model:
cosine similarity score between apple and mean_vector is 0.765
cosine similarity score between mango and mean_vector is 0.808
cosine similarity score between juice and mean_vector is 0.688
cosine similarity score between party and mean_vector is 0.289
cosine similarity score between orange and mean_vector is 0.611
cosine similarity score between guava and mean_vector is 0.790
The odd word is: party
Similarly, let's try another set of words: India, paris, Russia, France, Germany, and USA.
cosine similarity score between India and mean_vector is 0.660
cosine similarity score between paris and mean_vector is 0.518
cosine similarity score between Russia and mean_vector is 0.691
cosine similarity score between France and mean_vector is 0.758
cosine similarity score between Germany and mean_vector is 0.763
cosine similarity score between USA and mean_vector is 0.564
The odd word is: paris