The Odd One Out problem is one of the most interesting and commonly used problems for testing the logical reasoning skills of an individual. It is often used in competitive exams and placement rounds, as it checks an individual's analytical skills and decision-making ability. In this article, we are going to write Python code that can be used to find the odd word among a given set of words.
Suppose we are given a set of words like Apple, Mango, Orange, Party, Guava, and we have to find the odd word. As humans, we can analyze the list and conclude that Party is the odd word, since all the other words are names of fruits, but it is much harder for a model to work this out. Here, we will use the Word2Vec approach with a pre-trained model named 'GoogleNews-vectors-negative300.bin', which was trained by Google on a news corpus of about 100 billion words. Each word in the pre-trained model is embedded in a 300-dimensional space, and words that are similar in context/meaning are placed closer to each other in that space, i.e. they have a high cosine similarity value.
Methodology to find the odd word:
We will compute the average vector of all the given word vectors, then compare each word's vector with this average using cosine similarity. The word whose vector has the lowest cosine similarity with the average vector is the odd word.
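To make the idea concrete, here is a minimal sketch of the same computation on small made-up 3-dimensional vectors (the real word vectors are 300-dimensional); the vector values below are purely illustrative and are not taken from the pre-trained model.

Python3

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy 3-dimensional "word vectors" (illustrative values only)
vectors = {
    'apple': np.array([0.9, 0.8, 0.1]),
    'mango': np.array([0.8, 0.9, 0.2]),
    'party': np.array([0.1, 0.2, 0.9]),
}

# Average vector of all the word vectors
mean_vector = np.mean(list(vectors.values()), axis=0)

# Cosine similarity of each word vector with the average vector
scores = {word: cosine_similarity([vec], [mean_vector])[0][0]
          for word, vec in vectors.items()}

# The word least similar to the average vector is the odd one out
print(min(scores, key=scores.get))  # prints 'party'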
Importing the required libraries:
We need to install the additional gensim library in order to use the Word2Vec model. To install gensim, run the command 'pip install gensim' in your terminal/command prompt.
Python3
import numpy as np
import gensim
from gensim.models import word2vec, KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
Loading the word vectors using the pre-trained model:
Python3
vector_word_notations = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)
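Note that the full binary file contains about 3 million words and phrases and needs several gigabytes of RAM to load. If memory is tight, gensim's load_word2vec_format also accepts a limit argument that loads only the most frequent words; the limit value below is an illustrative choice, not something the original setup requires.

Python3

# Optional: load only the 500,000 most frequent words to reduce memory usage
# (the limit value here is an illustrative assumption)
vector_word_notations = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True, limit=500000)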
Defining a function to predict the odd word:
Python3
def odd_word_out(input_words):
    '''The function accepts a list of words and prints the odd word.'''

    # Generate word embeddings for the given list of words
    whole_word_vectors = [vector_word_notations[i] for i in input_words]

    # Average vector of all the word vectors
    mean_vector = np.mean(whole_word_vectors, axis=0)

    # Iterate over every word and compare it with the average vector
    odd_word = None
    minimum_similarity = 99999.0  # Can be any very high value

    for i in input_words:
        # cosine_similarity returns a 1x1 matrix, so extract the scalar value
        similarity = cosine_similarity([vector_word_notations[i]], [mean_vector])[0][0]
        if similarity < minimum_similarity:
            minimum_similarity = similarity
            odd_word = i
        print("cosine similarity score between %s and mean_vector is %.3f"
              % (i, similarity))

    print("\nThe odd word is: " + odd_word)
Testing our model:
Python3
input_1 = ['apple', 'mango', 'juice', 'party', 'orange', 'guava']  # party is the odd word
odd_word_out(input_1)
Output:
cosine similarity score between apple and mean_vector is 0.765
cosine similarity score between mango and mean_vector is 0.808
cosine similarity score between juice and mean_vector is 0.688
cosine similarity score between party and mean_vector is 0.289
cosine similarity score between orange and mean_vector is 0.611
cosine similarity score between guava and mean_vector is 0.790

The odd word is: party
Similarly, for another example, let’s say:
Python3
input_2 = ['India', 'paris', 'Russia', 'France', 'Germany', 'USA']
# paris is the odd word since it is a capital city and the others are countries
odd_word_out(input_2)
Output:
cosine similarity score between India and mean_vector is 0.660
cosine similarity score between paris and mean_vector is 0.518
cosine similarity score between Russia and mean_vector is 0.691
cosine similarity score between France and mean_vector is 0.758
cosine similarity score between Germany and mean_vector is 0.763
cosine similarity score between USA and mean_vector is 0.564

The odd word is: paris
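One practical caveat: if a word from the input list is not present in the pre-trained vocabulary, indexing vector_word_notations[word] raises a KeyError. Below is a minimal sketch of a guard around the function defined above; the wrapper name odd_word_out_safe is just an illustrative choice.

Python3

def odd_word_out_safe(input_words):
    '''Wrapper around odd_word_out that skips words missing from the vocabulary.'''
    known_words = [w for w in input_words if w in vector_word_notations]
    missing = set(input_words) - set(known_words)
    if missing:
        print("Skipping words not found in the vocabulary:", missing)
    # Need at least two known words for the comparison to make sense
    if len(known_words) >= 2:
        odd_word_out(known_words)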