Finding the Odd Word amongst given words using Word2Vec embeddings
• Last Updated : 27 Jan, 2021

The odd-one-out problem is a popular test of logical reasoning. It is often used in competitive exams and placement rounds because it checks a candidate’s analytical skills and decision-making ability. In this article, we will write Python code that finds the odd word in a given set of words.

Suppose we are given a set of words such as Apple, Mango, Orange, Party, Guava and have to find the odd word. A human can easily see that Party is the odd word because all the other words are names of fruits, but this is much harder for a model to work out. Here we will use Word2Vec embeddings from the pre-trained model ‘GoogleNews-vectors-negative300.bin’, which Google trained on about 100 billion words of Google News text. Each word in the pre-trained model is embedded in a 300-dimensional space, and words that are similar in context or meaning lie closer to each other in that space and have a high cosine similarity.

Methodology to find out the odd word:

We compute the average of all the given word vectors, then compare each word vector with this average vector using cosine similarity. The word with the lowest cosine similarity is the odd word.
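The idea can be tried without downloading the pre-trained model. The following is a minimal sketch using only the standard library: the 3-dimensional vectors in `toy_vectors` are invented for illustration and are not real Word2Vec embeddings, and the helper names `cosine` and `odd_one_out` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def odd_one_out(vectors):
    """Return the word whose vector is least similar to the mean vector."""
    dim = len(next(iter(vectors.values())))
    # Component-wise average of all vectors
    mean = [sum(v[i] for v in vectors.values()) / len(vectors) for i in range(dim)]
    return min(vectors, key=lambda w: cosine(vectors[w], mean))

# Hypothetical toy "embeddings", made up for this illustration only
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "mango": [0.8, 0.2, 0.1],
    "guava": [0.85, 0.15, 0.05],
    "party": [0.0, 0.1, 0.9],
}

print(odd_one_out(toy_vectors))
```

The three fruit vectors point in roughly the same direction, so the mean vector stays close to them, and the vector for "party" ends up with the lowest similarity.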

Importing the required libraries:

We need the gensim library to use the Word2Vec model. To install it, run ‘pip install gensim’ in your terminal/command prompt. The code also uses NumPy and scikit-learn.

## Python3

```python
import numpy as np
import gensim
from gensim.models import word2vec, KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
```

Loading the word vectors using the pre-trained model:

## Python3

```python
vector_word_notations = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
```

Defining a function to predict the odd word:

## Python3

```python
def odd_word_out(input_words):
    '''The function accepts a list of words and prints the odd word.'''

    # Generate the word embeddings for the given list of words
    whole_word_vectors = [vector_word_notations[i] for i in input_words]

    # Average vector of all the word vectors
    mean_vector = np.mean(whole_word_vectors, axis=0)

    # Iterate over every word and find its similarity to the mean vector
    odd_word = None
    minimum_similarity = 99999.0  # Can be any very high value

    for i in input_words:
        # cosine_similarity returns a 1x1 array, so take the scalar inside it
        similarity = cosine_similarity([vector_word_notations[i]], [mean_vector])[0][0]
        if similarity < minimum_similarity:
            minimum_similarity = similarity
            odd_word = i

        print("cosine similarity score between %s and mean_vector is %.3f" % (i, similarity))

    print("\nThe odd word is: " + odd_word)
```
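Note that looking up a word that is missing from the model's vocabulary raises a `KeyError`, so a misspelled input would stop the function. One possible guard is sketched below. The helper name `split_by_vocabulary` is hypothetical, and a plain dict stands in for the loaded model here, on the assumption that the loaded `KeyedVectors` object supports the same `in` membership test.

```python
def split_by_vocabulary(words, vectors):
    """Partition words into (known, unknown) by membership in `vectors`."""
    known = [w for w in words if w in vectors]
    unknown = [w for w in words if w not in vectors]
    return known, unknown

# A plain dict stands in for the loaded model (assumption for illustration)
stand_in_vectors = {"apple": [1.0, 0.0], "mango": [0.9, 0.1]}

# "gauva" is a deliberate misspelling to show an unknown word
known, unknown = split_by_vocabulary(["apple", "mango", "gauva"], stand_in_vectors)
print("known:", known)
print("unknown:", unknown)
```

With the real model, only the `known` list would be passed on to the odd-word function, and the `unknown` words could be reported to the user.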

Testing our model:

## Python3

```python
input_1 = ['apple', 'mango', 'juice', 'party', 'orange', 'guava']  # party is the odd word
odd_word_out(input_1)
```

Output:

```
cosine similarity score between apple and mean_vector is 0.765
cosine similarity score between mango and mean_vector is 0.808
cosine similarity score between juice and mean_vector is 0.688
cosine similarity score between party and mean_vector is 0.289
cosine similarity score between orange and mean_vector is 0.611
cosine similarity score between guava and mean_vector is 0.790

The odd word is: party
```

Similarly, consider another example:

## Python3

```python
input_2 = ['India', 'paris', 'Russia', 'France', 'Germany', 'USA']
# paris is the odd word since it is a capital and the others are countries
odd_word_out(input_2)
```

Output:

```
cosine similarity score between India and mean_vector is 0.660
cosine similarity score between paris and mean_vector is 0.518
cosine similarity score between Russia and mean_vector is 0.691
cosine similarity score between France and mean_vector is 0.758
cosine similarity score between Germany and mean_vector is 0.763
cosine similarity score between USA and mean_vector is 0.564

The odd word is: paris
```