Finding the Word Analogy from given words using Word2Vec embeddings
Last Updated: 22 Jan, 2021

In many placement exam rounds, we often encounter a basic question on word analogies. In the word analogy task, we complete the sentence “a is to b as c is to ___”, which is often written as a : b :: c : d, and we have to find the word ‘d’. A sample question could be: ‘man is to woman as king is to ___’.

The human brain can recognize that the blank must be filled with the word ‘queen’. But for a machine to understand this pattern and fill the blank with the most appropriate word, a lot of training is required. What if we could use a machine learning algorithm to automate the task of finding the word analogy? In this tutorial, we will use pre-trained Word2Vec embeddings from the file ‘GoogleNews-vectors-negative300.bin’, which Google trained on part of the Google News dataset (about 100 billion words). Each word in the pre-trained model is embedded in a 300-dimensional space, and words that are similar in context/meaning are placed closer to each other in that space.

Methodology to find out the analogous word:

In this problem, our goal is to find a word d such that the associated word vectors va, vb, vc, vd approximately satisfy the relationship ‘vb - va ≈ vd - vc’. We measure how well a candidate d fits by the cosine similarity between vb - va and vd - vc, and we pick the candidate (other than a, b and c) with the highest similarity.
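To make the scoring rule concrete, here is a small worked example on made-up data. The 3-dimensional vectors below are hypothetical values invented for illustration (real GoogleNews vectors have 300 dimensions), and `cosine` is a hand-written helper rather than a library function. A candidate whose offset from ‘king’ points in the same direction as the offset from ‘man’ to ‘woman’ gets a similarity of 1.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d "embeddings", invented for illustration only.
toy = {
    "man":    np.array([1.0, 0.0, 0.2]),
    "woman":  np.array([1.0, 1.0, 0.2]),
    "king":   np.array([1.0, 0.0, 1.0]),
    "queen":  np.array([1.0, 1.0, 1.0]),
    "prince": np.array([0.9, 0.1, 0.9]),
}

va, vb, vc = toy["man"], toy["woman"], toy["king"]
offset = vb - va  # the "man -> woman" direction, vb - va

# Score each candidate d by cos(vb - va, vd - vc)
for word in ("queen", "prince"):
    print(f"{word}: {cosine(offset, toy[word] - vc):.3f}")
# queen: 1.000
# prince: 0.577
```

Here ‘queen’ wins because queen - king equals woman - man exactly in this toy space; with real embeddings the best match is only approximately parallel.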

Importing the required libraries:

The word2vec model is provided by the gensim library, which must be installed separately; the code below also uses scikit-learn. To install both, run pip install gensim scikit-learn in your terminal/command prompt.

## Python3

```python
import numpy as np
import gensim
from gensim.models import word2vec, KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
```

Loading the word vectors using the pre-trained model:

## Python3

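```python
# The pre-trained binary file must be downloaded separately into the working directory.
# If memory is tight, pass limit=N to load only the first N vectors in the file.
vector_word_notations = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
```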

Defining a function to predict the analogous word:

## Python3

```python
def analogous_word(word_1, word_2, word_3, vector_word_notations):
    ''' The function accepts a triad of words, word_1, word_2, word_3 and
    returns word_4 such that word_1:word_2::word_3:word_4 '''

    # convert each word to lowercase (vector lookup is case-sensitive)
    word_1, word_2, word_3 = word_1.lower(), word_2.lower(), word_3.lower()

    # Similarity between (word_2 - word_1) and (word_4 - word_3) should be maximum
    maximum_similarity = -99999
    word_4 = None

    # gensim >= 4.0 exposes the vocabulary as index_to_key;
    # gensim 3.x used vector_word_notations.vocab.keys() instead
    words = vector_word_notations.index_to_key

    va, vb, vc = (vector_word_notations[word_1],
                  vector_word_notations[word_2],
                  vector_word_notations[word_3])

    # to find word_4 such that cosine similarity
    # (word_2 - word_1, word_4 - word_3) is maximum.
    # Note: this visits every word (about 3 million for GoogleNews), so it is slow.
    for i in words:
        if i in [word_1, word_2, word_3]:
            continue

        wvec = vector_word_notations[i]
        # cosine_similarity expects 2-D inputs and returns a 1x1 matrix here
        similarity = cosine_similarity([vb - va], [wvec - vc])[0][0]

        if similarity > maximum_similarity:
            maximum_similarity = similarity
            word_4 = i

    return word_4
```
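The loop above calls `cosine_similarity` once per vocabulary word, which is slow for millions of words. Below is a sketch of the same objective, cos(v2 - v1, v4 - v3), computed for every word at once with NumPy matrix operations. It is a swapped-in vectorized variant, not the article's code, and it assumes the vocabulary arrives as a plain list `words` plus a matrix whose row i is the vector for `words[i]`. With gensim 4.x these would presumably be `vector_word_notations.index_to_key` and `vector_word_notations.vectors`; the toy data at the bottom is invented.

```python
import numpy as np

def analogous_word_vectorized(word_1, word_2, word_3, words, matrix):
    """Return word_4 maximising cos(v2 - v1, v4 - v3) over the whole vocabulary.

    words  -- list of vocabulary strings (assumed input format)
    matrix -- array of shape (len(words), dim); row i is the vector for words[i]
    """
    index = {w: i for i, w in enumerate(words)}
    va, vb, vc = (matrix[index[w]] for w in (word_1, word_2, word_3))

    # Unit vector along the word_1 -> word_2 direction
    offset = vb - va
    offset = offset / np.linalg.norm(offset)

    # Subtract v3 from every row at once, then divide by each row's norm
    diffs = matrix - vc
    norms = np.linalg.norm(diffs, axis=1)
    norms[norms == 0] = 1.0  # the row for word_3 itself is all zeros
    scores = (diffs @ offset) / norms

    # Never return one of the three input words
    for w in (word_1, word_2, word_3):
        scores[index[w]] = -np.inf

    return words[int(np.argmax(scores))]


# Toy usage with invented 3-d vectors (illustration only)
words = ["man", "woman", "king", "queen", "prince"]
matrix = np.array([
    [1.0, 0.0, 0.2],   # man
    [1.0, 1.0, 0.2],   # woman
    [1.0, 0.0, 1.0],   # king
    [1.0, 1.0, 1.0],   # queen
    [0.9, 0.1, 0.9],   # prince
])
print(analogous_word_vectorized("man", "woman", "king", words, matrix))  # queen
```

One design caveat: `matrix - vc` allocates a temporary array as large as the whole embedding matrix (roughly 3.6 GB for 3 million 300-dimensional float32 vectors), so loading a subset with `limit=` may be necessary on smaller machines.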

Testing our model:

## Python3

```python
triad_1 = ("Man", "Woman", "King")
# *triad_1 unpacks the elements of the tuple into the first three arguments
output = analogous_word(*triad_1, vector_word_notations)
print(output)

# The output will be shown as queen
```
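For comparison, gensim's built-in `vector_word_notations.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)` answers the same question but, to our understanding of its documentation, ranks words by cosine similarity to the combination of the unit-normalized vectors (woman - man + king), an objective often called 3CosAdd, rather than by comparing the two difference vectors as this article does. The sketch below implements that additive objective on invented toy vectors and returns the top-n candidates with scores; the function name `top_n_3cosadd` is hypothetical and it is not gensim's code.

```python
import numpy as np

def top_n_3cosadd(word_1, word_2, word_3, words, matrix, n=3):
    """Rank candidates by cos(v4, v2 - v1 + v3) using unit-normalised vectors."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    index = {w: i for i, w in enumerate(words)}

    # Target direction: word_2 - word_1 + word_3, normalised to unit length
    target = unit[index[word_2]] - unit[index[word_1]] + unit[index[word_3]]
    target /= np.linalg.norm(target)
    scores = unit @ target

    # Exclude the three input words from the results
    for w in (word_1, word_2, word_3):
        scores[index[w]] = -np.inf

    order = [i for i in np.argsort(-scores) if np.isfinite(scores[i])][:n]
    return [(words[i], float(scores[i])) for i in order]


# Invented 3-d vectors, illustration only
words = ["man", "woman", "king", "queen", "prince"]
matrix = np.array([
    [1.0, 0.0, 0.2], [1.0, 1.0, 0.2], [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0], [0.9, 0.1, 0.9],
])
for word, score in top_n_3cosadd("man", "woman", "king", words, matrix):
    print(f"{word}: {score:.3f}")  # 'queen' is listed first
```

On real embeddings the two objectives can disagree on some triads, so it may be worth checking both before relying on either.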