Finding the Word Analogy from given words using Word2Vec embeddings

In many placement exam rounds, we often encounter a basic question on word analogies. In the word analogy task, we complete the sentence "a is to b as c is to ___", which is often written as a : b :: c : d, and we have to find the word d. A sample question looks like: 'man is to woman as king is to ___'.

The human brain can easily recognize that the blank must be filled with the word 'queen'. But for a machine to learn this pattern and fill the blank with the most appropriate word, a lot of training is required. What if we could use a machine learning algorithm to automate this task of finding word analogies? In this tutorial, we will use the Word2Vec model and the pre-trained vectors 'GoogleNews-vectors-negative300.bin', which Google trained on a news corpus of about 100 billion words. Each word in this pre-trained vocabulary is embedded in a 300-dimensional space, and words that are similar in context/meaning lie close to each other in that space.

Methodology to find the analogous word:

In this problem, our goal is to find a word d such that the associated word vectors va, vb, vc, vd satisfy the relationship vb − va ≈ vd − vc. We measure how close vb − va is to vd − vc using cosine similarity and pick the word d for which this similarity is highest.
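To make this concrete, here is a minimal sketch with hand-made 3-dimensional vectors (real Word2Vec vectors are 300-dimensional, and these numbers are purely illustrative): if the analogy holds, the difference vector vb − va points in nearly the same direction as vd − vc, so their cosine similarity is close to 1.

Python3

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# toy 3-dimensional vectors, purely illustrative
v_man   = np.array([0.9, 0.1, 0.4])
v_woman = np.array([0.1, 0.9, 0.4])
v_king  = np.array([0.95, 0.15, 0.9])
v_queen = np.array([0.15, 0.95, 0.9])

# if the analogy man : woman :: king : queen holds,
# (v_woman - v_man) and (v_queen - v_king) point in almost the same direction
print(cosine_similarity([v_woman - v_man], [v_queen - v_king]))
# -> a value close to 1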

Importing important libraries:

We need the additional gensim library to use the Word2Vec model. To install gensim, run the command pip install gensim in your terminal/command prompt.

Python3
import numpy as np
from gensim.models import KeyedVectors          # loads the pre-trained binary vectors
from sklearn.metrics.pairwise import cosine_similarity


Loading the word vectors using the pre-trained model:

Python3
vector_word_notations = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',binary=True)

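The full GoogleNews file holds roughly 3 million word vectors and needs several gigabytes of RAM to load. If memory or loading time is a problem, load_word2vec_format accepts a limit argument that reads only the first N vectors from the file (the vectors are stored roughly in order of word frequency); the figure of 500,000 below is just an example:

Python3

# optional: load only the first 500,000 vectors to save memory
vector_word_notations = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True, limit=500000)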

Defining a function to predict the analogous word:

Python3
def analogous_word(word_1, word_2, word_3, vector_word_notations):
    '''Accepts a triad of words word_1, word_2, word_3 and returns word_4
    such that word_1 : word_2 :: word_3 : word_4'''

    # convert each word to lowercase
    word_1, word_2, word_3 = word_1.lower(), word_2.lower(), word_3.lower()

    # the cosine similarity between (word_2 - word_1) and (word_4 - word_3)
    # should be maximum
    maximum_similarity = -99999

    word_4 = None

    # vocabulary of the pre-trained model
    # (in gensim 4.x use vector_word_notations.key_to_index instead of .vocab)
    words = vector_word_notations.vocab.keys()

    va, vb, vc = (vector_word_notations[word_1],
                  vector_word_notations[word_2],
                  vector_word_notations[word_3])

    # find word_4 such that the similarity between
    # (word_2 - word_1) and (word_4 - word_3) is maximum
    for i in words:
        if i in [word_1, word_2, word_3]:
            continue

        wvec = vector_word_notations[i]

        # cosine_similarity returns a (1, 1) array; take the scalar value
        similarity = cosine_similarity([vb - va], [wvec - vc])[0][0]

        if similarity > maximum_similarity:
            maximum_similarity = similarity
            word_4 = i

    return word_4

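Note that the loop above scans the entire vocabulary in pure Python, which is slow for a model with millions of words. Gensim ships an optimized routine, most_similar, that performs the same vector-offset arithmetic; it scores candidates by their cosine similarity to the combined vector (vking − vman + vwoman) rather than by comparing difference vectors, but it typically returns the same top word. A minimal sketch of the equivalent query:

Python3

# equivalent analogy query with gensim's built-in, optimized routine
result = vector_word_notations.most_similar(
    positive=['woman', 'king'], negative=['man'], topn=1)
print(result)  # a list of (word, score) tuples, e.g. [('queen', 0.71...)]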

Testing our model:

Python3
triad_1 = ("Man","Woman","King")
# *triad_1 is written to unpack the elements in the tuple
output = analogous_word(*triad_1,word_vectors) 
print(output)
  
# The output will be shown as queen

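You can try the same function on other triads; the exact output depends on the vectors that were loaded, and since the function lowercases its inputs, it assumes the lowercase forms exist in the model's vocabulary. A classic country–capital analogy is sketched below (the expected answer is only a plausible result, not a guarantee):

Python3

# France is to Paris as Italy is to ___
triad_2 = ("France", "Paris", "Italy")
print(analogous_word(*triad_2, vector_word_notations))
# a plausible answer is 'rome'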



Last Updated : 22 Jan, 2021