In many placement exam rounds, we come across a basic question type: word analogies. In a word analogy task, we complete the sentence "a is to b as c is to ___", often written as a : b :: c : d, and we have to find the word d. A sample question looks like: 'man is to woman as king is to ___'.
The human brain can easily recognize that the blank must be filled with the word 'queen'. But for a machine to understand this pattern and fill the blank with the most appropriate word, a lot of training is required. What if we could use a Machine Learning model to automate this task of finding word analogies? In this tutorial, we will use the Word2Vec approach and the pre-trained model 'GoogleNews-vectors-negative300.bin', which Google trained on a news corpus of roughly 100 billion words. Each word in the pre-trained vocabulary is embedded in a 300-dimensional space, and words that are similar in context/meaning are placed closer to each other in that space.
Methodology for finding the analogous word:
In this problem, our goal is to find a word d such that the associated word vectors va, vb, vc, vd are related by 'vb – va ≈ vd – vc'. We measure how close vb – va and vd – vc are to each other using cosine similarity.
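As a quick illustration of that relationship, here is a small sketch with made-up 3-dimensional vectors (the real embeddings are 300-dimensional): the difference vector vb – va should point in nearly the same direction as vd – vc, which is exactly what cosine similarity measures.
Python3
# A minimal sketch of the analogy idea with made-up 3-D vectors
# (hypothetical values, not taken from the GoogleNews model).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

va = np.array([0.9, 0.1, 0.0])   # hypothetical vector for 'man'
vb = np.array([0.1, 0.9, 0.0])   # hypothetical vector for 'woman'
vc = np.array([0.9, 0.1, 0.8])   # hypothetical vector for 'king'
vd = np.array([0.1, 0.9, 0.8])   # hypothetical vector for 'queen'

# cosine_similarity expects 2-D inputs, hence the extra brackets
score = cosine_similarity([vb - va], [vd - vc])[0][0]
print(score)  # close to 1.0: the two difference vectors are nearly parallel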
Importing the required libraries:
We need to install the additional gensim library to use the Word2Vec model. To install gensim, run the command 'pip install gensim' in your terminal/command prompt.
Python3
import numpy as np
import gensim
from gensim.models import word2vec, KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
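The code later in this tutorial relies on gensim's pre-4.0 vocabulary attribute (.vocab). If you are unsure which version is installed, a quick check looks like this:
Python3
# Print the installed gensim version; the `.vocab` attribute used below
# exists in gensim < 4.0, while gensim >= 4.0 exposes `.key_to_index` instead.
import gensim
print(gensim.__version__)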
Loading the word vectors using the pre-trained model:
Python3
vector_word_notations = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
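As an optional sanity check (assuming the load above succeeded), each word in the model maps to a 300-dimensional vector; the lookup word 'king' below is just an example:
Python3
# Each entry of the pre-trained model is a 300-dimensional vector
print(vector_word_notations['king'].shape)   # (300,)

# Size of the pre-trained vocabulary (gensim < 4.0 API)
print(len(vector_word_notations.vocab))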
Defining a function to predict the analogous word:
Python3
def analogous_word(word_1, word_2, word_3, vector_word_notations):
    # Lowercase the inputs so they match the lookup keys consistently
    word_1, word_2, word_3 = word_1.lower(), word_2.lower(), word_3.lower()

    maximum_similarity = -99999
    word_4 = None

    # Vocabulary of the pre-trained model (gensim < 4.0 API;
    # gensim >= 4.0 exposes it as .key_to_index instead)
    words = vector_word_notations.vocab.keys()

    va, vb, vc = vector_word_notations[word_1], \
        vector_word_notations[word_2], vector_word_notations[word_3]

    # Pick the word whose offset from word_3 best matches vb - va
    for i in words:
        if i in [word_1, word_2, word_3]:
            continue

        wvec = vector_word_notations[i]
        similarity = cosine_similarity([vb - va], [wvec - vc])[0][0]

        if similarity > maximum_similarity:
            maximum_similarity = similarity
            word_4 = i

    return word_4
Testing our model:
Python3
triad_1 = ( "Man" , "Woman" , "King" )
output = analogous_word( * triad_1,word_vectors)
print (output)
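Note that looping over the entire vocabulary in pure Python can be quite slow for a model of this size (roughly 3 million words and phrases). gensim also ships a built-in most_similar method that performs the same vector arithmetic with vectorised operations; a minimal sketch using the vectors loaded above:
Python3
# Alternative: let gensim do the analogy arithmetic directly.
# Positive words are added, negative words subtracted: woman + king - man
result = vector_word_notations.most_similar(positive=['woman', 'king'],
                                            negative=['man'], topn=1)
print(result)   # a list of (word, similarity) pairs; 'queen' is the expected top match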
Last Updated: 22 Jan, 2021