NLP | WuPalmer – WordNet Similarity

How does Wu & Palmer Similarity work?
It calculates relatedness by considering the depths of the two synsets in the WordNet taxonomy, along with the depth of their LCS (Least Common Subsumer), the most specific ancestor the two synsets share:

score = 2 * depth(LCS) / (depth(synset1) + depth(synset2))

The score satisfies 0 < score <= 1. It can never be zero because the depth of the LCS is never zero (the root of the taxonomy has depth one).
It calculates the similarity based on how similar the word senses are and where the Synsets occur relative to each other in the hypernym tree.
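
To make the depth-based idea concrete, here is a minimal sketch on a tiny hand-made taxonomy. The `PARENT` table and the helper names (`path_to_root`, `depth`, `least_common_subsumer`, `wup`) are hypothetical and are not real WordNet data or NLTK functions. NLTK's own implementation handles details such as synsets with several hypernym paths, which this sketch ignores.

```python
# Hypothetical toy taxonomy (not real WordNet data): child -> parent.
PARENT = {
    "entity": None,
    "abstraction": "entity",
    "communication": "abstraction",
    "greeting": "communication",
    "hello": "greeting",
    "commerce": "abstraction",
    "selling": "commerce",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def least_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes share no common ancestor")

def wup(a, b):
    """Wu & Palmer score: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = least_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup("hello", "selling"))   # LCS is "abstraction" (depth 2): 4 / 9
print(wup("hello", "greeting"))  # LCS is "greeting" (depth 4): 8 / 9
print(wup("hello", "hello"))     # identical nodes: 1.0
```

Because the root already has depth 1, the numerator is at least 2, which is why the score stays strictly above zero in this sketch.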

Code #1 : Introducing Synsets

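# Requires the WordNet corpus: run nltk.download('wordnet') once beforehand
from nltk.corpus import wordnet

# Take the first synset returned for each word
syn1 = wordnet.synsets('hello')[0]
syn2 = wordnet.synsets('selling')[0]

print("hello name :  ", syn1.name())
print("selling name :  ", syn2.name())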



Output :



hello name :   hello.n.01
selling name :   selling.n.01

 
Code #2 : Wu-Palmer Similarity

syn1.wup_similarity(syn2)



Output :

0.26666666666666666

hello and selling come out as roughly 27% similar. The score is above zero only because the two synsets share common hypernyms near the top of the hypernym tree.
 
Code #3 : Let’s check the common hypernyms.

sorted(syn1.common_hypernyms(syn2))



Output :

[Synset('abstraction.n.06'), Synset('entity.n.01')]

One of the core metrics used to calculate similarity is the shortest path distance between the two Synsets and their common hypernym.
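
As a rough illustration of what a shortest path distance measures, the sketch below counts edges between two nodes of a tiny hand-made taxonomy by going up to a shared ancestor and back down. The `PARENT` table and the helpers `hops_to_ancestors` and `shortest_path_distance` are hypothetical stand-ins written for this example, not real WordNet data or the NLTK method of the same name.

```python
# Hypothetical toy taxonomy (not real WordNet data): child -> parent.
PARENT = {
    "entity": None,
    "abstraction": "entity",
    "communication": "abstraction",
    "greeting": "communication",
    "hello": "greeting",
    "commerce": "abstraction",
    "selling": "commerce",
}

def hops_to_ancestors(node):
    """Map the node and each of its ancestors to its edge count from node."""
    hops = {}
    steps = 0
    while node is not None:
        hops[node] = steps
        node = PARENT[node]
        steps += 1
    return hops

def shortest_path_distance(a, b):
    """Fewest edges from a to b, passing through a shared ancestor."""
    up_a = hops_to_ancestors(a)
    up_b = hops_to_ancestors(b)
    shared = up_a.keys() & up_b.keys()  # every node here shares the root
    return min(up_a[n] + up_b[n] for n in shared)

print(shortest_path_distance("hello", "greeting"))  # 1: direct hypernym
print(shortest_path_distance("hello", "selling"))   # 5: 3 up to abstraction, 2 down
print(shortest_path_distance("selling", "hello"))   # 5: the distance is symmetric
```

The nearer the shared ancestor is to both nodes, the smaller the distance, so a large distance goes together with a low similarity score.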
 
Code #4 : Let’s understand the use of hypernyms.

ref = syn1.hypernyms()[0]   # direct hypernym of hello.n.01

print("Distance of hello from its hypernym : ",
      syn1.shortest_path_distance(ref))

print("Distance of hello from selling : ",
      syn1.shortest_path_distance(syn2))

print("Distance of selling from hello : ",
      syn2.shortest_path_distance(syn1))


Output :

Distance of hello from its hypernym :  1
Distance of hello from selling :  11
Distance of selling from hello :  11

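Note : The path distance between hello and selling is high, i.e. the two synsets are many steps away from each other, which is why their similarity score is low. The code here uses noun synsets, but the same calls can be used with other parts of speech (POS) that have a hypernym hierarchy, such as verbs.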
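
The score of 0.26666666666666666 and the distance of 11 reported above can be cross-checked with a little arithmetic. This is a sketch under two assumptions: the LCS is abstraction.n.06 at depth 2 (directly below the root entity.n.01, which has depth one), and each synset reaches the LCS along a single path, so that distance = depth1 + depth2 - 2 * depth(LCS). WordNet allows multiple hypernym paths, so this identity need not hold for every pair.

```python
from fractions import Fraction

# Assumption: abstraction.n.06 is the LCS and sits directly below the root.
lcs_depth = 2
# Exact fraction behind the reported score 0.26666666666666666.
score = Fraction(4, 15)

# Wu & Palmer: score = 2 * lcs_depth / (depth1 + depth2), solved for the sum.
depth_sum = 2 * lcs_depth / score

# Tree-shaped assumption: distance = depth1 + depth2 - 2 * lcs_depth.
distance = depth_sum - 2 * lcs_depth

print("score as float :", float(score))
print("depth1 + depth2 :", depth_sum)
print("implied path distance :", distance)
```

Under these assumptions the implied distance agrees with the value returned by shortest_path_distance in Code #4.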


