Skip to content
Related Articles

Related Articles

Improve Article
Save Article
Like Article

Weighted PageRank Algorithm

  • Last Updated : 13 Oct, 2020

Prerequisite: PageRank Algorithm

The more popular a webpage is, the more are the linkages that other webpages tend to have to them. Weighted PageRank algorithm is an extension of the conventional PageRank algorithm based on the same concept.

Attention reader! Don’t stop learning now. Get hold of all the important Machine Learning Concepts with the Machine Learning Foundation Course at a student-friendly price and become industry ready.



Weighted PageRank algorithm assigns higher rank values to more popular (important) pages instead of dividing the rank value of a page evenly among its outlink pages. Each outlink page gets a value proportional to its popularity, i.e. its number of inlinks and outlinks.



To a webpage ‘u’, an inlink is a URL of another webpage which contains a link pointing to ‘u’. Similarly to webpage ‘u’, an outlink is a link appearing in ‘u’ which points to another webpage. The number of inlinks is represented by Win(v,u) and the number of outlinks is represented as Wout(v,u).

Win(v,u) is the weight of link (v, u) calculated based on the number of inlinks of page u and the number of inlinks of all reference pages of page v.

W^{i n}(v, u)=I_{u} /\left(\sum_{p \in R(v)} I_{p}\right)

Here, Ip and Iu represent the number of inlinks of page ‘p’ and ‘u’ respectively. R(v) represents the list of all reference pages of page ‘v’.

Wout(v,u) is the weight of link (v, u) calculated based on the number of outlinks of page u and the number of outlinks of all reference pages of page v.

W^{\text {out }}(v, u)=O_{u} /\left(\Sigma_{p \in R(v)} O_{p}\right)

Here, Op and Ou represent the number of outlinks of page ‘p’ and ‘u’ respectively. R(v) represents the list of all reference pages of page ‘v’.

Based on the importance of all pages as describes by their number of inlinks and outlinks, the Weighted PageRank formula is given as:



P R(u)=(1-d)+d \sum_{v \in B(u)} P R(v) * W_{(v, u)}^{i n} * W^{o u t}(v, u)

Here, PR(x) refers to the Weighted PageRank of page x. 

d refers to the damping factor. The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is the damping factor.

Imagine a scenario where there are 5 webpages A, B, C, D and E.

  • Site A (outlinks to B, C, D)
  • Site B (outlinks to A, C, D)
  • Site C (outlinks to D)
  • Site D (outlinks to C, E)
  • Site E (outlinks to B, C, D)

The below code demonstrates how the Weighted PageRank for each webpage in the above scenario can be calculated. The input is taken in the form of an outlink matrix and is run for a total of 5 iterations.

Code:




def win(matrix, m, o):
    k = 0
    for i in range(0, n):
        if(int(matrix[i][m]) == 1):
            k = k+1
    l = 0
    for i in range(0, n):
        if(int(matrix[o][i] == 1)):
            for j in range(0, n):
                if(matrix[j][i] == 1):
                    l = l+1
    return float(k/l)
  
  
def wout(matrix, m, o):
    k = 0
    for i in range(0, n):
        if(int(matrix[0][i]) == 1):
            k = k+1
    l = 0
    for i in range(0, n):
        if(int(matrix[o][i] == 1)):
            for j in range(0, n):
                if(matrix[i][j] == 1):
                    l = l+1
    return float(k/l)
  
  
def pagerank(matrix, o, n, p):
    a = 0
    for i in range(0, n):
        if(int(matrix[i][o]) == 1):
            k = 0
            for s in range(0, n):
                if(matrix[i][s] == 1):
                    k = k+1
            a = a+float((p[i]/k)*win(matrix, i, o)*wout(matrix, i, o))
    return a
  
  
n = 5
matrix = [[0, 1, 1, 1, 0], [1, 0, 1, 1, 0], [
    0, 0, 0, 1, 0], [0, 0, 1, 0, 1], [0, 1, 1, 1, 0]]
d = 0.25  # damping factor
  
o = 5
print("Number of iterations is:", o)
  
sum = 0
p = []
  
for i in range(0, n):
    p.append(1)
for k in range(0, o):
    for u in range(0, n):
        g = pagerank(matrix, u, n, p)
        p[u] = (1-d)+d*g
for i in range(0, n):
    sum += p[i]
    print("Page rank of node ", i+1, "is : ", p[i])
print("Sum of all page ranks: ", sum)

Output: 

Number of iterations is: 5
Page rank of node  1 is :  0.7563090216610497
Page rank of node  2 is :  0.7570825988098917
Page rank of node  3 is :  1.021617613959075
Page rank of node  4 is :  0.9412927238508162
Page rank of node  5 is :  0.7735323180962704
Sum of all page ranks:  4.249834276377103



My Personal Notes arrow_drop_up
Recommended Articles
Page :

Start Your Coding Journey Now!