PageRank is an algorithm used by Google to rank the importance of websites. While the details have changed over the years, the central idea is to assign each site a score based on the importance of the other sites that link to it.
More mathematically, suppose there are N sites, and each site i has a certain count Ci of outgoing links. Then the score for a particular site Sj is defined as:

Sj = (1 - d) / N + d * (Sx / Cx + Sy / Cy + ... + Sz / Cz)

Here, Sx, Sy, ..., Sz denote the scores of all the other sites that have outgoing links to Sj, and Cx, Cy, ..., Cz are their outgoing-link counts. The damping factor d, usually set to around 0.85, models the probability that a user keeps following links; with probability 1 - d the user stops and jumps to a random page instead.
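To make the formula concrete, here is a direct translation into a small Python helper. This is illustrative only and separate from the solutions below; `incoming[j]` is assumed to list the sites that link to site j, and `out_count[i]` plays the role of Ci.

```python
def score(j, scores, incoming, out_count, d=0.85):
    # One application of the formula: S_j = (1 - d) / N + d * sum(S_i / C_i)
    # over all sites i that have an outgoing link to j.
    N = len(scores)
    return (1 - d) / N + d * sum(scores[i] / out_count[i] for i in incoming[j])
```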
Given a directed graph of links between various websites, write a program that calculates each site’s page rank.
Input:
```
N = 3
Links: [[1], [2], [0]]   // Site 0 -> 1, Site 1 -> 2, Site 2 -> 0
d = 0.85
iterations = 100
```
Output: [0.333, 0.333, 0.333] (approximately)
Explanation:
In this circular graph, each site links to exactly one other site. After convergence, all sites have equal PageRank since the graph is symmetric: each site receives equal "link juice" from its predecessor.
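As a quick sanity check (plain arithmetic, not part of the problem statement): with every score at 1/3 and each site having exactly one incoming link from a site with one outgoing link, one application of the formula returns 1/3 again, so the uniform vector is stationary.

```python
pr = (1 - 0.85) / 3 + 0.85 * (1 / 3)  # (1 - d)/N + d * S_pred / C_pred
print(round(pr, 3))  # 0.333
```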
Input:
```
N = 4
Links: [[1, 2], [3], [3], [0]]   // Site 0 -> 1, 2; Site 1 -> 3; Site 2 -> 3; Site 3 -> 0
d = 0.85
iterations = 100
```
Output: [0.320, 0.174, 0.174, 0.333] (approximately)
Explanation:
Sites 0 and 3 have higher PageRank because site 0 receives the full rank of site 3, while site 3 receives links from both sites 1 and 2. Sites 1 and 2 have lower PageRank: each receives only half of site 0's rank, since site 0 splits its influence between them.
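These numbers can be checked with a standalone sketch that just iterates the per-site equations from the formula above (not the solution classes below):

```python
# pr0 <- teleport + d*pr3;  pr1, pr2 <- teleport + d*pr0/2;  pr3 <- teleport + d*(pr1 + pr2)
pr = [0.25] * 4
for _ in range(100):
    pr = [0.0375 + 0.85 * pr[3],
          0.0375 + 0.425 * pr[0],
          0.0375 + 0.425 * pr[0],
          0.0375 + 0.85 * (pr[1] + pr[2])]
print([round(x, 3) for x in pr])  # [0.32, 0.174, 0.174, 0.333]
```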
Input:
```
N = 2
Links: [[1], []]   // Site 0 -> 1; Site 1 has no outgoing links
d = 0.85
iterations = 100
```
Output: [0.351, 0.649] (approximately)
Explanation:
Site 1 is a "sink" (dangling node) with high PageRank because it receives site 0's entire rank. Site 0 has lower PageRank since it gives away its influence and receives back only its share of the redistributed dangling mass.
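Under the convention used by the solutions below, a dangling node's rank is redistributed equally to every site on each step. A standalone check of the output under that convention:

```python
# pr0 <- (1-d)/2 + d * pr1/2           (site 0's only inflow is its share of the dangling mass)
# pr1 <- (1-d)/2 + d * (pr0 + pr1/2)   (site 0's full rank plus site 1's own dangling share)
pr0 = pr1 = 0.5
for _ in range(100):
    pr0, pr1 = 0.075 + 0.425 * pr1, 0.075 + 0.85 * pr0 + 0.425 * pr1
print(round(pr0, 3), round(pr1, 3))  # 0.351 0.649
```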
Input:
```
N = 3
Links: [[1, 2], [], [0]]   // Site 0 -> 1, 2; Site 1 has no links; Site 2 -> 0
d = 0.5
iterations = 50
```
Output: [0.375, 0.3125, 0.3125] (approximately)
Explanation:
With a lower damping factor (0.5), the random teleportation effect is stronger, pulling all scores toward the uniform 1/3. Site 0 has the highest rank because it receives the full rank of site 2. Sites 1 and 2 end up equal: each receives half of site 0's rank plus the same share of site 1's redistributed (dangling) rank.
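The same kind of standalone check applies here; site 1's dangling rank is shared equally by all three sites on each step.

```python
# Each site receives (1-d)/3 teleport mass plus d * pr1/3 from the dangling node.
pr = [1 / 3] * 3
for _ in range(100):
    pr = [1 / 6 + 0.5 * (pr[1] / 3 + pr[2]),
          1 / 6 + 0.5 * (pr[0] / 2 + pr[1] / 3),
          1 / 6 + 0.5 * (pr[0] / 2 + pr[1] / 3)]
print([round(x, 4) for x in pr])  # [0.375, 0.3125, 0.3125]
```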
Input:
```
N = 1
Links: [[]]   // Single site with no outgoing links
d = 0.85
iterations = 10
```
Output: [1.0]
Explanation:
Single-site case: the site gets all the PageRank (probability 1). There are no other sites to distribute rank to, so the site retains full rank.
PageRank can be computed using the power iteration method on a transition matrix. We construct a matrix where each entry represents the probability of moving from one page to another. The PageRank vector is the stationary distribution of this Markov chain, found by repeatedly multiplying the matrix with an initial rank vector until convergence.
```python
from typing import List

class Solution:
    def page_rank(self, N: int, links: List[List[int]], d: float = 0.85,
                  max_iter: int = 100, eps: float = 1e-6) -> List[float]:
        # Build the transition matrix
        M = [[0.0 for _ in range(N)] for _ in range(N)]
        out_count = [len(links[i]) for i in range(N)]

        for i in range(N):
            if out_count[i] == 0:
                # Dangling node: distribute its rank equally to all nodes
                for j in range(N):
                    M[i][j] = 1.0 / N
            else:
                # Regular node: distribute rank among linked nodes
                for neighbor in links[i]:
                    M[i][neighbor] = 1.0 / out_count[i]

        # Apply the damping factor
        for i in range(N):
            for j in range(N):
                M[i][j] = d * M[i][j] + (1.0 - d) / N

        # Initialize the PageRank vector uniformly
        pr = [1.0 / N for _ in range(N)]

        # Power iteration
        for _ in range(max_iter):
            # Matrix-vector multiplication: new_pr = M^T * pr
            new_pr = [0.0 for _ in range(N)]
            for j in range(N):
                for i in range(N):
                    new_pr[j] += M[i][j] * pr[i]

            # Check convergence (L1 distance between successive vectors)
            diff = sum(abs(new_pr[i] - pr[i]) for i in range(N))
            pr = new_pr
            if diff < eps:
                break

        return pr
```
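A quick way to exercise this implementation (assuming Python 3; the expected values match Examples 1 and 2 above):

```python
sol = Solution()
print([round(x, 3) for x in sol.page_rank(3, [[1], [2], [0]])])          # [0.333, 0.333, 0.333]
print([round(x, 3) for x in sol.page_rank(4, [[1, 2], [3], [3], [0]])])  # [0.32, 0.174, 0.174, 0.333]
```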
⏰ Time complexity: O(k * N²), where k is the number of iterations (typically 50-100 until convergence) and N is the number of nodes; each iteration performs a full N×N matrix-vector multiplication.
🧺 Space complexity: O(N²), for storing the transition matrix.
Instead of building the full transition matrix, we can directly apply the PageRank formula in each iteration. For each node, we calculate its new PageRank by summing contributions from all nodes that link to it, divided by their respective outgoing link counts, plus the random teleportation factor.
```python
from typing import List

class Solution:
    def page_rank_direct(self, N: int, links: List[List[int]], d: float = 0.85,
                         max_iter: int = 100, eps: float = 1e-6) -> List[float]:
        # Build an incoming-links map for efficient lookup
        incoming = [[] for _ in range(N)]
        out_count = [len(links[i]) for i in range(N)]
        for i in range(N):
            for neighbor in links[i]:
                incoming[neighbor].append(i)

        # Initialize the PageRank vector uniformly
        pr = [1.0 / N for _ in range(N)]

        for _ in range(max_iter):
            # Rank held by dangling nodes, redistributed equally to every node
            dangling_sum = sum(pr[i] for i in range(N) if out_count[i] == 0)

            new_pr = [0.0 for _ in range(N)]
            for j in range(N):
                # Teleportation factor
                new_pr[j] = (1.0 - d) / N
                # Share of the dangling mass
                new_pr[j] += d * dangling_sum / N
                # Contributions from incoming links
                for i in incoming[j]:
                    new_pr[j] += d * pr[i] / out_count[i]

            # Check convergence (L1 distance between successive vectors)
            diff = sum(abs(new_pr[i] - pr[i]) for i in range(N))
            pr = new_pr
            if diff < eps:
                break

        return pr
```
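The direct formulation should agree with the matrix version; a minimal check on the dangling-node case from Example 3:

```python
sol = Solution()
print([round(x, 3) for x in sol.page_rank_direct(2, [[1], []])])  # [0.351, 0.649]
```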
⏰ Time complexity: O(k * (N + E)), where k is the number of iterations, N is the number of nodes, and E is the number of edges. This is more efficient than the matrix approach for sparse graphs.
🧺 Space complexity: O(N + E), for storing the incoming-links map and the PageRank vectors.
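For larger graphs, the matrix approach also vectorizes naturally. Here is a minimal sketch using NumPy (not used in the original solutions); it builds the damped "Google matrix" once and then repeats the power-iteration step as a single matrix-vector product.

```python
import numpy as np

def page_rank_numpy(N, links, d=0.85, max_iter=100, eps=1e-6):
    M = np.zeros((N, N))
    for i, outs in enumerate(links):
        if outs:
            M[i, outs] = 1.0 / len(outs)  # regular node: split rank among targets
        else:
            M[i, :] = 1.0 / N             # dangling node: spread rank uniformly
    G = d * M + (1.0 - d) / N             # damped "Google matrix"
    pr = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        new_pr = G.T @ pr                 # one power-iteration step
        if np.abs(new_pr - pr).sum() < eps:
            return new_pr.tolist()
        pr = new_pr
    return pr.tolist()
```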