PageRank Algorithm Implementation
Problem
PageRank is an algorithm used by Google to rank the importance of different websites. While there have been changes over the years, the central idea is to assign each site a score based on the importance of the other pages that link to it.
More mathematically, suppose there are N sites, and each site i has a certain count Ci of outgoing links. Then the score for a particular site Sj is defined as:
score(Sj) = (1 - d) / N + d * (score(Sx) / Cx + score(Sy) / Cy + ... + score(Sz) / Cz)
Here, Sx, Sy, ..., Sz denote the sites that link to Sj, Cx, Cy, ..., Cz are their outgoing-link counts, and d is a damping factor, usually set to around 0.85, modeling the probability that a user keeps following links (with probability 1 - d the user jumps to a random page instead).
Given a directed graph of links between various websites, write a program that calculates each site's page rank.
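To make the formula concrete, here is one application of the update for the 3-site cycle used in Example 1 below (a minimal Python sketch; the variable names are invented for this illustration and are not part of any required API):
# One step of the PageRank update for the cycle 0 -> 1, 1 -> 2, 2 -> 0.
N, d = 3, 0.85
scores = [1.0 / N] * N        # start from a uniform distribution
out_count = [1, 1, 1]         # each site has exactly one outgoing link
incoming = [[2], [0], [1]]    # incoming[j] lists the sites that link to j

new_scores = [(1 - d) / N + d * sum(scores[i] / out_count[i] for i in incoming[j])
              for j in range(N)]
print(new_scores)  # [0.333..., 0.333..., 0.333...] -- the uniform vector is already the fixed point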
Examples
Example 1
Input:
N = 3
Links: [[1], [2], [0]] // Site 0->1, Site 1->2, Site 2->0
d = 0.85
iterations = 100
Output: [0.333, 0.333, 0.333] (approximately)
Explanation:
In this circular graph, each site links to exactly one other site.
After convergence, all sites have equal PageRank since the graph is symmetric.
Each site receives equal "link juice" from its predecessor.
Example 2
Input:
N = 4
Links: [[1, 2], [3], [3], [0]] // Site 0->1,2; Site 1->3; Site 2->3; Site 3->0
d = 0.85
iterations = 100
Output: [0.320, 0.174, 0.174, 0.333] (approximately)
Explanation:
Sites 0 and 3 have higher PageRank because:
- Site 0 receives the entire rank of site 3
- Site 3 receives links from both sites 1 and 2
Sites 1 and 2 have lower PageRank: each receives only half of site 0's rank.
Example 3
Input:
N = 2
Links: [[1], []] // Site 0->1, Site 1 has no outgoing links
d = 0.85
iterations = 100
Output: [0.351, 0.649] (approximately)
Explanation:
Site 1 becomes a "sink" with high PageRank because it receives all of site 0's rank.
Site 0 has lower PageRank since it receives only the teleportation term plus its equal share of site 1's redistributed (dangling) rank.
At the fixed point, PR(S0) = 0.075 + 0.425 * PR(S1); together with PR(S0) + PR(S1) = 1 this gives PR(S0) ≈ 0.351 and PR(S1) ≈ 0.649.
Example 4
Input:
N = 3
Links: [[1, 2], [], [0]] // Site 0->1,2; Site 1 no links; Site 2->0
d = 0.5
iterations = 50
Output: [0.375, 0.313, 0.313] (approximately)
Explanation:
With a lower damping factor (0.5), the random teleportation effect is stronger.
Site 0 has the highest rank because it receives the entire rank of site 2.
Sites 1 and 2 tie: each receives half of site 0's rank plus the same teleportation and dangling shares (site 1's rank is redistributed equally to all sites since it has no outgoing links).
Example 5
Input:
N = 1
Links: [[]] // Single site with no outgoing links
d = 0.85
iterations = 10
Output: [1.0]
Explanation:
Single site case: the site gets all the PageRank (probability 1).
No links to distribute, so the site retains full rank.
Solution
Method 1 - Power Iteration with Matrix Approach
Intuition
PageRank can be computed using the power iteration method on a transition matrix. We construct a matrix where each entry represents the probability of moving from one page to another. The PageRank vector is the stationary distribution of this Markov chain, found by repeatedly multiplying the matrix with an initial rank vector until convergence.
Approach
- Build the transition matrix M where M[i][j] = probability of going from page i to page j
- For each page i with outgoing links, distribute its rank equally among its links: M[i][j] = 1/Ci if i links to j
- Handle dangling nodes (pages with no outgoing links) by distributing their rank equally to all pages
- Apply damping factor: M' = d * M + (1 - d) * J, where J is the matrix whose every entry is 1/N (a numeric sketch of this step follows this list)
- Initialize PageRank vector with equal probabilities: PR = [1/N, 1/N, ..., 1/N]
- Iterate: PR_new = M'^T * PR, i.e., newPr[j] = sum over i of M'[i][j] * pr[i], until convergence or the maximum number of iterations is reached
- Return final PageRank vector
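To make the damping step concrete, the sketch below (illustrative Python; the full implementations follow in the Code section) builds M' for the 3-site cycle of Example 1:
# Damped matrix M' = d * M + (1 - d)/N for the cycle 0 -> 1, 1 -> 2, 2 -> 0.
N, d = 3, 0.85
links = [[1], [2], [0]]
M = [[0.0] * N for _ in range(N)]
for i, outs in enumerate(links):
    for j in outs:
        M[i][j] = 1.0 / len(outs)   # row i spreads its rank over its out-links
M_damped = [[d * M[i][j] + (1 - d) / N for j in range(N)] for i in range(N)]
for row in M_damped:
    print(row)  # first row: [0.05, 0.9, 0.05]; every row sums to 1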
Code
C++
#include <vector>
#include <cmath>
using namespace std;

class Solution {
public:
    vector<double> pageRank(int N, vector<vector<int>>& links, double d = 0.85, int maxIter = 100, double eps = 1e-6) {
        // Build transition matrix
        vector<vector<double>> M(N, vector<double>(N, 0.0));
        vector<int> outCount(N, 0);
        // Count outgoing links
        for (int i = 0; i < N; i++) {
            outCount[i] = links[i].size();
        }
        // Fill transition matrix
        for (int i = 0; i < N; i++) {
            if (outCount[i] == 0) {
                // Dangling node: distribute equally to all nodes
                for (int j = 0; j < N; j++) {
                    M[i][j] = 1.0 / N;
                }
            } else {
                // Regular node: distribute among linked nodes
                for (int neighbor : links[i]) {
                    M[i][neighbor] = 1.0 / outCount[i];
                }
            }
        }
        // Apply damping factor
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                M[i][j] = d * M[i][j] + (1.0 - d) / N;
            }
        }
        // Initialize PageRank vector
        vector<double> pr(N, 1.0 / N);
        // Power iteration
        for (int iter = 0; iter < maxIter; iter++) {
            vector<double> newPr(N, 0.0);
            // Matrix-vector multiplication: newPr = M^T * pr
            for (int j = 0; j < N; j++) {
                for (int i = 0; i < N; i++) {
                    newPr[j] += M[i][j] * pr[i];
                }
            }
            // Check convergence (L1 distance between successive vectors)
            double diff = 0.0;
            for (int i = 0; i < N; i++) {
                diff += fabs(newPr[i] - pr[i]);
            }
            pr = newPr;
            if (diff < eps) break;
        }
        return pr;
    }
};
Go
package main

import "math"

func pageRank(N int, links [][]int, d float64, maxIter int, eps float64) []float64 {
    // Zero arguments fall back to the usual defaults
    if d == 0 {
        d = 0.85
    }
    if maxIter == 0 {
        maxIter = 100
    }
    if eps == 0 {
        eps = 1e-6
    }
    // Build transition matrix
    M := make([][]float64, N)
    for i := range M {
        M[i] = make([]float64, N)
    }
    outCount := make([]int, N)
    // Count outgoing links
    for i := 0; i < N; i++ {
        outCount[i] = len(links[i])
    }
    // Fill transition matrix
    for i := 0; i < N; i++ {
        if outCount[i] == 0 {
            // Dangling node: distribute equally to all nodes
            for j := 0; j < N; j++ {
                M[i][j] = 1.0 / float64(N)
            }
        } else {
            // Regular node: distribute among linked nodes
            for _, neighbor := range links[i] {
                M[i][neighbor] = 1.0 / float64(outCount[i])
            }
        }
    }
    // Apply damping factor
    for i := 0; i < N; i++ {
        for j := 0; j < N; j++ {
            M[i][j] = d*M[i][j] + (1.0-d)/float64(N)
        }
    }
    // Initialize PageRank vector
    pr := make([]float64, N)
    for i := range pr {
        pr[i] = 1.0 / float64(N)
    }
    // Power iteration
    for iter := 0; iter < maxIter; iter++ {
        newPr := make([]float64, N)
        // Matrix-vector multiplication: newPr = M^T * pr
        for j := 0; j < N; j++ {
            for i := 0; i < N; i++ {
                newPr[j] += M[i][j] * pr[i]
            }
        }
        // Check convergence (L1 distance)
        diff := 0.0
        for i := 0; i < N; i++ {
            diff += math.Abs(newPr[i] - pr[i])
        }
        pr = newPr
        if diff < eps {
            break
        }
    }
    return pr
}
Java
import java.util.*;

class Solution {
    public double[] pageRank(int N, int[][] links, double d, int maxIter, double eps) {
        // Build transition matrix
        double[][] M = new double[N][N];
        int[] outCount = new int[N];
        // Count outgoing links
        for (int i = 0; i < N; i++) {
            outCount[i] = links[i].length;
        }
        // Fill transition matrix
        for (int i = 0; i < N; i++) {
            if (outCount[i] == 0) {
                // Dangling node: distribute equally to all nodes
                for (int j = 0; j < N; j++) {
                    M[i][j] = 1.0 / N;
                }
            } else {
                // Regular node: distribute among linked nodes
                for (int neighbor : links[i]) {
                    M[i][neighbor] = 1.0 / outCount[i];
                }
            }
        }
        // Apply damping factor
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                M[i][j] = d * M[i][j] + (1.0 - d) / N;
            }
        }
        // Initialize PageRank vector
        double[] pr = new double[N];
        Arrays.fill(pr, 1.0 / N);
        // Power iteration
        for (int iter = 0; iter < maxIter; iter++) {
            double[] newPr = new double[N];
            // Matrix-vector multiplication: newPr = M^T * pr
            for (int j = 0; j < N; j++) {
                for (int i = 0; i < N; i++) {
                    newPr[j] += M[i][j] * pr[i];
                }
            }
            // Check convergence (L1 distance)
            double diff = 0.0;
            for (int i = 0; i < N; i++) {
                diff += Math.abs(newPr[i] - pr[i]);
            }
            pr = newPr;
            if (diff < eps) break;
        }
        return pr;
    }
}
Python
from typing import List

class Solution:
    def page_rank(self, N: int, links: List[List[int]], d: float = 0.85, max_iter: int = 100, eps: float = 1e-6) -> List[float]:
        # Build transition matrix
        M = [[0.0 for _ in range(N)] for _ in range(N)]
        out_count = [len(links[i]) for i in range(N)]
        # Fill transition matrix
        for i in range(N):
            if out_count[i] == 0:
                # Dangling node: distribute equally to all nodes
                for j in range(N):
                    M[i][j] = 1.0 / N
            else:
                # Regular node: distribute among linked nodes
                for neighbor in links[i]:
                    M[i][neighbor] = 1.0 / out_count[i]
        # Apply damping factor
        for i in range(N):
            for j in range(N):
                M[i][j] = d * M[i][j] + (1.0 - d) / N
        # Initialize PageRank vector
        pr = [1.0 / N for _ in range(N)]
        # Power iteration
        for _ in range(max_iter):
            new_pr = [0.0 for _ in range(N)]
            # Matrix-vector multiplication: new_pr = M^T * pr
            for j in range(N):
                for i in range(N):
                    new_pr[j] += M[i][j] * pr[i]
            # Check convergence (L1 distance)
            diff = sum(abs(new_pr[i] - pr[i]) for i in range(N))
            pr = new_pr
            if diff < eps:
                break
        return pr
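A quick sanity check for the implementation above (a sketch assuming the class as written; exact decimals depend on eps and the iteration count):
# Reproduce Examples 1 and 2 with the matrix-based implementation.
sol = Solution()
print(sol.page_rank(3, [[1], [2], [0]]))
# -> approximately [0.333, 0.333, 0.333]
print(sol.page_rank(4, [[1, 2], [3], [3], [0]]))
# -> approximately [0.320, 0.174, 0.174, 0.333]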
Complexity
- ⏰ Time complexity: O(k * N²), where k is the number of iterations (typically 50-100 until convergence) and N is the number of nodes; each iteration performs a full matrix-vector multiplication.
- 🧺 Space complexity: O(N²), for storing the transition matrix.
Method 2 - Direct Iterative Update
Intuition
Instead of building the full transition matrix, we can directly apply the PageRank formula in each iteration. For each node, we calculate its new PageRank by summing contributions from all nodes that link to it, divided by their respective outgoing link counts, plus the random teleportation factor.
Approach
- Initialize all PageRank values to 1/N
- For each iteration:
- For each node j, calculate new PageRank using the formula
- Sum contributions from all nodes i that link to j: score(Si)/Ci
- Add teleportation factor: (1-d)/N
- Apply damping: d * (sum of contributions) + teleportation
- Handle dangling nodes by distributing their PageRank equally
- Continue until convergence or maximum iterations
- Normalize if needed to ensure sum equals 1
Code
C++
#include <vector>
#include <cmath>
using namespace std;

class Solution {
public:
    vector<double> pageRankDirect(int N, vector<vector<int>>& links, double d = 0.85, int maxIter = 100, double eps = 1e-6) {
        // Build incoming links map for efficient lookup
        vector<vector<int>> incoming(N);
        vector<int> outCount(N, 0);
        for (int i = 0; i < N; i++) {
            outCount[i] = links[i].size();
            for (int neighbor : links[i]) {
                incoming[neighbor].push_back(i);
            }
        }
        // Initialize PageRank
        vector<double> pr(N, 1.0 / N);
        for (int iter = 0; iter < maxIter; iter++) {
            vector<double> newPr(N, 0.0);
            double danglingSum = 0.0;
            // Calculate dangling nodes contribution
            for (int i = 0; i < N; i++) {
                if (outCount[i] == 0) {
                    danglingSum += pr[i];
                }
            }
            // Calculate new PageRank for each node
            for (int j = 0; j < N; j++) {
                // Teleportation factor
                newPr[j] = (1.0 - d) / N;
                // Add dangling nodes contribution
                newPr[j] += d * danglingSum / N;
                // Add contributions from incoming links
                for (int i : incoming[j]) {
                    if (outCount[i] > 0) {
                        newPr[j] += d * pr[i] / outCount[i];
                    }
                }
            }
            // Check convergence (L1 distance)
            double diff = 0.0;
            for (int i = 0; i < N; i++) {
                diff += fabs(newPr[i] - pr[i]);
            }
            pr = newPr;
            if (diff < eps) break;
        }
        return pr;
    }
};
Go
package main

import "math"

func pageRankDirect(N int, links [][]int, d float64, maxIter int, eps float64) []float64 {
    // Zero arguments fall back to the usual defaults
    if d == 0 {
        d = 0.85
    }
    if maxIter == 0 {
        maxIter = 100
    }
    if eps == 0 {
        eps = 1e-6
    }
    // Build incoming links map for efficient lookup
    incoming := make([][]int, N)
    outCount := make([]int, N)
    for i := 0; i < N; i++ {
        outCount[i] = len(links[i])
        for _, neighbor := range links[i] {
            incoming[neighbor] = append(incoming[neighbor], i)
        }
    }
    // Initialize PageRank
    pr := make([]float64, N)
    for i := range pr {
        pr[i] = 1.0 / float64(N)
    }
    for iter := 0; iter < maxIter; iter++ {
        newPr := make([]float64, N)
        danglingSum := 0.0
        // Calculate dangling nodes contribution
        for i := 0; i < N; i++ {
            if outCount[i] == 0 {
                danglingSum += pr[i]
            }
        }
        // Calculate new PageRank for each node
        for j := 0; j < N; j++ {
            // Teleportation factor
            newPr[j] = (1.0 - d) / float64(N)
            // Add dangling nodes contribution
            newPr[j] += d * danglingSum / float64(N)
            // Add contributions from incoming links
            for _, i := range incoming[j] {
                if outCount[i] > 0 {
                    newPr[j] += d * pr[i] / float64(outCount[i])
                }
            }
        }
        // Check convergence (L1 distance)
        diff := 0.0
        for i := 0; i < N; i++ {
            diff += math.Abs(newPr[i] - pr[i])
        }
        pr = newPr
        if diff < eps {
            break
        }
    }
    return pr
}
Java
import java.util.*;

class Solution {
    public double[] pageRankDirect(int N, int[][] links, double d, int maxIter, double eps) {
        // Build incoming links map for efficient lookup
        List<List<Integer>> incoming = new ArrayList<>();
        for (int i = 0; i < N; i++) {
            incoming.add(new ArrayList<>());
        }
        int[] outCount = new int[N];
        for (int i = 0; i < N; i++) {
            outCount[i] = links[i].length;
            for (int neighbor : links[i]) {
                incoming.get(neighbor).add(i);
            }
        }
        // Initialize PageRank
        double[] pr = new double[N];
        Arrays.fill(pr, 1.0 / N);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] newPr = new double[N];
            double danglingSum = 0.0;
            // Calculate dangling nodes contribution
            for (int i = 0; i < N; i++) {
                if (outCount[i] == 0) {
                    danglingSum += pr[i];
                }
            }
            // Calculate new PageRank for each node
            for (int j = 0; j < N; j++) {
                // Teleportation factor
                newPr[j] = (1.0 - d) / N;
                // Add dangling nodes contribution
                newPr[j] += d * danglingSum / N;
                // Add contributions from incoming links
                for (int i : incoming.get(j)) {
                    if (outCount[i] > 0) {
                        newPr[j] += d * pr[i] / outCount[i];
                    }
                }
            }
            // Check convergence (L1 distance)
            double diff = 0.0;
            for (int i = 0; i < N; i++) {
                diff += Math.abs(newPr[i] - pr[i]);
            }
            pr = newPr;
            if (diff < eps) break;
        }
        return pr;
    }
}
Python
from typing import List

class Solution:
    def page_rank_direct(self, N: int, links: List[List[int]], d: float = 0.85, max_iter: int = 100, eps: float = 1e-6) -> List[float]:
        # Build incoming links map for efficient lookup
        incoming = [[] for _ in range(N)]
        out_count = [len(links[i]) for i in range(N)]
        for i in range(N):
            for neighbor in links[i]:
                incoming[neighbor].append(i)
        # Initialize PageRank
        pr = [1.0 / N for _ in range(N)]
        for _ in range(max_iter):
            new_pr = [0.0 for _ in range(N)]
            # Calculate dangling nodes contribution
            dangling_sum = sum(pr[i] for i in range(N) if out_count[i] == 0)
            # Calculate new PageRank for each node
            for j in range(N):
                # Teleportation factor
                new_pr[j] = (1.0 - d) / N
                # Add dangling nodes contribution
                new_pr[j] += d * dangling_sum / N
                # Add contributions from incoming links
                for i in incoming[j]:
                    if out_count[i] > 0:
                        new_pr[j] += d * pr[i] / out_count[i]
            # Check convergence (L1 distance)
            diff = sum(abs(new_pr[i] - pr[i]) for i in range(N))
            pr = new_pr
            if diff < eps:
                break
        return pr
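As with Method 1, a short sanity check (a sketch assuming the class above; the ranks should sum to 1 because the dangling mass is redistributed):
# Reproduce Examples 3 and 4 with the direct iterative implementation.
sol = Solution()
ranks = sol.page_rank_direct(2, [[1], []])
print(ranks, sum(ranks))  # -> approximately [0.351, 0.649], sum 1.0
ranks = sol.page_rank_direct(3, [[1, 2], [], [0]], d=0.5, max_iter=50)
print(ranks, sum(ranks))  # -> approximately [0.375, 0.313, 0.313], sum 1.0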
Complexity
- ⏰ Time complexity: O(k * (N + E)), where k is the number of iterations, N is the number of nodes, and E is the number of edges; this is more efficient than the matrix approach for sparse graphs.
- 🧺 Space complexity: O(N + E), for storing the incoming-links lists and the PageRank vectors.
Notes
- Method 1 uses the classic matrix-based power iteration approach, suitable for theoretical understanding
- Method 2 provides a more memory-efficient implementation, better for large sparse graphs (a vectorized variant is sketched after these notes)
- The damping factor d typically ranges from 0.8 to 0.9, with 0.85 being most common
- Convergence usually occurs within 50-100 iterations for most real-world graphs
- Dangling nodes (pages with no outgoing links) require special handling to maintain probability conservation
- The algorithm finds the stationary distribution of a Markov chain representing random web surfing
- PageRank values represent the long-term probability of being at each page during random navigation
- Both implementations handle edge cases like isolated nodes, self-loops, and disconnected components
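If a vectorized implementation is acceptable, the same power iteration can be written compactly. The sketch below assumes NumPy is available (the problem itself does not mandate any library) and mirrors the Method 1 construction:
import numpy as np

def page_rank_numpy(N, links, d=0.85, max_iter=100, eps=1e-6):
    # Row-stochastic transition matrix; dangling rows spread mass uniformly.
    M = np.zeros((N, N))
    for i, outs in enumerate(links):
        if outs:
            M[i, outs] = 1.0 / len(outs)
        else:
            M[i, :] = 1.0 / N
    G = d * M + (1.0 - d) / N      # damped matrix, every row sums to 1
    pr = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        new_pr = G.T @ pr          # same update as newPr[j] = sum_i G[i][j] * pr[i]
        if np.abs(new_pr - pr).sum() < eps:
            return new_pr.tolist()
        pr = new_pr
    return pr.tolist()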