PAGERANKER

Web Graph Crawler and PageRank Authority Analyser

PageRank Algorithm · Web Crawler · Graph Analysis · Link Authority

The Problem Statement

Understanding web page authority requires both a crawler to discover the link graph and a ranking algorithm to compute relative importance. Most academic PageRank implementations provide the algorithm without the crawler, or the crawler without the ranking. PageRanker combines both into a single pipeline: it crawls outward from a seed URL up to a configurable depth, builds the directed link graph, applies the iterative PageRank algorithm until convergence, and surfaces the top-authority pages. It sits between a raw link extractor and a full search engine index, making link authority visible and explorable.
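The depth-limited crawl described above can be sketched as a breadth-first traversal. This is a minimal illustration, not PageRanker's actual code: the function name crawl_graph, the get_links callback, and the in-memory FAKE_WEB dictionary are all hypothetical, and the callback stands in for real HTTP fetching so the example runs without network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl_graph(
    seed: str,
    max_depth: int,
    get_links: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Breadth-first crawl from `seed`, following links at most `max_depth` hops.

    Returns a directed adjacency list: URL -> list of URLs it links to.
    """
    graph: Dict[str, List[str]] = {seed: []}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for target in get_links(url):
            if target not in graph[url]:
                graph[url].append(target)  # directed edge url -> target
            if target not in graph:
                graph[target] = []  # first time this page is seen
                queue.append((target, depth + 1))
    return graph


# Hypothetical in-memory "web" standing in for real pages and their links.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}

shallow = crawl_graph("a", max_depth=1, get_links=lambda u: FAKE_WEB.get(u, []))
deeper = crawl_graph("a", max_depth=2, get_links=lambda u: FAKE_WEB.get(u, []))
```

With max_depth=1 only the seed is expanded, so pages b and c appear as nodes without outbound edges. Raising the depth to 2 expands them as well and discovers d. Every link target is added as a node, which keeps the adjacency list closed under its own edges for the ranking step.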

The Architecture Layout

The crawler module uses the Python requests library with asynchronous queue management to fetch HTML pages concurrently, extracting every anchor href attribute and normalising relative URLs to absolute form. The graph builder constructs a directed adjacency list as pages are discovered, with nodes representing URLs and directed edges representing hyperlinks. Once crawling has reached the configured depth, the PageRank engine initialises each node with equal weight (1/N) and iterates the PageRank formula, distributing each page's score among its outbound links with a damping factor of 0.85, until the change in scores between iterations falls below a convergence threshold. Results are sorted by final score and displayed as a ranked table with inbound link counts and domain statistics.
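Two of the steps above, href extraction with URL normalisation and the iterative rank update, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions rather than PageRanker's implementation: the names LinkExtractor, extract_links and pagerank, the tolerance of 1e-8, the iteration cap, the stripping of URL fragments, and the even redistribution of rank from pages without outbound links are all choices made here and are not stated in the description.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then drop any #fragment (assumption).
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-8,      # assumed convergence threshold
    max_iter: int = 200,    # assumed safety cap on iterations
) -> Dict[str, float]:
    """Iterative PageRank; every link target must also be a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    ranks = {node: 1.0 / n for node in nodes}  # equal initial weight 1/N
    for _ in range(max_iter):
        # Rank held by pages with no outbound links is spread evenly (assumption).
        dangling = sum(ranks[u] for u in nodes if not graph[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        for u in nodes:
            if graph[u]:
                share = damping * ranks[u] / len(graph[u])
                for v in graph[u]:
                    new_ranks[v] += share
        delta = sum(abs(new_ranks[u] - ranks[u]) for u in nodes)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


sample_html = (
    '<a href="../about">About</a> '
    '<a href="page.html#top">Page</a> '
    '<a href="mailto:x@example.com">Mail</a>'
)
links = extract_links("https://example.com/docs/", sample_html)

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})
ranking = sorted(ranks.items(), key=lambda item: item[1], reverse=True)
```

In the sample graph, page c collects links from a, b and d and ends up at the top of ranking, while d has no inbound links and keeps only the baseline share (1 - 0.85) / 4. The scores sum to 1 because rank from dangling pages is redistributed rather than lost.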

Architecture Design Diagram

PAGERANKER System Architecture

System Process Flow

User Actions ──> Interface State ──> Core Processing Logic
Database Persistent Layer <── Response Loop ──> Dynamic UI Updates