Crawling
Googlebot fetches pages, respects robots.txt, prioritizes URLs by importance. Runs continuously over trillions of known URLs.
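As a rough illustration of the crawl loop's two rules (obey robots.txt, fetch important URLs first), here is a minimal sketch of a crawl frontier. The `Frontier` class, the `ExampleBot` user agent, the importance scores, and the robots.txt content are all hypothetical; a production crawler's scheduling is far more involved.

```python
import heapq
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline so the sketch needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robots(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


class Frontier:
    """Queue of URLs to crawl; the most important URL is popped first."""

    def __init__(self, robots: RobotFileParser, user_agent: str = "ExampleBot"):
        self._heap = []
        self._seen = set()
        self._robots = robots
        self._agent = user_agent

    def add(self, url: str, importance: float) -> bool:
        """Queue a URL unless it was already seen or robots.txt disallows it."""
        if url in self._seen or not self._robots.can_fetch(self._agent, url):
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate importance to pop the largest first.
        heapq.heappush(self._heap, (-importance, url))
        return True

    def pop(self) -> str:
        _, url = heapq.heappop(self._heap)
        return url


frontier = Frontier(build_robots(ROBOTS_TXT))
frontier.add("https://example.com/", 0.9)
frontier.add("https://example.com/private/notes", 0.8)  # blocked by robots.txt
frontier.add("https://example.com/blog", 0.4)
crawl_order = [frontier.pop(), frontier.pop()]
```

The disallowed URL never enters the queue, so the crawler cannot fetch it by accident later.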
Parsing + rendering
Modern crawlers run JavaScript and render pages like a browser. They extract text, links, structured data. Essential for SPAs.
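The extraction step can be sketched with the standard library alone. This example assumes the HTML has already been rendered (running JavaScript needs a headless browser, which is out of scope here). The `PageExtractor` class and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collect visible text, absolute links, and JSON-LD structured data."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self.structured = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._in_jsonld = False
        self._jsonld_buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script" and attrs.get("type") == "application/ld+json":
                self._in_jsonld = True
                self._jsonld_buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
            if self._in_jsonld:
                self.structured.append(json.loads("".join(self._jsonld_buf)))
                self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_buf.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


RENDERED_HTML = """
<html><head><title>Widgets</title>
<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>
<style>body { color: black; }</style>
</head><body>
<h1>Widgets</h1>
<p>See the <a href="/catalog">catalog</a>.</p>
</body></html>
"""

extractor = PageExtractor("https://example.com/shop/")
extractor.feed(RENDERED_HTML)
extractor.close()
```

After parsing, `extractor.links` feeds the crawl queue, `extractor.text_parts` feeds the index, and `extractor.structured` holds the parsed JSON-LD objects.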
Inverted index
word → list of documents containing it. Compressed heavily. Sharded across thousands of machines. Sub-millisecond term lookup.
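A toy version of the word → documents mapping, with one common compression scheme, might look like the sketch below. Delta plus variable-byte encoding is used here as an illustrative choice, not as a statement of what any particular engine uses. The function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each term to a sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def encode_postings(doc_ids: list) -> bytes:
    """Store gaps between sorted IDs, each gap as a variable-length integer."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)  # final byte, continuation bit clear
    return bytes(out)


def decode_postings(data: bytes) -> list:
    """Invert encode_postings: rebuild gaps, then running-sum back to IDs."""
    ids, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids


docs = {1: "fast search engine", 2: "search index", 300: "inverted index search"}
index = build_index(docs)
encoded = encode_postings(index["search"])
```

Because postings are sorted, the gaps are small numbers that usually fit in one byte each, which is where the space saving comes from.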
Ranking layers
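Hundreds of ranking signals: PageRank, freshness, relevance, quality (approximating E-E-A-T, which is a rater guideline rather than a direct signal). ML systems such as RankBrain help blend them; MUM is applied to specific tasks, not general ranking.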
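To make the idea of blending signals concrete, the sketch below computes PageRank by power iteration on a tiny link graph and then combines it with other signals through a fixed linear blend. The graph, the signal values, and the `WEIGHTS` table are hypothetical, and a hand-set weighted sum stands in for the learned models a real engine would use.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Power-iteration PageRank. Every link target must also be a key of `links`."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical weights; real systems learn the blend rather than hard-coding it.
WEIGHTS = {"pagerank": 0.3, "freshness": 0.2, "relevance": 0.5}


def blend(signals: dict) -> float:
    """Weighted sum of per-document signals, each assumed to lie in [0, 1]."""
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)

candidates = {
    "a": {"pagerank": ranks["a"], "freshness": 0.2, "relevance": 0.9},
    "c": {"pagerank": ranks["c"], "freshness": 0.9, "relevance": 0.4},
}
ordered = sorted(candidates, key=lambda doc: blend(candidates[doc]), reverse=True)
```

Page `c` has the highest PageRank because three pages link to it, yet `a` still wins the blended ordering on relevance. No single signal decides the result.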
Query serving
Query → shards contacted in parallel → results merged, ranked, served. All within 200ms budget. Personalization + SafeSearch layered on top.
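The scatter-gather pattern described above can be sketched with a thread pool: query every shard in parallel, keep whatever returns within the budget, and merge the per-shard top results. The in-memory `SHARDS`, the scores, and the function names are hypothetical; real shards would be remote servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical shards: each maps a term to (doc_id, score) pairs it holds.
SHARDS = [
    {"search": [("doc1", 0.9), ("doc4", 0.3)]},
    {"search": [("doc2", 0.7)]},
    {"search": [("doc3", 0.8)], "index": [("doc5", 0.5)]},
]


def search_shard(shard: dict, term: str, k: int) -> list:
    """Return this shard's top-k hits for the term."""
    return heapq.nlargest(k, shard.get(term, []), key=lambda hit: hit[1])


def serve(term: str, k: int = 3, budget_seconds: float = 0.2) -> list:
    """Fan out to all shards, then merge the answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(SHARDS))
    futures = [pool.submit(search_shard, shard, term, k) for shard in SHARDS]
    done, _late = wait(futures, timeout=budget_seconds)
    pool.shutdown(wait=False)  # do not block the response on slow shards
    merged = [hit for future in done for hit in future.result()]
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])


top_hits = serve("search")
```

Each shard only needs to return its own top k, since no document outside a shard's top k can appear in the global top k. Shards that miss the deadline are simply left out, trading completeness for latency.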