Consistent Hashing: Distributing Data at Scale

Posted on Feb 14, 2026 by Sandeep B

In distributed systems like Cassandra and DynamoDB, partitioning data evenly across a cluster is a fundamental challenge. A naive approach using hash(key) % N (where N is the number of nodes) works fine until you need to add or remove a node. When N changes, almost every key maps to a new node, causing massive data reshuffling.

The Problem with Modulo Hashing

Imagine a cluster with 4 nodes. Key 'A' hashes to 10. 10 % 4 = 2, so it goes to Node 2. If we add a 5th node, 10 % 5 = 0. The key moves to Node 0. In fact, nearly N/(N+1) of keys will move.

Enter Consistent Hashing

Consistent Hashing solves this by decoupling the data partition from the number of physical nodes. It maps both data and nodes onto a circular "ring" (conceptually 0 to 2^32 or similar).

How it Works

Virtual Nodes (VNodes)

Basic consistent hashing can still lead to "hot spots" if nodes aren't evenly spaced or if one node is more powerful. Virtual Nodes fix this by assigning multiple tokens to a single physical node. This spreads the load more evenly and speeds up rebalancing.

Interactive Visualization: Add/Remove nodes to see how keys redistribute.

Benefits

Back to Distributed Systems