Consistent Hashing: Distributing Data at Scale

Posted on Feb 14, 2026 by Sandeep B

In distributed systems like Cassandra and DynamoDB, partitioning data evenly across a cluster is a fundamental challenge. A naive approach using hash(key) % N (where N is the number of nodes) works fine until you need to add or remove a node. When N changes, almost every key maps to a new node, causing massive data reshuffling.

The Problem with Modulo Hashing

Imagine a cluster with 4 nodes. Key 'A' hashes to 10. 10 % 4 = 2, so it goes to Node 2. If we add a 5th node, 10 % 5 = 0. The key moves to Node 0. In fact, nearly N/(N+1) of keys will move.

Enter Consistent Hashing

Consistent Hashing solves this by decoupling the data partition from the number of physical nodes. It maps both data and nodes onto a circular "ring" (conceptually 0 to 2^32 or similar).

How it Works

The Token Ring: The output range of the hash function is treated as a fixed circular space.
Node Placement: Each node is assigned a random value (token) on this ring.
Data Placement: A key is hashed to a value on the ring. We walk clockwise to find the first node with a token greater than the key's hash. That node owns the key.

Virtual Nodes (VNodes)

Basic consistent hashing can still lead to "hot spots" if nodes aren't evenly spaced or if one node is more powerful. Virtual Nodes fix this by assigning multiple tokens to a single physical node. This spreads the load more evenly and speeds up rebalancing.

Interactive Visualization: Add/Remove nodes to see how keys redistribute.

Benefits

Minimal Data Movement: Adding/removing a node only affects its immediate neighbors (or VNode ranges).
Scalability: You can scale horizontally with predictable performance impact.
Heterogeneity: Powerful nodes can hold more VNodes.

Back to Distributed Systems