How Gossip Works
Imagine a cocktail party. You tell a rumor to 3 random friends. In the next round, they tell 3 random friends. Exponentially, the entire party knows the rumor very quickly. Distributed systems use this mechanism to propagate state updates.
Every second (or configurable interval), a node selects a few random peer nodes and exchanges information. They compare their knowledge of the cluster state (using version numbers or vector clocks) and merge the diffs.
Key Attributes
- Peer-to-Peer: No central bottleneck.
- Robust: Can tolerate node failures; the message finds a way.
- Eventually Consistent: It takes (\log N)$ rounds for information to propagate to all $ nodes.
Phi Accrual Failure Detector
Cassandra uses Gossip not just for state but for failure detection. Instead of a binary "up/down", it calculates a probability ($\Phi$) that a node is down based on the history of heartbeat intervals. This adapts to network latency automatically.
Visualization
Below is a simulation of Gossip infection. Click "Start Gossip" to see how a single piece of information spreads through the cluster.