Any real sharded system eventually needs to rebalance: adding capacity, splitting hotspots, retiring failed shards. Done poorly, it means downtime. Done well, it is invisible.
Dual-write window
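Start writing to both the old and the new shard. Reads still come from the old shard. Every write from this point on lands on both, so no new data is lost while the new shard catches up.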
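A minimal sketch of the dual-write window, assuming each shard can be modeled as an in-memory dict. The `DualWriteRouter` class and its method names are hypothetical, chosen for illustration only; a real router would also have to handle the case where one of the two writes fails.

```python
class DualWriteRouter:
    """Hypothetical router: writes go to both shards, reads stay on the old one."""

    def __init__(self, old_shard: dict, new_shard: dict):
        self.old = old_shard
        self.new = new_shard

    def write(self, key, value):
        # The old shard is still the source of truth, so write it first.
        self.old[key] = value
        # Mirror the write so the new shard sees every change made from now on.
        self.new[key] = value

    def read(self, key):
        # Reads are not moved yet: the new shard is still missing historical rows.
        return self.old.get(key)
```

With this in place the new shard holds every row written after the window opens, but none of the older rows; those arrive through the backfill.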
Backfill in background
Copy existing data from the old shard to the new shard while dual-writes continue. Rate-limit the copy so it does not impact live traffic.
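The backfill can be sketched as a batched copy with a pause between batches. This is an illustration under stated assumptions: shards are in-memory dicts, and `backfill`, `batch_size`, and `pause_s` are hypothetical names. The copy skips keys that already exist on the new shard, so it does not overwrite a row that a dual-write has already placed there. In a real database that check-then-write would need to be a single conditional insert.

```python
import time


def backfill(old_shard: dict, new_shard: dict,
             batch_size: int = 100, pause_s: float = 0.0) -> int:
    """Copy rows missing from new_shard in batches; return how many were copied."""
    copied = 0
    keys = list(old_shard)  # snapshot of the keys that exist when the backfill starts
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:
            # Skip rows deleted since the snapshot and rows already dual-written.
            if key in old_shard and key not in new_shard:
                new_shard[key] = old_shard[key]
                copied += 1
        # Crude rate limit: sleep between batches to leave room for live traffic.
        time.sleep(pause_s)
    return copied
```

Tuning `batch_size` and `pause_s` trades backfill duration against load on the old shard.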
Verify + flip
Compare row counts and spot-check a sample of rows on both shards. When confident, atomically flip the routing metadata so reads go to the new shard.
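One way to express the verify-then-flip step, again assuming dict-backed shards. The function names and the `read_from` metadata key are hypothetical. The single assignment to `metadata["read_from"]` stands in for whatever atomic update the real metadata store offers, such as a compare-and-set.

```python
import random


def verify(old_shard: dict, new_shard: dict,
           sample_size: int = 100, seed=None) -> bool:
    """Cheap check: equal row counts, then compare a random sample of rows."""
    if len(old_shard) != len(new_shard):
        return False
    rng = random.Random(seed)
    keys = list(old_shard)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return all(k in new_shard and new_shard[k] == old_shard[k] for k in sample)


def verify_and_flip(metadata: dict, old_shard: dict, new_shard: dict,
                    sample_size: int = 100) -> bool:
    """Point reads at the new shard only if verification passes."""
    if not verify(old_shard, new_shard, sample_size):
        return False  # leave routing untouched; investigate before retrying
    metadata["read_from"] = "new"  # one update, so readers never see a half-flipped state
    return True
```

Sampling keeps verification cheap on large shards, at the cost of possibly missing a rare mismatch; a full checksum comparison is the stricter alternative.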
Read-new + write-both
Interim state: reads come from the new shard. Writes still go to both, so the old shard stays current in case a rollback is needed. Monitor error rates and latency.
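The interim state might look like the sketch below. `CutoverRouter` and `rollback` are hypothetical names, and shards are again modeled as dicts. The point it illustrates is that because writes keep reaching both shards, rolling back is only a change of read target, with no data to copy back.

```python
class CutoverRouter:
    """Hypothetical router for the interim phase: read new, write both."""

    def __init__(self, old_shard: dict, new_shard: dict):
        self.old = old_shard
        self.new = new_shard
        self.read_from = "new"  # reads were flipped in the previous step

    def write(self, key, value):
        # Keep both shards current so either one can serve reads.
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        shard = self.new if self.read_from == "new" else self.old
        return shard.get(key)

    def rollback(self):
        # The old shard never fell behind, so flipping reads back is enough.
        self.read_from = "old"
```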
Cleanup
Once confident, stop writing to the old shard. Delete the moved rows from it and reclaim the capacity. After this step, rollback is no longer a simple flip.
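A sketch of the cleanup step, assuming dual-writes to the old shard have already been switched off. The `cleanup` function is hypothetical and treats shards as dicts. It deletes only rows whose key is present on the new shard, so anything that somehow failed to migrate stays behind for inspection instead of being silently dropped.

```python
def cleanup(old_shard: dict, new_shard: dict) -> int:
    """Delete rows from old_shard that now live on new_shard; return the count."""
    # Build the list first: deleting from a dict while iterating it raises an error.
    moved = [key for key in old_shard if key in new_shard]
    for key in moved:
        del old_shard[key]
    return len(moved)
```

If `old_shard` is non-empty after the call, the leftover keys are rows that never reached the new shard and are worth checking before the capacity is reclaimed.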
Dual-write → backfill → verify → cutover → cleanup. Zero downtime, roll-back-able.