When your dataset exceeds a single node's capacity, sharding splits it across N nodes. Three strategies: hash-based, range-based, directory-based. Each has predictable failure modes — knowing them up front saves a re-shard.
Hash-based
hash(key) % N → shard ID. Even distribution by default; no hot shards. Adding a node remaps ~all keys (consistent hashing mitigates). Range queries impossible (adjacent keys land on random shards). Right for: KV access patterns.
Range-based
Split key space into ranges; each range to a shard. Range queries are efficient (one shard or contiguous shards). Risk: hot ranges (timestamps go to the newest shard). Need re-balancing — usually automatic in distributed DBs.
Directory-based
Lookup service maps key → shard. Flexible (can move individual keys), expressive (queries can filter on the directory first). Operational overhead (extra service to run, cache, scale). MongoDB's config servers, CockroachDB's range descriptors work this way.
Hot shard mitigation
Hash and range both produce hot shards eventually. Solutions: composite key (hash(user_id) for K-V, range(timestamp) for time queries), shard splitting (split the hot range), random suffix to spread within a hot prefix.
Re-sharding
Adding capacity requires moving data. Online re-sharding (no downtime): dual-write to old and new, backfill, switch reads, drop old. Hours to days for TBs. Always plan re-sharding before you need it — surprise is the worst time.