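Cassandra's write path is append-only: each write is appended to the commit log and buffered in an in-memory memtable, which is later flushed to disk as an immutable SSTable. Nothing is updated in place, so one partition's data can end up spread across many SSTables. Compaction merges SSTables to discard overwritten and deleted data, reclaim space, and speed up reads. Picking the wrong strategy for a workload is one of the most common causes of production pain: disk fills up, reads slow down, or compaction never catches up.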
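To see why compaction matters, here is a toy model of an append-only write path in Python. It is not Cassandra's implementation: `ToyLSM`, `flush_every`, and the newest-first lookup are simplifications assumed for illustration (real reads merge cells from every relevant SSTable by timestamp and use Bloom filters to skip files).

```python
from typing import Dict, List, Optional, Tuple

SSTable = Tuple[Tuple[str, int], ...]  # immutable, sorted (key, value) pairs


class ToyLSM:
    """Hypothetical toy log-structured store: a memtable plus immutable SSTables."""

    def __init__(self, flush_every: int = 2) -> None:
        self.memtable: Dict[str, int] = {}
        self.sstables: List[SSTable] = []  # oldest first
        self.flush_every = flush_every  # assumed flush trigger: entries in memtable

    def write(self, key: str, value: int) -> None:
        # Writes never modify existing SSTables; they only update the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into a new immutable, sorted SSTable.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str) -> Tuple[Optional[int], int]:
        """Return (value, number of SSTables checked), searching newest first."""
        if key in self.memtable:
            return self.memtable[key], 0
        checked = 0
        for table in reversed(self.sstables):
            checked += 1
            for k, v in table:
                if k == key:
                    return v, checked
        return None, checked

    def compact(self) -> None:
        # Merge SSTables oldest to newest so newer values win, leaving one file.
        merged: Dict[str, int] = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []
```

Writing `a`, `b`, then a new `a` with `c`, then `x` and `y` (with `flush_every=2`) produces three SSTables, and reading `b` checks all three; after `compact()` the same read checks one file and the stale value of `a` is gone.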
STCS — Size-Tiered (default)
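Buckets SSTables by size; once a bucket holds min_threshold (4 by default) similar-sized SSTables, merges them. Cheap CPU. Each compaction writes its output before deleting its inputs, so it temporarily needs free space up to the size of the SSTables being merged; a major compaction can briefly double disk usage, so plan for about 50% free disk. Good for write-heavy workloads with little overwrite. Bad for read-heavy workloads with updates: versions of one row can live in many SSTables, so reads touch many files.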
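The bucketing idea can be sketched in Python. This is a simplified model, not Cassandra's code: it assumes the STCS defaults as we understand them (bucket_low 0.5, bucket_high 1.5, min_sstable_size 50 MB, min_threshold 4, max_threshold 32) and ignores how Cassandra chooses among several eligible buckets.

```python
from typing import List


def stcs_buckets(sizes_mb: List[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5,
                 min_sstable_size_mb: float = 50.0) -> List[List[float]]:
    """Group SSTable sizes into buckets of similar size (simplified STCS)."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = avg * bucket_low <= size <= avg * bucket_high
            # Small SSTables are lumped together regardless of the size ratio.
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no existing bucket fits: start a new one
    return buckets


def compaction_candidates(sizes_mb: List[float], min_threshold: int = 4,
                          max_threshold: int = 32) -> List[List[float]]:
    """Buckets with at least min_threshold SSTables, capped at max_threshold."""
    return [b[:max_threshold] for b in stcs_buckets(sizes_mb)
            if len(b) >= min_threshold]


def worst_case_extra_disk_mb(bucket: List[float]) -> float:
    """Output is written before inputs are deleted; with no overwrites it is as big as the inputs."""
    return sum(bucket)
```

For SSTables of 10, 12, 15, 20, 100, 110, 120 and 400 MB, only the four small files form an eligible bucket; the 100 to 120 MB group waits for one more similar file, and the 400 MB file waits for peers of its own size.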
LCS — Leveled
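Organizes SSTables into levels; above L0, each level holds non-overlapping sorted key ranges, and each level is about 10x the size of the one before it. Read amplification stays low: a read checks at most one SSTable per level (plus any in L0). High CPU and write amplification (~10x extra writes), because data is rewritten as it moves down the levels. Good for read-heavy workloads with frequent updates.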
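The level arithmetic explains the low read amplification. The sketch below assumes Cassandra's LCS defaults of 160 MB SSTables (sstable_size_in_mb) and a fanout of 10 (fanout_size), treats level n as holding up to 160 MB * 10^n, and ignores L0. It estimates how many levels a given amount of data needs, which bounds how many SSTables a read must check.

```python
def lcs_level_capacity_mb(level: int, sstable_size_mb: float = 160.0,
                          fanout: int = 10) -> float:
    """Approximate capacity of a level >= 1: sstable_size * fanout**level."""
    return sstable_size_mb * fanout ** level


def lcs_levels_needed(data_gb: float, sstable_size_mb: float = 160.0,
                      fanout: int = 10) -> int:
    """Fewest levels (L1..Ln) whose combined capacity holds data_gb."""
    remaining_mb = data_gb * 1024
    levels = 0
    while remaining_mb > 0:
        levels += 1
        remaining_mb -= lcs_level_capacity_mb(levels, sstable_size_mb, fanout)
    return levels
```

For 500 GB per node, L1 through L3 together hold well under 500 GB, so a fourth level is needed, and a read checks at most about four SSTables beyond L0.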
TWCS — TimeWindow
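Designed for time-series. SSTables are grouped into time windows (e.g., 1 day) by their newest timestamp; within the current window they compact size-tiered, and once a window closes its SSTables are compacted into one and normally never compacted again. Combined with TTLs, expired data drops as whole SSTables once every cell in them has expired and gc_grace_seconds has passed, so there is no tombstone scanning. The right choice for telemetry, logs, and IoT data written in time order; explicit deletes and out-of-order writes (e.g., backfills) weaken it.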
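A sketch of the windowing and whole-file expiry logic in Python, under simplifying assumptions: `SSTableMeta` and its fields are hypothetical, windows are fixed one-day buckets keyed on each SSTable's newest write timestamp, and a file counts as droppable once its last TTL plus gc_grace_seconds (864000 s, i.e. 10 days, by default) has passed. Real Cassandra also checks overlapping SSTables before dropping a file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

DAY = 86_400  # seconds


@dataclass(frozen=True)
class SSTableMeta:
    """Hypothetical per-SSTable metadata (timestamps in epoch seconds)."""
    name: str
    max_write_ts: int   # newest write in the file
    max_expiry_ts: int  # when the last TTL'd cell in the file expires


def window_start(ts: int, window_seconds: int = DAY) -> int:
    """Start of the fixed window containing ts."""
    return ts - ts % window_seconds


def group_by_window(tables: Iterable[SSTableMeta],
                    window_seconds: int = DAY) -> Dict[int, List[str]]:
    """Bucket SSTables by the window of their newest write."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for t in tables:
        windows[window_start(t.max_write_ts, window_seconds)].append(t.name)
    return dict(windows)


def droppable(tables: Iterable[SSTableMeta], now: int,
              gc_grace_seconds: int = 864_000) -> List[str]:
    """SSTables whose every cell has expired and aged past gc_grace_seconds."""
    return [t.name for t in tables if t.max_expiry_ts + gc_grace_seconds < now]
```

Because a whole file ages out at once, reclaiming it is a file deletion rather than a rewrite that has to scan tombstones.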
Tombstone trap
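Deleted data leaves tombstones, and reads must scan past them until compaction purges them. That can happen only after gc_grace_seconds (10 days by default), and only when the compaction includes every SSTable holding data the tombstone shadows. STCS + heavy deletes = read latency explosion. TWCS with TTLs avoids this. If you're stuck on STCS with deletes, watch the logs for tombstone_warn_threshold warnings (queries exceeding tombstone_failure_threshold are aborted), and prefer partition or range deletes over many single-row deletes: one range tombstone can cover thousands of rows that would otherwise each need their own tombstone.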
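A toy Python model of these trade-offs. The threshold constants mirror the cassandra.yaml defaults as we understand them (tombstone_warn_threshold 1000, tombstone_failure_threshold 100000); the helper functions are hypothetical, and counting one marker per range delete is a simplification, since how range tombstones are stored and counted varies by Cassandra version.

```python
from typing import Iterable, List, Tuple

TOMBSTONE_WARN_THRESHOLD = 1_000       # assumed cassandra.yaml default
TOMBSTONE_FAILURE_THRESHOLD = 100_000  # assumed cassandra.yaml default
GC_GRACE_SECONDS = 864_000             # default gc_grace_seconds (10 days)


def row_delete_markers(keys: Iterable[int]) -> List[Tuple]:
    """Deleting rows one by one leaves one row tombstone per key."""
    return [("row", k) for k in keys]


def range_delete_markers(lo: int, hi: int) -> List[Tuple]:
    """A single range delete leaves one range tombstone covering [lo, hi]."""
    return [("range", lo, hi)]


def read_outcome(tombstones_scanned: int) -> str:
    """Classify a read the way the warn/failure thresholds would."""
    if tombstones_scanned > TOMBSTONE_FAILURE_THRESHOLD:
        return "fail"
    if tombstones_scanned > TOMBSTONE_WARN_THRESHOLD:
        return "warn"
    return "ok"


def purgeable(deleted_at: int, now: int, gc_grace: int = GC_GRACE_SECONDS) -> bool:
    """Necessary but not sufficient: compaction must also cover all shadowed data."""
    return deleted_at + gc_grace < now
```

Deleting 5,000 rows individually pushes a slice over the warning threshold in this model, while one range delete over the same rows stays well under it.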
Diagnosing
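nodetool tablestats shows SSTable count per table and the average live cells and tombstones per slice (last five minutes). nodetool tablehistograms shows how many SSTables each read touches, and nodetool compactionstats shows pending compactions. Reads regularly touching >20 SSTables, or a pending count that keeps growing = compaction is behind. >10x as many tombstones as live cells per slice = you have a tombstone problem.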
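The tombstone-ratio check can be automated. A minimal Python sketch, assuming the `nodetool tablestats` field labels shown in the hypothetical sample below (labels and indentation differ across Cassandra versions, so verify them against your own output); it parses captured text rather than invoking nodetool.

```python
from typing import Dict, List

LIVE = "Average live cells per slice (last five minutes)"
TOMBSTONES = "Average tombstones per slice (last five minutes)"

# Hypothetical, hand-written sample in the shape of `nodetool tablestats` output.
SAMPLE = """\
Keyspace : metrics
\tTable: events
\tSSTable count: 42
\tAverage live cells per slice (last five minutes): 2.0
\tAverage tombstones per slice (last five minutes): 35.0
\tTable: users
\tSSTable count: 3
\tAverage live cells per slice (last five minutes): 10.0
\tAverage tombstones per slice (last five minutes): 1.0
"""


def parse_tablestats(text: str) -> Dict[str, Dict[str, float]]:
    """Map each table name to its numeric 'label: value' fields."""
    tables: Dict[str, Dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        label, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        label, value = label.strip(), value.strip()
        if label == "Table":
            current = value
            tables[current] = {}
        elif current is not None:
            try:
                tables[current][label] = float(value)
            except ValueError:
                pass  # skip non-numeric fields
    return tables


def tombstone_heavy(stats: Dict[str, Dict[str, float]], ratio: float = 10.0) -> List[str]:
    """Tables averaging more than `ratio` tombstones per live cell in a slice."""
    flagged = []
    for table, fields in stats.items():
        live, tombs = fields.get(LIVE), fields.get(TOMBSTONES)
        if live is None or tombs is None:
            continue
        if tombs > ratio * live:
            flagged.append(table)
    return flagged
```

On the sample, `events` (35 tombstones vs 2 live cells per slice) is flagged and `users` is not.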