Cassandra for Time-Series — Belgavi.AI Lab

Time-series — metrics, IoT, clickstream, logs — is Cassandra's sweet spot. Write-heavy, sequential, append-only, often TTL'd. Modeled correctly the cluster runs for years; modeled wrong, partitions blow up and compaction can't keep up.


Time-bucketed partition key

A single partition per device grows without bound. Instead, use (device_id, day) as the partition key and cluster by timestamp DESC, so the newest rows are read first. Each partition then holds at most one day of data for one device. Reads that span several days fan out to one partition per day, which is usually acceptable.
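A minimal sketch of the bucketing described above, assuming UTC calendar days as the bucket and timezone-aware timestamps. The table name `readings`, its columns, and the helper names are hypothetical, chosen only for illustration.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List

# Hypothetical schema: table and column names are illustrative only.
# (device_id, day) is the partition key; ts is the clustering column.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS readings (
    device_id text,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
"""


def day_bucket(ts: datetime) -> date:
    """Return the UTC calendar day used as the bucket part of the partition key."""
    return ts.astimezone(timezone.utc).date()


def buckets_between(start: datetime, end: datetime) -> List[date]:
    """List every day bucket that a read over [start, end] has to touch."""
    first, last = day_bucket(start), day_bucket(end)
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]
```

A read from 22:00 on one day to 01:00 two days later touches three buckets, so it becomes three single-partition queries rather than one scan.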

TTL on insert

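Set the TTL at write time: INSERT ... USING TTL 2592000 (30 days, expressed in seconds). Combined with TWCS (TimeWindowCompactionStrategy), expired data is dropped as whole SSTables once everything in them has expired, with no tombstone scan. This is the magic combo for time-series.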
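The TTL value is just days converted to seconds. The sketch below derives it and builds a parameterized INSERT; the table and column names are hypothetical, and the statement is only assembled as a string here, not sent to a cluster.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days: int) -> int:
    """Convert a retention period in days to the seconds value USING TTL expects."""
    return days * SECONDS_PER_DAY


def insert_with_ttl(table: str, days: int) -> str:
    """Build a parameterized INSERT that expires each row after `days` days.

    The column list is a hypothetical example, not a required layout.
    """
    return (
        f"INSERT INTO {table} (device_id, day, ts, value) "
        f"VALUES (?, ?, ?, ?) USING TTL {ttl_seconds(days)}"
    )
```

Here `ttl_seconds(30)` gives 2592000, the figure used above. Cassandra also supports a table-level `default_time_to_live` option for cases where every row shares the same retention.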


TWCS configuration

Set the compaction window to the bucket period (e.g., 1 day). Goal: most reads hit a small number of windows, and an old window is compacted once and then left alone. Match the window to your TTL: a 30-day TTL with 1-day windows keeps about 30 windows live per table at any time.
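As a rough illustration, the sketch below builds the compaction clause and estimates how many windows stay live for a given TTL. `compaction_window_unit` and `compaction_window_size` are the TWCS options; the helper names are hypothetical, and the window count is an approximation that ignores the window currently being written.

```python
import math

VALID_UNITS = {"MINUTES", "HOURS", "DAYS"}


def twcs_clause(window_size: int, window_unit: str = "DAYS") -> str:
    """Build the `compaction = {...}` clause for CREATE TABLE / ALTER TABLE ... WITH."""
    if window_unit not in VALID_UNITS:
        raise ValueError(f"unsupported window unit: {window_unit}")
    return (
        "compaction = {"
        "'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': '{window_unit}', "
        f"'compaction_window_size': '{window_size}'"
        "}"
    )


def live_windows(ttl_days: int, window_days: int) -> int:
    """Approximate number of time windows holding unexpired data."""
    return math.ceil(ttl_days / window_days)
```

With a 30-day TTL, `live_windows(30, 1)` returns 30. Widening the window to 7 days would leave about 5 windows, at the cost of holding expired data longer before a whole SSTable can be dropped.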

Pre-aggregate hot reads

If the hot read is 'last 24h avg per device', write a parallel rollup table with one row per (device, hour). The read then fetches 24 rollup rows instead of scanning thousands of raw points. Populate the rollup with a stream aggregator (Spark, Flink) or in the application at write time.
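An app-side rollup could look like the sketch below. It is one possible design, not a prescribed one: each (device, hour) row stores a sum and a count rather than an average, so that combining 24 hourly rows gives a correctly weighted 24-hour average even when hours contain different numbers of points. All names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

Point = Tuple[str, datetime, float]      # (device_id, timestamp, value)
RollupKey = Tuple[str, datetime]         # (device_id, start of the UTC hour)
RollupRow = Tuple[float, int]            # (sum of values, number of points)


def hourly_rollup(points: Iterable[Point]) -> Dict[RollupKey, RollupRow]:
    """Aggregate raw points into one (sum, count) row per device and hour."""
    acc = defaultdict(lambda: [0.0, 0])
    for device_id, ts, value in points:
        hour = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        cell = acc[(device_id, hour)]
        cell[0] += value
        cell[1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


def average(rows: Iterable[RollupRow]) -> float:
    """Combine rollup rows into a single average weighted by point count."""
    rows = list(rows)
    count = sum(c for _, c in rows)
    if count == 0:
        raise ValueError("no points to average")
    return sum(t for t, _ in rows) / count
```

Reading the last 24 hours then means fetching up to 24 rollup rows for the device and passing them to `average`.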

Avoid common traps

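Don't put a raw timestamp in the partition key without bucketing it. Don't read time ranges older than the TTL: expired cells act as tombstones until compaction purges them after gc_grace_seconds, so those reads scan tombstones. Don't run STCS (SizeTieredCompactionStrategy) on time-series: expired data sits in large old SSTables that rarely compact, so disk space is slow to free. Don't issue range queries without a WHERE clause on the full partition key.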
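To make the last point concrete, the sketch below turns a time range into one query per day bucket, each restricted to a full partition key. It assumes a hypothetical `readings` table partitioned by (device_id, day) with a `ts` clustering column, and timezone-aware timestamps; it only builds statements and parameters, leaving execution to whichever driver is in use.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Every query names the full partition key (device_id, day), so each one
# is served from a single partition instead of scanning the cluster.
SELECT_ONE_DAY = (
    "SELECT ts, value FROM readings "
    "WHERE device_id = ? AND day = ? AND ts >= ? AND ts <= ?"
)


def per_day_queries(
    device_id: str, start: datetime, end: datetime
) -> List[Tuple[str, tuple]]:
    """Return one (statement, parameters) pair per day bucket in [start, end]."""
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    queries = []
    for offset in range((last - first).days + 1):
        day = first + timedelta(days=offset)
        queries.append((SELECT_ONE_DAY, (device_id, day, start, end)))
    return queries
```

The resulting queries are independent, so they can be issued concurrently and merged client-side.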

Partition key = (entity, time_bucket), clustering by time DESC, TTL on insert, TWCS. The recipe is the same every time.