Cassandra has secondary indexes. They feel like the SQL indexes you know. They aren't. Used wrong, they make read latency unpredictable and operations a nightmare. Here's when they're OK, when they're a disaster, and what to do instead.

How they actually work

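A secondary index is a hidden Cassandra table mapping each indexed value to the rows that hold that value. The index is local to each node: every node indexes only the data it stores. So a query on the indexed column alone can't be routed to a single replica set. The coordinator has to fan out across the cluster, and the query is only as fast as the slowest node it has to wait for, not as fast as the coordinator.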
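To make the fan-out concrete, here is a toy model in plain Python. It is not how Cassandra is implemented: the `Node` class, the node names, and the latencies are made up for illustration, and it ignores replication, paging, and `LIMIT`. It only shows why a node-local index forces the coordinator to ask everyone and wait for the slowest answer.

```python
from collections import defaultdict


class Node:
    """One node holding a node-local index over its own rows (toy model)."""

    def __init__(self, name, latency_ms):
        self.name = name
        self.latency_ms = latency_ms          # simulated response time
        self.local_index = defaultdict(list)  # indexed value -> partition keys on THIS node

    def write(self, partition_key, indexed_value):
        self.local_index[indexed_value].append(partition_key)

    def lookup(self, indexed_value):
        return list(self.local_index.get(indexed_value, []))


def query_by_index(nodes, indexed_value):
    """No partition key in the query: no node can be ruled out, so ask all of them."""
    matches = []
    for node in nodes:
        matches.extend(node.lookup(indexed_value))
    # The coordinator cannot answer until the slowest node has replied.
    latency_ms = max(node.latency_ms for node in nodes)
    return sorted(matches), latency_ms


nodes = [Node("n1", 4), Node("n2", 5), Node("n3", 250)]  # n3 is having a bad day
nodes[0].write("user:1", "DE")
nodes[1].write("user:2", "FR")
nodes[2].write("user:3", "DE")

result, latency = query_by_index(nodes, "DE")
print(result, latency)
```

Two of the three nodes answer in single-digit milliseconds, but the query still takes as long as `n3` does.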

When they're acceptable

Low cardinality (10-1000 distinct values), a small list of matching rows per lookup, and a query that stays inside one partition (CQL that filters by partition key plus the indexed column). Otherwise: don't.
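A sketch of what "inside a partition" looks like in CQL, wrapped in Python so it can be run through a driver. The `shop.orders` table, its columns, and the index name are hypothetical, and `session` is assumed to behave like a cassandra-driver `Session` (an `execute(query, parameters)` method with `%s` placeholders).

```python
# Hypothetical schema: orders partitioned by customer, indexed on status.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS shop.orders (
    customer_id uuid,
    order_id    timeuuid,
    status      text,
    PRIMARY KEY ((customer_id), order_id)
)"""

CREATE_INDEX = "CREATE INDEX IF NOT EXISTS orders_by_status ON shop.orders (status)"

# Partition key + indexed column: only the replicas owning this customer are asked.
BOUNDED_QUERY = (
    "SELECT order_id FROM shop.orders WHERE customer_id = %s AND status = %s"
)

# Indexed column alone: the coordinator has to fan out across the cluster.
UNBOUNDED_QUERY = "SELECT order_id FROM shop.orders WHERE status = %s"


def open_orders_for(session, customer_id):
    """Run the partition-bounded form; `session` is an assumed driver-like object."""
    return session.execute(BOUNDED_QUERY, (customer_id, "open"))
```

The two queries differ by one predicate, and that predicate decides whether the read touches one replica set or the whole ring.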

When they bite

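High-cardinality columns (email, UUID): the index is huge, and each lookup still fans out across nodes to find what is usually a single row. Rapidly changing values: every update removes the old index entry, so tombstones pile up in the index. Cluster-wide scans: latency = slowest node, sometimes minutes.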
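The "slowest node" effect gets worse with cluster size, and a little arithmetic shows why. Assuming, purely for illustration, that each node is independently slow on 1% of requests, the chance that a fan-out query hits at least one slow node is 1 - (1 - p)^n. The function name and the 1% figure are assumptions, not measurements.

```python
def p_slow_query(p_node_slow, nodes_contacted):
    """Chance that at least one contacted node is slow, assuming independent nodes."""
    return 1 - (1 - p_node_slow) ** nodes_contacted


for n in (1, 3, 50):
    print(n, round(p_slow_query(0.01, n), 3))
```

Under those assumptions a single-partition read is slow about 1% of the time, while a query that touches 50 nodes is slow roughly 39% of the time. The per-node tail becomes the query's typical case.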

SAI (Storage-Attached Index) — newer, better

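Cassandra 5.0 added SAI: index data is attached to the SSTables themselves and maintained alongside them, instead of living in a separate hidden table. Better cardinality story, numeric range queries, multi-column AND queries, vector search. OR support is not a given in every release, so check the documentation for your version before relying on it. Use SAI over legacy 2i when on 5.x.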
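A minimal sketch of opting into SAI, again as CQL strings in Python. It assumes a hypothetical `shop.orders` table with a text `status` column and an integer `total_cents` column; the index names are made up, and `session` is assumed to expose a driver-style `execute` method. The `USING 'sai'` clause is what selects the storage-attached implementation in Cassandra 5.0.

```python
# Hypothetical table, columns and index names.
SAI_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS orders_status_sai "
    "ON shop.orders (status) USING 'sai'",
    "CREATE INDEX IF NOT EXISTS orders_total_sai "
    "ON shop.orders (total_cents) USING 'sai'",
]

# Equality on one indexed column AND a numeric range on another.
RANGE_QUERY = (
    "SELECT order_id FROM shop.orders "
    "WHERE status = %s AND total_cents >= %s"
)


def create_sai_indexes(session):
    """Issue each CREATE INDEX statement; `session` is an assumed driver-like object."""
    for statement in SAI_STATEMENTS:
        session.execute(statement)


def large_open_orders(session, min_total_cents):
    return session.execute(RANGE_QUERY, ("open", min_total_cents))
```

A query without a partition key still spans nodes with SAI, so adding the partition key where you have it remains worth doing.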

The denormalize-instead alternative

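Maintain a second table keyed by the value you'd index. Write to both tables with a logged batch, which guarantees that either both writes eventually apply or neither does (atomicity, not isolation: a reader can briefly see one table updated before the other). Reads against the second table are fast single-partition operations. More disk, predictable latency.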
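Here is roughly what that looks like for an email lookup. The `app.users` and `app.users_by_email` tables are hypothetical, and `session` is assumed to behave like a cassandra-driver `Session`. In CQL, `BEGIN BATCH` without the `UNLOGGED` keyword is a logged batch.

```python
# Hypothetical tables: users keyed by id, plus a lookup table keyed by email.
CREATE_USERS = """
CREATE TABLE IF NOT EXISTS app.users (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
)"""

CREATE_USERS_BY_EMAIL = """
CREATE TABLE IF NOT EXISTS app.users_by_email (
    email   text PRIMARY KEY,
    user_id uuid,
    name    text
)"""

# Logged batch: both inserts are applied, or neither is.
INSERT_BOTH = """
BEGIN BATCH
    INSERT INTO app.users (user_id, email, name) VALUES (%s, %s, %s);
    INSERT INTO app.users_by_email (email, user_id, name) VALUES (%s, %s, %s);
APPLY BATCH"""

# Single-partition read: the email is the partition key of the lookup table.
FIND_BY_EMAIL = "SELECT user_id, name FROM app.users_by_email WHERE email = %s"


def create_user(session, user_id, email, name):
    """Write the same user to both tables in one logged batch."""
    session.execute(INSERT_BOTH, (user_id, email, name, email, user_id, name))


def find_user_by_email(session, email):
    return session.execute(FIND_BY_EMAIL, (email,))
```

The cost is that your application now owns the bookkeeping: if a user changes email, the old row in `users_by_email` has to be deleted in the same batch that writes the new one.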

Legacy secondary indexes: only for low-cardinality, partition-bounded queries. Otherwise denormalize. On 5.0+, prefer SAI.