The Core Components

An LSM Tree architecture generally consists of three main components:

  • MemTable: An in-memory mutable structure (often a sorted map/tree). All writes go here first.
  • Commit Log (WAL): A sequential on-disk log to ensure data durability in case of a crash before the MemTable is flushed.
  • SSTables (Sorted String Tables): Immutable, on-disk files created when a MemTable is flushed. They are sorted by key.
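
To make the shape of these components concrete, here is a minimal in-memory sketch. The class names `MemTable` and `SSTable`, their methods, and the `commit_log` list are illustrative assumptions, not Cassandra's actual types; a real commit log and real SSTables live on disk.

```python
import bisect

class MemTable:
    """Mutable, in-memory store. A plain dict stands in for a sorted map here."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def sorted_items(self):
        # Sorting at flush time is a simplification; a real MemTable
        # usually keeps its keys ordered as they are inserted.
        return sorted(self._data.items())

class SSTable:
    """Immutable, key-sorted run of entries (kept in memory in this sketch)."""
    def __init__(self, sorted_items):
        self._keys = tuple(k for k, _ in sorted_items)
        self._values = tuple(v for _, v in sorted_items)

    def get(self, key):
        # Keys are sorted, so a binary search finds the entry.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

# Stand-in for the commit log: an append-only sequence of records.
commit_log = []

memtable = MemTable()
for key, value in [("b", 2), ("a", 1)]:
    commit_log.append((key, value))  # log first
    memtable.put(key, value)         # then update memory

sstable = SSTable(memtable.sorted_items())
```

Because the keys in an SSTable are sorted and never change, a lookup needs only a binary search rather than a scan.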

The Write Path

  1. Log: The data is appended to the Commit Log for durability.
  2. Memory: The data is inserted into the MemTable. This is fast because it's an in-memory operation.
  3. Flush: When the MemTable reaches its size threshold, it is "flushed" to disk as a new SSTable. Because the MemTable is already sorted, the file is written in one sequential pass with no random I/O, which is efficient on spinning disks and SSDs alike.
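
The three steps above can be sketched as a toy store. Everything here is an assumption made for illustration: the `TinyLSM` class, the JSON-lines file format, the file names, and a flush threshold counted in entries (real systems typically size the MemTable in bytes).

```python
import json
import os
import tempfile

class TinyLSM:
    def __init__(self, directory, memtable_limit=2):
        self.directory = directory
        self.wal_path = os.path.join(directory, "commit.log")
        self.wal = open(self.wal_path, "a", encoding="utf-8")
        self.memtable = {}
        self.memtable_limit = memtable_limit  # hypothetical: number of entries
        self.sstables = []                    # paths of flushed files, oldest first

    def put(self, key, value):
        # 1. Log: append the record and force it to disk before acknowledging.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Memory: update the in-memory table.
        self.memtable[key] = value
        # 3. Flush: once the table is full, write it out as a new SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        name = f"sstable-{len(self.sstables):04d}.jsonl"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as out:
            # One sequential pass over the entries in key order.
            for key, value in sorted(self.memtable.items()):
                out.write(json.dumps([key, value]) + "\n")
            out.flush()
            os.fsync(out.fileno())
        self.sstables.append(path)
        self.memtable = {}
        # The flushed entries are now durable in the SSTable file,
        # so this sketch simply empties the log.
        self.wal.truncate(0)

    def close(self):
        self.wal.close()

def demo():
    with tempfile.TemporaryDirectory() as directory:
        db = TinyLSM(directory, memtable_limit=2)
        for key, value in [("b", "2"), ("a", "1"), ("c", "3")]:
            db.put(key, value)
        with open(db.sstables[0], encoding="utf-8") as fh:
            flushed = [json.loads(line) for line in fh]
        with open(db.wal_path, encoding="utf-8") as fh:
            pending = [json.loads(line) for line in fh]
        db.close()
        return flushed, pending, dict(db.memtable)
```

Calling `demo()` writes three keys: the second write reaches the limit and triggers a flush, so the first SSTable holds `a` and `b` in key order, while `c` remains only in the MemTable and the commit log.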

Visualizing the Flush

The following animation demonstrates the transition of data from the active MemTable to an immutable SSTable upon reaching capacity.

The Read Path & Compaction

Because the data for a key can live in the MemTable or in any number of SSTables, a read may have to consult several structures and keep only the newest version, so reads can be slower than writes. To mitigate this, Cassandra uses:

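  • Bloom Filters: To quickly check if a key *might* exist in an SSTable. A negative answer is definitive, so that SSTable can be skipped without reading it; a positive answer can be a false positive and must still be confirmed by a lookup.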
  • Compaction: A background process that merges multiple SSTables into larger ones, discarding superseded versions of each key and purging deleted (tombstoned) data.
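
Both mechanisms can be sketched together in a few lines. This is a simplified illustration, not Cassandra's implementation: the `BloomFilter` sizing, the use of SHA-256 as the hash, the `TOMBSTONE` marker, and the `read` and `compact` helpers are all assumptions, and SSTables are held in memory for brevity.

```python
import hashlib

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

class SSTable:
    def __init__(self, entries):
        self.entries = dict(sorted(entries.items()))
        self.bloom = BloomFilter()
        for key in self.entries:
            self.bloom.add(key)

def read(key, memtable, sstables):
    """Look up a key; `sstables` is ordered oldest to newest."""
    if key in memtable:
        value = memtable[key]
        return None if value is TOMBSTONE else value
    for table in reversed(sstables):  # newest first, so the latest version wins
        if not table.bloom.might_contain(key):
            continue                  # definite miss: skip this table
        if key in table.entries:
            value = table.entries[key]
            return None if value is TOMBSTONE else value
    return None

def compact(sstables):
    """Merge all given tables into one, keeping only the newest live values."""
    merged = {}
    for table in sstables:            # oldest first, so newer values overwrite
        merged.update(table.entries)
    # Dropping tombstones is only safe here because every table is merged;
    # real systems also keep tombstones for a grace period before purging.
    live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return SSTable(live)

old = SSTable({"a": "1", "b": "2"})
new = SSTable({"a": "9", "b": TOMBSTONE})
compacted = compact([old, new])
```

In this example `read("a", {}, [old, new])` returns the newer value `"9"`, `read("b", {}, [old, new])` returns `None` because the newest entry is a tombstone, and `compacted` holds only the single live entry for `a`.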