▶ Interactive Lab

FlashAttention Tiling

Compute attention tile by tile, keeping the working set in on-chip SRAM.

Each Q tile and its current K/V tile fit in SRAM together. Online softmax combines the partial results across tiles, so the full N×N score matrix is never built.

What you're seeing

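The result is the same as standard attention, but the full N×N attention matrix is never materialized. Only one tile of scores exists at a time, so extra memory grows with N rather than N². Fewer trips to off-chip GPU memory make it roughly 5-10× faster on long contexts.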

★ KEY TAKEAWAY
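FlashAttention computes attention block-by-block in SRAM and never materializes the N×N matrix. The output is exact, not an approximation, and it runs roughly 5-10× faster on long contexts than an implementation that writes the full matrix to GPU memory.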
▶ WHAT TO TRY
  • Click Step to advance through the (Q tile, K tile) pairs.
  • Online softmax (a running max and a running sum per query row) gives the exact result without ever storing the full matrix.
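
The running max and running sum can be sketched in a few lines of NumPy. This is only an illustration of the arithmetic, not the actual GPU kernel: the function names `reference_attention` and `tiled_attention`, the tile sizes, and the single-head, unmasked setup are all assumptions made for the example, and NumPy arrays stand in for SRAM tiles.

```python
import numpy as np

def reference_attention(Q, K, V):
    """Standard attention: builds the full N x N score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, q_tile=4, k_tile=4):
    """Tile-by-tile attention with online softmax (hypothetical helper).

    Only a q_tile x k_tile block of scores exists at any time.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty((n, V.shape[1]))
    for qs in range(0, n, q_tile):
        q = Q[qs:qs + q_tile]
        m = np.full(q.shape[0], -np.inf)           # running max per query row
        l = np.zeros(q.shape[0])                   # running sum of exponentials
        acc = np.zeros((q.shape[0], V.shape[1]))   # unnormalized output
        for ks in range(0, K.shape[0], k_tile):
            s = (q @ K[ks:ks + k_tile].T) * scale  # scores for this tile pair only
            m_new = np.maximum(m, s.max(axis=-1))
            correction = np.exp(m - m_new)         # rescales earlier tiles to the new max
            p = np.exp(s - m_new[:, None])
            l = l * correction + p.sum(axis=-1)
            acc = acc * correction[:, None] + p @ V[ks:ks + k_tile]
            m = m_new
        out[qs:qs + q_tile] = acc / l[:, None]     # normalize once, at the end
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((10, 8)) for _ in range(3))
print(np.allclose(tiled_attention(Q, K, V), reference_attention(Q, K, V)))
```

The `correction` factor is what makes the result exact: whenever a later K tile raises the row maximum, the sum and output accumulated so far are rescaled to that new maximum before the tile's contribution is added. Each step of the inner loop corresponds to one (Q tile, K tile) pair in the lab.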