▶ Interactive Lab

Matrix Multiplication Step-Through

Watch Y = X·W computed entry-by-entry.

Advertisement
Y[i,j] = sum_k X[i,k] * W[k,j]. Click Step to compute one entry at a time.

What you're seeing

For shapes [N,d] · [d,m] = [N,m]. Each output entry is a dot product of a row of X with a column of W. Cost: N·m dot products of length d. Total: O(N·m·d) multiply-adds.

BLAS libraries (OpenBLAS, MKL) implement this with cache-blocking, SIMD vectorization, and threading. ~10-100× faster than naive Python loops.

★ KEY TAKEAWAY
Matrix multiplication = a grid of dot products. Each output cell is one row × one column.
▶ WHAT TO TRY
  • Click Step one entry to watch one Y[i,j] get computed at a time.
  • Click Compute all to see the final result instantly.
  • Notice that BLAS libraries do this same math, but ~100× faster via cache blocking + SIMD.