Normal equation

w* = (X^T X)^(-1) X^T y. Cost: O(N·d² + d³) (forming X^T X, then solving the d×d system). Practical for moderate d (roughly d < 10k). If X^T X is singular, use the pseudoinverse: w* = X^+ y.
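
A minimal NumPy sketch of the closed-form fit, included to show how the singular case might be handled. The function name fit_normal_equation and the synthetic data are illustrative assumptions, not part of the original note.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Least-squares weights via the normal equation, with a pseudoinverse fallback."""
    if np.linalg.matrix_rank(X) < X.shape[1]:
        # Rank-deficient X makes X^T X singular: use the minimum-norm solution.
        return np.linalg.pinv(X) @ y
    # Solve (X^T X) w = X^T y directly rather than forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic, noiseless data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true
w_hat = fit_normal_equation(X, y)

# Duplicating a column makes X^T X singular, which triggers the fallback.
X_dup = np.column_stack([X, X[:, 0]])
w_dup = fit_normal_equation(X_dup, y)
```

Solving the linear system is generally preferable to computing the inverse explicitly, since it is cheaper and numerically more stable.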

Gradient descent

w ← w - η · X^T(Xw - y)/N, the gradient step for the loss ||Xw - y||²/(2N). Each full-batch step costs O(N·d), so it scales to huge N. Use mini-batches for efficient GPU utilization.
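
The update rule can be sketched in NumPy as follows. The step size, step count, batch size and synthetic data are assumptions chosen for illustration; real problems usually need tuning. Passing batch_size switches from full-batch to mini-batch updates.

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, n_steps=500, batch_size=None, seed=0):
    """Minimise ||Xw - y||^2 / (2N) with full-batch or mini-batch gradient steps."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        if batch_size is None:
            Xb, yb = X, y  # full batch
        else:
            idx = rng.choice(N, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]  # random mini-batch
        # Gradient of the mean squared error on the current batch.
        grad = Xb.T @ (Xb @ w - yb) / len(yb)
        w -= eta * grad
    return w

# Synthetic, noiseless data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

w_full = gradient_descent(X, y)
w_mini = gradient_descent(X, y, batch_size=32)
```

On a GPU the same loop would typically run through a framework such as JAX or PyTorch, with the batch size chosen to keep the device busy.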

Ridge (L2)

Add λ||w||² penalty. Shrinks coefficients toward zero and handles collinearity, since X^T X + λI is invertible for any λ > 0. Closed form: w* = (X^T X + λI)^(-1) X^T y.
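
A short NumPy sketch of the ridge closed form, using two identical columns to show the collinear case. The name fit_ridge, the λ values and the data are illustrative assumptions.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge weights from the closed form (X^T X + lam * I)^(-1) X^T y."""
    d = X.shape[1]
    # For lam > 0 the matrix is positive definite, so the solve always succeeds.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Two identical columns: X^T X alone is singular here.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1, rng.normal(size=100)])
y = 2.0 * x1 + X[:, 2]

w_small = fit_ridge(X, y, lam=0.1)    # light shrinkage
w_large = fit_ridge(X, y, lam=100.0)  # heavy shrinkage
```

The penalty splits the weight evenly between the duplicated columns, and a larger λ yields a smaller weight norm.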

Lasso (L1)

Add λ||w||_1 penalty. Induces sparsity (some coefficients become exactly zero), which acts as feature selection. No closed form in general; solved via coordinate descent or LARS.
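
A minimal coordinate-descent sketch in NumPy. It assumes the objective ||Xw - y||²/(2N) + λ||w||_1; the note does not fix the loss scaling, so this 1/(2N) convention is an assumption. The function names, sweep count and data are likewise illustrative.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a toward zero by t, clipping to exactly zero inside [-t, t]."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise ||Xw - y||^2 / (2N) + lam * ||w||_1 one coordinate at a time."""
    N, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / N  # per-feature curvature
    residual = y - X @ w
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column contributes nothing
            residual += X[:, j] * w[j]   # remove feature j's contribution
            rho = X[:, j] @ residual / N
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * w[j]   # add the updated contribution back
    return w

# Synthetic data where only features 0 and 3 matter (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true

w_sparse = lasso_coordinate_descent(X, y, lam=0.1)
w_zero = lasso_coordinate_descent(X, y, lam=100.0)
```

The soft-threshold step is what produces exact zeros: any coordinate whose correlation with the residual stays below λ is set to zero rather than merely shrunk.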