Normal equation
w* = (X^T X)^(-1) X^T y. O(N·d² + d³). Fast for d < 10k. Singular X^T X → use pseudoinverse.
Advertisement
Gradient descent
w ← w - η · X^T(Xw - y)/N. Scales to huge N. Mini-batch for efficient GPU utilization.
Advertisement
Ridge (L2)
Add λ||w||² penalty. Shrinks coefficients. Handles collinearity. Closed form: (X^T X + λI)^(-1) X^T y.
Lasso (L1)
Add λ||w||_1 penalty. Induces sparsity — feature selection. Solved via coordinate descent or LARS.