Support Vector Machines + Kernel Trick

Primal / dual

Primal: min ½||w||² + C·hinge_loss. Dual: quadratic programming in α. Support vectors have α > 0.

Advertisement

Linear. Polynomial (x·y + c)^d. RBF exp(-γ||x-y||²). Sigmoid tanh(κ x·y + c).

Advertisement

Training O(N²) - O(N³). Prediction O(SV × d). N > 10k → use kernel approximation (Random Features).

C controls trade-off between margin + slack. Small C = large margin, more errors OK.