Primal / dual
Primal: min ½||w||² + C·hinge_loss. Dual: quadratic programming in α. Support vectors have α > 0.
Advertisement
Common kernels
Linear. Polynomial (x·y + c)^d. RBF exp(-γ||x-y||²). Sigmoid tanh(κ x·y + c).
Advertisement
Complexity
Training O(N²) - O(N³). Prediction O(SV × d). N > 10k → use kernel approximation (Random Features).
Regularization
C controls trade-off between margin + slack. Small C = large margin, more errors OK.