Logistic Regression — Binary Classification

Loss

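Binary cross-entropy: L(w) = -(1/N)·sum[y·log(p) + (1-y)·log(1-p)], with p = σ(Xw) and the sum taken over the N examples. Convex in w, so any local minimum is global and gradient descent with a suitable step size cannot get stuck in a worse one.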
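
A minimal NumPy sketch of this loss, for illustration only. It averages over the N examples, which rescales the sum by 1/N without moving the minimizer. The function name `binary_cross_entropy` and the clipping tolerance `eps` are assumptions of this sketch, not part of the notes.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy for labels y in {0, 1} and probabilities p."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities so log(0) never occurs; eps is an assumed tolerance.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Predicting 0.5 everywhere costs log(2) per example.
loss_uninformed = binary_cross_entropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
# Confident and correct predictions cost less.
loss_confident = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```

The two calls show the expected ordering: uninformative predictions give log(2), and predictions that lean toward the correct labels give a smaller value.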

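Gradient: ∂L/∂w = X^T(σ(Xw) - y) / N, where σ is the sigmoid and N is the number of examples. It needs only matrix-vector products with X, which keeps each step cheap and makes large-scale (minibatch) training practical.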
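
The gradient formula translates almost directly into code. The sketch below is one possible plain gradient-descent loop on synthetic data. The names `fit`, `lr`, `steps`, and `true_w`, and the data itself, are made up for the example, and the naive `sigmoid` is not hardened against very large inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(X, y, w):
    """Gradient of the mean cross-entropy: X^T (sigmoid(Xw) - y) / N."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def fit(X, y, lr=0.5, steps=2000):
    """Fixed-step gradient descent from w = 0 (lr and steps are assumed values)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * gradient(X, y, w)
    return w

# Synthetic data: an intercept column plus two features, with noisy labels
# sampled from a logistic model with hypothetical weights true_w.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)

w_hat = fit(X, y)
grad_norm = np.linalg.norm(gradient(X, y, w_hat))  # near zero at the optimum
```

For minibatch training the same `gradient` function can be called on a random subset of rows of `X` and `y` at each step.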

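Multiclass (K classes): p = softmax(Wx), with one row of W per class. Cross-entropy generalizes to -log of the probability assigned to the true class, and the problem stays convex in W.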
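
A small sketch of the multiclass case, assuming W has shape (K, d) and class labels are integers 0..K-1. The shapes, the toy data, and the helper names are choices made for this example only.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(P, labels):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.log(P[np.arange(len(labels)), labels])))

X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]])  # N=3 examples, d=2 features
W = np.zeros((3, 2))                                   # K=3 classes, zero weights
labels = np.array([0, 2, 1])

P = softmax(X @ W.T)                 # row i holds softmax(W x_i)
loss_at_zero = cross_entropy(P, labels)
```

With all-zero weights every class gets probability 1/K, so the loss equals log(K). With K = 2 and one logit fixed at 0, softmax reduces to the sigmoid of the binary case.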

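Regularization: L2 is the typical choice. L1 drives some weights to exactly zero, which gives feature selection. Elastic net combines the two. All three limit overfitting on high-dimensional data.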
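
The three penalties can be written as one function. The sketch below assumes a common parameterization, with strength `lam` and a mixing weight `l1_ratio`; both names and the 1/2 factor on the L2 term are conventions chosen here, not taken from the notes. The soft-thresholding step is the standard proximal update for an L1 term and is included only to show where the exact zeros come from.

```python
import numpy as np

def penalty(w, lam=0.1, l1_ratio=0.5):
    """Elastic-net penalty: l1_ratio=0 gives pure L2, l1_ratio=1 gives pure L1."""
    l1 = np.sum(np.abs(w))
    l2 = 0.5 * np.sum(w ** 2)
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

def soft_threshold(w, t):
    """Proximal step for an L1 term: shrink toward 0, zeroing entries with |w| <= t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.8, -0.05, 0.0, 1.5])
l2_only = penalty(w, lam=0.1, l1_ratio=0.0)
l1_only = penalty(w, lam=0.1, l1_ratio=1.0)
sparse_w = soft_threshold(w, 0.1)   # the small weight -0.05 is set to zero
```

The penalty is added to the cross-entropy loss before minimizing. For the L2 part the extra gradient term is simply `lam * w`, while the L1 part is usually handled with a thresholding step like the one above because it is not differentiable at zero.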