Algorithm

// Initialize k centroids (random or k-means++)
// Repeat until convergence:
//   Assignment: each point → nearest centroid
//   Update: centroid = mean of assigned points
Advertisement

k-means++

Better initialization. First centroid random. Each subsequent centroid: prob ∝ D(x)² where D = distance to nearest chosen centroid.

Advertisement

Convergence

Monotonic decrease of objective. Always converges. But to local minimum — restart multiple times.

Choosing k

Elbow method (plot inertia vs k). Silhouette score. Gap statistic. Domain knowledge best.