Algorithm
// Initialize k centroids (random or k-means++)
// Repeat until convergence:
// Assignment: each point → nearest centroid
// Update: centroid = mean of assigned pointsAdvertisement
k-means++
Better initialization. First centroid random. Each subsequent centroid: prob ∝ D(x)² where D = distance to nearest chosen centroid.
Advertisement
Convergence
Monotonic decrease of objective. Always converges. But to local minimum — restart multiple times.
Choosing k
Elbow method (plot inertia vs k). Silhouette score. Gap statistic. Domain knowledge best.