Impurity measures
Gini = 1 - sum_c p_c². Entropy = -sum_c p_c · log p_c, where p_c is the fraction of samples at the node in class c. The two usually produce nearly identical splits in practice.
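A minimal sketch of both measures, computed from per-class sample counts at a node. The function names and the choice of log base 2 are assumptions for illustration; the notes above do not fix a base.

```python
import math


def gini(counts):
    """Gini impurity from a list of per-class sample counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits (log base 2 is an assumption) from per-class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    # Skip empty classes: p * log(p) tends to 0 as p tends to 0.
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


balanced = [5, 5]   # maximally mixed two-class node
pure = [10, 0]      # pure node
print(gini(balanced), entropy(balanced))
print(gini(pure), entropy(pure))
```

Both measures are zero at a pure node and peak at a uniform class mix, which is one reason they tend to rank candidate splits similarly.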
Split search
// For each feature, sort values.
// Consider each split threshold.
// Compute weighted impurity of children.
// Best split = minimum weighted impurity.
// Complexity O(N · d · log N) per node.
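The steps above can be sketched as an exhaustive search using Gini impurity. This is an illustration under assumptions, not a reference implementation: `best_split` and `gini_from_counts` are hypothetical names, thresholds are taken as midpoints between adjacent distinct values, and data is held in plain Python lists.

```python
from collections import Counter


def gini_from_counts(counts, n):
    """Gini impurity from a Counter of class counts over n samples."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Return (feature_index, threshold, weighted_impurity) of the best split.

    X is a list of rows (lists of floats), y a list of class labels.
    """
    n, d = len(y), len(X[0])
    best = (None, None, float("inf"))
    for j in range(d):
        # Sort sample indices by the value of feature j: O(N log N).
        order = sorted(range(n), key=lambda i: X[i][j])
        left, right = Counter(), Counter(y)
        for k in range(1, n):
            # Move one sample from the right child to the left child.
            label = y[order[k - 1]]
            left[label] += 1
            right[label] -= 1
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            weighted = (k * gini_from_counts(left, k)
                        + (n - k) * gini_from_counts(right, n - k)) / n
            if weighted < best[2]:
                best = (j, (lo + hi) / 2, weighted)
    return best


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 5.0], [4.0, 3.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # → (0, 2.5, 0.0)
```

Updating the class counts incrementally during the scan keeps each feature's cost dominated by the sort, which is where the log N factor comes from.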
Stopping criteria
Max depth. Min samples per leaf. Min impurity decrease. Each limits tree growth to reduce overfitting to the training data.
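A small sketch of how the three criteria might be checked before accepting a candidate split. The function name, parameter names, and default values are hypothetical, and the impurity decrease here is the plain parent-minus-children difference; some libraries weight it by the node's share of the training set.

```python
def should_stop(depth, n_left, n_right, parent_impurity, child_impurity,
                max_depth=5, min_samples_leaf=2, min_impurity_decrease=1e-3):
    """Return True if the node should become a leaf instead of splitting.

    child_impurity is the weighted impurity of the two proposed children.
    """
    if depth >= max_depth:
        return True  # tree is already as deep as allowed
    if min(n_left, n_right) < min_samples_leaf:
        return True  # split would create a leaf that is too small
    if parent_impurity - child_impurity < min_impurity_decrease:
        return True  # split does not improve purity enough
    return False


print(should_stop(depth=2, n_left=10, n_right=10,
                  parent_impurity=0.5, child_impurity=0.1))
```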
Missing values
CART: try surrogate splits. XGBoost: learn default direction per node from data. LightGBM: separate 'is_missing' bin.
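To illustrate the "learn default direction" idea only: for a fixed threshold, try sending all missing rows left, then right, and keep whichever gives lower weighted impurity. This is a simplified sketch, not XGBoost's code. It uses Gini impurity rather than a gradient-based gain, represents missing values as `None`, and `default_direction` is a hypothetical name.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def default_direction(values, labels, threshold):
    """Pick the side for missing values (None) that minimizes weighted Gini."""
    left = [y for v, y in zip(values, labels) if v is not None and v <= threshold]
    right = [y for v, y in zip(values, labels) if v is not None and v > threshold]
    missing = [y for v, y in zip(values, labels) if v is None]
    n = len(labels)

    def weighted(l, r):
        return (len(l) * gini(l) + len(r) * gini(r)) / n

    go_left = weighted(left + missing, right)    # missing rows routed left
    go_right = weighted(left, right + missing)   # missing rows routed right
    return ("left", go_left) if go_left <= go_right else ("right", go_right)


values = [1.0, 2.0, None, 8.0, 9.0, None]
labels = [0, 0, 1, 1, 1, 1]
print(default_direction(values, labels, threshold=5.0))  # → ('right', 0.0)
```

In the example, both missing rows belong to class 1, so routing them right keeps both children pure.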