Impurity measures

Gini = 1 - sum_c p_c². Entropy = -sum_c p_c · log p_c. Here p_c is the fraction of samples in the node that belong to class c. The two measures yield nearly identical splits in practice.
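
A minimal sketch of both measures, assuming class labels arrive as a plain Python list and that entropy uses the natural log. The function names are illustrative, not taken from any library.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of samples per class (p_c) for a non-empty list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    # Gini = 1 - sum_c p_c^2
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    # Entropy = -sum_c p_c * log p_c (natural log, an assumption here).
    # Classes with zero count never appear in the Counter, so log(0) cannot occur.
    return -sum(p * math.log(p) for p in class_proportions(labels))

labels = ["a", "a", "a", "b"]
print(gini(labels))     # 1 - (0.75^2 + 0.25^2) = 0.375
print(entropy(labels))
```

Both functions return 0 for a pure node and reach their maximum when classes are evenly mixed.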

Split search

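// For each feature, sort the samples by that feature's value.
// Consider each threshold between consecutive distinct values.
// Compute the impurity of the two children, weighted by their sample counts.
// Best split = the (feature, threshold) pair with minimum weighted impurity.
// Complexity: O(N · d · log N) per node, for N samples and d features.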
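
The steps above can be sketched in Python roughly as follows. This assumes numeric features in a list of rows, Gini as the impurity measure, and midpoints between consecutive distinct values as candidate thresholds. `best_split` and `gini_from_counts` are hypothetical names. Class counts are updated incrementally as the threshold moves, so the sort dominates the cost when the number of classes is small.

```python
from collections import Counter

def gini_from_counts(counts, total):
    # Gini impurity from per-class counts; zero counts contribute nothing.
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_split(X, y):
    """Return (feature_index, threshold, weighted_impurity), or None if no split exists."""
    n, d = len(X), len(X[0])
    best = None
    for j in range(d):
        # Sort sample indices by the value of feature j.
        order = sorted(range(n), key=lambda i: X[i][j])
        left, right = Counter(), Counter(y)
        for k in range(1, n):
            # Move one sample from the right child to the left child.
            label = y[order[k - 1]]
            left[label] += 1
            right[label] -= 1
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold can separate equal values
            score = (k * gini_from_counts(left, k)
                     + (n - k) * gini_from_counts(right, n - k)) / n
            if best is None or score < best[2]:
                best = (j, (lo + hi) / 2, score)
    return best

X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))
```

In this toy data the second feature is constant, so only the first feature offers any candidate thresholds.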

Stopping criteria

Max depth. Min samples per leaf. Min impurity decrease. Each limits tree growth and so helps prevent overfitting to the training data.
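
One way these three checks might be combined before accepting a candidate split. This is a sketch: `StopConfig`, `should_split`, and the default values are hypothetical, and real libraries differ in details such as whether the impurity decrease is weighted by the node's share of the data.

```python
from dataclasses import dataclass

@dataclass
class StopConfig:
    # Hypothetical defaults, chosen only for illustration.
    max_depth: int = 5
    min_samples_leaf: int = 1
    min_impurity_decrease: float = 0.0

def should_split(cfg, depth, n_left, n_right, parent_impurity, weighted_child_impurity):
    """Return True if a candidate split passes all three stopping criteria."""
    if depth >= cfg.max_depth:
        return False  # tree is already as deep as allowed
    if min(n_left, n_right) < cfg.min_samples_leaf:
        return False  # one child would be too small
    decrease = parent_impurity - weighted_child_impurity
    return decrease >= cfg.min_impurity_decrease

cfg = StopConfig(max_depth=3, min_samples_leaf=2, min_impurity_decrease=0.01)
print(should_split(cfg, depth=1, n_left=5, n_right=5,
                   parent_impurity=0.5, weighted_child_impurity=0.2))
```

If any check fails, the node becomes a leaf.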

Missing values

CART: try surrogate splits. XGBoost: learn default direction per node from data. LightGBM: separate 'is_missing' bin.
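
The default-direction idea can be illustrated with a simplified sketch: at a fixed threshold, try sending all missing values left, then right, and keep whichever gives lower weighted impurity. This is not XGBoost's actual code. XGBoost scores the two options with a gradient-based gain, while this sketch substitutes Gini to stay consistent with the notes above. Missing values are assumed to be encoded as `None`, and `default_direction` is a hypothetical name.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0  # an empty child contributes no impurity
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def default_direction(values, labels, threshold):
    """Pick 'left' or 'right' for missing (None) values at a fixed threshold."""
    left = [y for v, y in zip(values, labels) if v is not None and v <= threshold]
    right = [y for v, y in zip(values, labels) if v is not None and v > threshold]
    missing = [y for v, y in zip(values, labels) if v is None]
    n = len(labels)  # assumed non-empty

    def weighted(l, r):
        return (len(l) * gini(l) + len(r) * gini(r)) / n

    score_left = weighted(left + missing, right)    # missing values go left
    score_right = weighted(left, right + missing)   # missing values go right
    return "left" if score_left <= score_right else "right"

values = [1.0, 2.0, None, 8.0, 9.0, None]
labels = [0, 0, 1, 1, 1, 1]
print(default_direction(values, labels, threshold=5.0))
```

Here the samples with missing values share the label of the right-hand group, so routing them right keeps both children pure.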