Formula

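P(c | x) ∝ P(c) · ∏_i P(x_i | c). The posterior of class c is proportional to the class prior times the product of per-feature likelihoods. The evidence P(x) is the same for every class, so it can be dropped, and the predicted class is the c that maximizes this product. The assumption that features are conditionally independent given the class is "naive", but it often works well enough in practice.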
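A minimal sketch of the decision rule, assuming discrete features and hand-written probability tables. `predict_class`, `prior`, and `likelihood` are hypothetical names, and the numbers are made up for illustration. It sums logs instead of multiplying probabilities. This picks the same class but avoids floating-point underflow when there are many features.

```python
import math

def predict_class(prior, likelihood, x):
    """Return the class c maximizing log P(c) + sum_i log P(x_i | c).

    prior:      dict mapping class -> P(c)
    likelihood: dict mapping class -> list of dicts, one per feature i,
                each mapping a feature value -> P(x_i = value | c)
    x:          sequence of observed feature values
    """
    best_class, best_score = None, -math.inf
    for c, p_c in prior.items():
        # Log of the product = sum of the logs; monotonic, so argmax is unchanged.
        score = math.log(p_c)
        for i, value in enumerate(x):
            score += math.log(likelihood[c][i][value])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Illustrative (made-up) tables for two binary features.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"yes": 0.7, "no": 0.3}],
    "ham":  [{"yes": 0.1, "no": 0.9}, {"yes": 0.4, "no": 0.6}],
}
```

For `["yes", "yes"]` the unnormalized scores are 0.4 · 0.8 · 0.7 = 0.224 for spam and 0.6 · 0.1 · 0.4 = 0.024 for ham, so `predict_class(prior, likelihood, ["yes", "yes"])` returns `"spam"`.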


Multinomial NB

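Features are word counts. With additive smoothing, P(word | class) = (count(word, class) + α) / (total_words(class) + α·V), where V is the vocabulary size and α > 0. Setting α = 1 gives Laplace smoothing. Smoothing keeps a word never seen with a class from setting that class's whole product to zero.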
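A minimal pure-Python sketch of training and prediction with the smoothed estimate above. The function names and the tiny corpus are hypothetical. It also assumes that words not seen during training are simply ignored at prediction time, which is one common choice, not the only one.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and smoothed P(word | class) from tokenized documents.

    docs:   list of token lists
    labels: list of class labels, one per document
    alpha:  additive smoothing constant (alpha = 1 is Laplace smoothing)
    """
    vocab = {w for doc in docs for w in doc}
    V = len(vocab)
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> Counter of word occurrences
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    priors = {c: n / len(docs) for c, n in docs_per_class.items()}
    cond = {}
    for c in docs_per_class:
        total = sum(word_counts[c].values())  # total_words(class)
        # (count(word, class) + alpha) / (total_words(class) + alpha * V)
        cond[c] = {w: (word_counts[c][w] + alpha) / (total + alpha * V) for w in vocab}
    return priors, cond, vocab

def predict_multinomial_nb(model, doc):
    priors, cond, vocab = model
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in doc:
            if w in vocab:  # assumption: out-of-vocabulary words are skipped
                score += math.log(cond[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus.
docs = [["buy", "cheap", "pills"], ["cheap", "offer"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
```

Here V = 8 and each class has 5 words in total, so P(cheap | spam) = (2 + 1) / (5 + 8) = 3/13. Because the α·V term in the denominator exactly balances the α added to every numerator, each class's word probabilities still sum to 1.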


Gaussian NB

For continuous features, assume each P(x_i | c) is Gaussian, with separate parameters per class. Training fits a mean and a variance for every (feature, class) pair.
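A minimal sketch in plain Python, with hypothetical function names and toy data. It uses the maximum-likelihood variance (divide by n). It also adds a small constant `eps` to every variance so that a feature that is constant within a class does not cause a division by zero. That constant is an assumption of this sketch, not part of the method described above.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Return {class: (prior, per-feature means, per-feature variances)}.

    X:   list of feature vectors (lists of floats)
    y:   list of class labels
    eps: small constant added to each variance (assumption, avoids zero variance)
    """
    groups = defaultdict(list)
    for row, c in zip(X, y):
        groups[c].append(row)
    model = {}
    for c, rows in groups.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + eps
                     for j in range(d)]
        model[c] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    # log of the normal density N(x; mean, var)
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gaussian_nb(model, x):
    def score(c):
        prior, means, variances = model[c]
        # log P(c) + sum_i log P(x_i | c), with independent Gaussian features
        return math.log(prior) + sum(
            log_gaussian(xj, m, v) for xj, m, v in zip(x, means, variances))
    return max(model, key=score)

# Hypothetical toy data: two well-separated classes in two dimensions.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
```

Only 2·d numbers per class are stored, so fitting is a single pass over the data.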

Complexity

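With N training examples, d features, and K classes: training is O(N · d), one pass that accumulates counts (or means and variances). Predicting one example is O(K · d). This makes it very fast, and it scales to millions of documents.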