Formula
P(c|x) ∝ P(c) · ∏_i P(x_i | c). The 'naive' assumption is that features are conditionally independent given the class. It rarely holds exactly, but is often good enough.
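A minimal sketch of the formula, computed in log space so that a product of many small probabilities does not underflow. The function name `nb_posterior` and the toy priors and likelihoods are made up for illustration.

```python
import math

def nb_posterior(priors, likelihoods):
    """Return normalized P(c|x) for each class.

    priors:      {class: P(c)}
    likelihoods: {class: [P(x_1|c), P(x_2|c), ...]} for one observed x
    """
    # log P(c) + sum_i log P(x_i | c), the log of the unnormalized posterior
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
        for c in priors
    }
    # Subtract the max before exponentiating for numerical stability
    m = max(log_scores.values())
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Hypothetical numbers: two classes, two features
posterior = nb_posterior(
    priors={"spam": 0.4, "ham": 0.6},
    likelihoods={"spam": [0.8, 0.5], "ham": [0.1, 0.5]},
)
print(posterior)
```

For classification only the argmax matters, so the normalization step can be skipped.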
Multinomial NB
Features are word counts. P(word | class) = (count(word, class) + α) / (total_words(class) + α·V), where V is the vocabulary size. α = 1 gives Laplace smoothing.
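The smoothed estimate can be sketched as follows, assuming documents arrive as token lists. The function name `fit_multinomial` and the toy corpus are hypothetical.

```python
from collections import Counter

def fit_multinomial(docs, labels, alpha=1.0):
    """Estimate P(word | class) with additive smoothing."""
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Accumulate word counts per class
    counts = {c: Counter() for c in set(labels)}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    probs = {}
    for c, cnt in counts.items():
        total = sum(cnt.values())
        # (count(word, class) + alpha) / (total_words(class) + alpha * V)
        probs[c] = {w: (cnt[w] + alpha) / (total + alpha * V) for w in vocab}
    return probs

# Toy corpus, invented for illustration
docs = [["free", "money", "free"], ["meeting", "today"], ["free", "meeting"]]
labels = ["spam", "ham", "ham"]
word_probs = fit_multinomial(docs, labels)
print(word_probs["ham"]["money"])  # nonzero even though "money" never appears in ham
```

Without smoothing, a single word unseen in a class would force that class's whole product to zero.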
Gaussian NB
Continuous features. Assume P(x_i | c) is Gaussian for each class. Fit a mean and a variance per feature, per class.
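A rough sketch of the fit and predict steps in plain Python. The names `fit_gaussian` and `predict_gaussian`, the variance floor `eps`, and the toy data are assumptions added for illustration, not part of the method as stated above.

```python
import math

def fit_gaussian(X, y, eps=1e-9):
    """Return {class: (prior, means, variances)} from rows X and labels y."""
    params = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        cols = list(zip(*rows))  # one tuple per feature
        means = [sum(col) / n for col in cols]
        # Maximum-likelihood variance; eps (an assumption) guards against zero variance
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(cols, means)
        ]
        params[c] = (n / len(y), means, variances)
    return params

def predict_gaussian(params, x):
    """Return the class maximizing log P(c) + sum_i log N(x_i; mean, var)."""
    def log_score(prior, means, variances):
        ll = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + ll
    return max(params, key=lambda c: log_score(*params[c]))

# Toy data: two well-separated classes
X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.2]]
y = ["a", "a", "b", "b"]
gnb = fit_gaussian(X, y)
print(predict_gaussian(gnb, [1.1, 1.9]))
```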
Complexity
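Train: O(N · d) for N examples with d features, a single pass to collect counts or means and variances. Predict: O(K · d) per example for K classes. Both are linear, so it scales to millions of documents.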