Skip-gram

Given a center word, predict the words inside a fixed window around it. A full softmax over the vocabulary costs O(|V|) per training example → use negative sampling or hierarchical softmax instead.
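
A minimal sketch of the negative-sampling objective for a single (center, context) pair, to show why it is cheaper than a full softmax: only the true context word and k sampled noise words are scored. The function names and the plain-list vectors are illustrative assumptions, not taken from any particular library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (center, context) pair (hypothetical helper)."""
    # Reward a high score for the observed context word.
    loss = -math.log(sigmoid(dot(center_vec, context_vec)))
    # Penalize high scores for each sampled noise word.
    for neg_vec in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg_vec)))
    return loss

# Toy 2-d vectors, made up for illustration.
center = [1.0, 0.0]
aligned = [2.0, 0.0]
opposed = [-2.0, 0.0]
noise = [[0.0, 1.0]]
print(sgns_loss(center, aligned, noise) < sgns_loss(center, opposed, noise))
```

The cost per pair is k + 1 dot products, independent of vocabulary size.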

CBOW

Given the surrounding context words (their vectors averaged or summed), predict the center word. Trains faster than skip-gram but is slightly worse on rare words.
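
A rough sketch of the CBOW forward pass, assuming the context vectors are averaged and a full softmax is used for readability (real training would normally swap in negative sampling or hierarchical softmax). `cbow_probs`, `in_vecs`, and `out_vecs` are hypothetical names chosen for this example.

```python
import math

def cbow_probs(context_ids, in_vecs, out_vecs):
    """Probability of each vocabulary word being the center word."""
    dim = len(in_vecs[0])
    # Average the input vectors of the context words.
    hidden = [
        sum(in_vecs[i][d] for i in context_ids) / len(context_ids)
        for d in range(dim)
    ]
    # Score every vocabulary word against the averaged context.
    scores = [sum(h * o for h, o in zip(hidden, out)) for out in out_vecs]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-word vocabulary with made-up 2-d vectors.
in_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out_vecs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
probs = cbow_probs([0, 1], in_vecs, out_vecs)
print(probs.index(max(probs)))
```

Because all context words are merged into one hidden vector, each window yields a single prediction, which is where the speed advantage over skip-gram comes from.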

Analogies

king - man + woman ≈ queen: the vector nearest to the result (excluding the three query words) is queen. This linear structure emerges from co-occurrence patterns.
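
The analogy lookup can be sketched as a nearest-neighbor search by cosine similarity. The 2-d vectors below are hand-made so the arithmetic is easy to follow (first axis roughly "royalty", second "gender"); they are not trained embeddings, and `analogy` is a hypothetical helper.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words, which otherwise tend to win.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Made-up toy vectors, for illustration only.
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.9, 1.0],
}
print(analogy("king", "man", "woman", toy))  # queen
```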

GloVe

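Alternative to word2vec: fit word vectors to the log of global co-occurrence counts, i.e. a weighted factorization of the co-occurrence matrix. Different training objective, embeddings of similar quality.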
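
A small sketch of the per-pair GloVe objective: a weighted squared error between the model's score and the log co-occurrence count. The weighting function and the defaults x_max = 100 and alpha = 0.75 follow the GloVe paper; the function names and toy inputs are assumptions for illustration.

```python
import math

def glove_weight(count, x_max=100.0, alpha=0.75):
    """Down-weight rare pairs and cap the influence of frequent ones."""
    return (count / x_max) ** alpha if count < x_max else 1.0

def glove_pair_loss(w_i, w_j, b_i, b_j, count):
    """Weighted squared error for one nonzero co-occurrence count."""
    prediction = sum(a * b for a, b in zip(w_i, w_j)) + b_i + b_j
    return glove_weight(count) * (prediction - math.log(count)) ** 2

# Made-up vectors and count, for illustration only.
print(glove_pair_loss([0.5, 0.5], [1.0, 1.0], 0.0, 0.0, count=10.0))
```

Only pairs with a nonzero count contribute, so training iterates over the sparse nonzero entries of the matrix rather than over the raw corpus.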