Skip-gram
Given a center word, predict the surrounding words within a context window. A full softmax over the vocabulary is expensive, so training usually relies on negative sampling or hierarchical softmax.
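A minimal sketch of the two pieces named above: generating (center, context) training pairs, and the negative-sampling loss for one pair. The function names, the window size, and the plain-list vectors are illustrative assumptions, not any particular library's API.

```python
import math

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss for one true (center, context) pair plus k sampled negative words.

    Replaces the softmax over the whole vocabulary with k + 1 binary
    classifications: the true context should score high, the negatives low.
    """
    loss = -math.log(sigmoid(dot(center_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg)))
    return loss

pairs = skipgram_pairs(["the", "cat", "sat"], window=1)
```

The cost per training pair scales with the number of negatives rather than with the vocabulary size, which is the point of the approximation.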
CBOW
Given the context words, predict the center word. Trains faster than skip-gram but gives slightly worse representations for rare words.
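A rough sketch of the CBOW forward pass, assuming the context vectors are averaged and scored against a separate table of output vectors with a full softmax. The names `in_vecs` and `out_vecs` and the toy numbers are hypothetical, chosen only for illustration.

```python
import math

def cbow_predict(context_ids, in_vecs, out_vecs):
    """Return a probability distribution over the vocabulary for the center word."""
    dim = len(in_vecs[0])
    # Average the input vectors of the context words.
    h = [sum(in_vecs[i][d] for i in context_ids) / len(context_ids) for d in range(dim)]
    # Score every vocabulary word against the averaged context.
    scores = [sum(a * b for a, b in zip(h, out)) for out in out_vecs]
    # Full softmax over the vocabulary, shifted by the max score for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-word vocabulary with 2-dimensional vectors (made-up values).
in_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out_vecs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
probs = cbow_predict([0, 1], in_vecs, out_vecs)
```

Because the whole context produces a single prediction, each window yields one update instead of one per context word.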
Analogies
vec(king) - vec(man) + vec(woman) ≈ vec(queen). This linear structure emerges from co-occurrence patterns in the training text.
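The analogy arithmetic can be sketched as a nearest-neighbor search by cosine similarity. The 2-dimensional vectors below are hand-made for illustration (one axis loosely "royalty", the other loosely "maleness"); they are not trained embeddings, and the `analogy` helper is a hypothetical name.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hand-made toy vectors: [royalty, maleness]. Not real embeddings.
toy = {
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [-0.5, 0.3],
}
result = analogy("king", "man", "woman", toy)
```

Excluding the three input words from the candidates is a common convention; without it, the nearest neighbor of the target is often one of the inputs.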
GloVe
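An alternative: fit word vectors to the log of the global co-occurrence matrix, which amounts to a weighted factorization. The training procedure differs from word2vec, but the resulting embeddings are of similar quality.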
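As a sketch of what the factorization optimizes, the per-pair GloVe loss is a weighted squared error between a dot product (plus biases) and the log co-occurrence count. The defaults `x_max=100` and `alpha=0.75` follow the values reported in the GloVe paper; the function names are illustrative.

```python
import math

def glove_weight(count, x_max=100.0, alpha=0.75):
    """Down-weight rare pairs and cap the influence of very frequent ones."""
    return (count / x_max) ** alpha if count < x_max else 1.0

def glove_pair_loss(w_i, w_j, b_i, b_j, count):
    """Weighted squared error for one word pair; `count` must be positive."""
    pred = sum(a * b for a, b in zip(w_i, w_j)) + b_i + b_j
    return glove_weight(count) * (pred - math.log(count)) ** 2

# A pair seen 10 times whose prediction already equals log(10) has zero loss.
loss = glove_pair_loss([0.0, 0.0], [0.0, 0.0], math.log(10), 0.0, 10)
```

Only pairs with a nonzero count enter the sum, so training iterates over the nonzero entries of the co-occurrence matrix rather than over the raw text.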