▶ Interactive Lab

Weight Initialization Distributions

Xavier, Kaiming, normal — visualized.

A good init keeps the variance of the activations close to 1 as they pass from layer to layer.

What you're seeing

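Zero init kills training: every unit in a layer computes the same output and receives the same gradient, so the units never differentiate. A good init scales the weights to the layer size instead. Xavier uses Var(W) = 2 / (fan_in + fan_out), and Kaiming uses Var(W) = 2 / fan_in.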
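The fan_in rule can be written out as a small helper. This is a minimal sketch in plain Python, not the lab's own code: the function name `init_std` and the scheme labels are made up for illustration, and the Xavier and Kaiming formulas are the standard normal-distribution variants.

```python
import math


def init_std(scheme: str, fan_in: int, fan_out: int) -> float:
    """Return the weight standard deviation for a given scheme (hypothetical helper)."""
    if scheme == "zero":
        return 0.0
    if scheme == "normal_0.02":
        # Fixed std, independent of the layer size.
        return 0.02
    if scheme == "xavier":
        # Var(W) = 2 / (fan_in + fan_out)
        return math.sqrt(2.0 / (fan_in + fan_out))
    if scheme == "kaiming":
        # Var(W) = 2 / fan_in, intended for ReLU layers
        return math.sqrt(2.0 / fan_in)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("zero", "normal_0.02", "xavier", "kaiming"):
        print(f"{scheme:12s} std = {init_std(scheme, 512, 512):.4f}")
```

Note that only Xavier and Kaiming change with the layer size. The fixed 0.02 happens to sit in the same range for layers a few hundred to a few thousand units wide.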

★ KEY TAKEAWAY
A good init keeps activation variance ~1 through the network. Use Xavier for tanh, Kaiming for ReLU, and a normal with mean 0 and std 0.02 for transformers (0.02 is the standard deviation, not the variance).
▶ WHAT TO TRY
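  • Try Zero init: broken. All activations are zero, so no useful gradient reaches the weights and nothing trains.
  • Try Uniform: the range is far too wide, so activations grow layer by layer and training is unstable.
  • Try Normal(0, 0.02): mean 0, std 0.02, the usual transformer default (GPT-2 and BERT both use it).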
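The experiments in the list above can be approximated offline with NumPy. The sketch below is not the lab's implementation: it pushes random inputs through a plain stack of square layers and reports the root-mean-square activation after the last one. The depth, width, batch size, the Uniform(-1, 1) range, and every name in the code are assumptions chosen for illustration.

```python
import numpy as np


def final_rms(sample_weights, activation, depth=20, width=512, batch=256, seed=0):
    """Push unit-variance inputs through `depth` layers; return the RMS of the output."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    for _ in range(depth):
        w = sample_weights(rng, width, width)  # shape (fan_in, fan_out)
        h = activation(h @ w)
    return float(np.sqrt(np.mean(h ** 2)))


def relu(x):
    return np.maximum(x, 0.0)


# Each entry: (weight sampler, activation). The uniform range is an assumed example.
schemes = {
    "zero": (lambda rng, fi, fo: np.zeros((fi, fo)), relu),
    "uniform(-1, 1)": (lambda rng, fi, fo: rng.uniform(-1.0, 1.0, (fi, fo)), relu),
    "normal(std=0.02)": (lambda rng, fi, fo: rng.normal(0.0, 0.02, (fi, fo)), relu),
    "xavier + tanh": (
        lambda rng, fi, fo: rng.normal(0.0, np.sqrt(2.0 / (fi + fo)), (fi, fo)),
        np.tanh,
    ),
    "kaiming + relu": (
        lambda rng, fi, fo: rng.normal(0.0, np.sqrt(2.0 / fi), (fi, fo)),
        relu,
    ),
}

results = {name: final_rms(sampler, act) for name, (sampler, act) in schemes.items()}

for name, rms in results.items():
    print(f"{name:18s} final RMS activation = {rms:.3e}")
```

Under these assumptions, zero init should give exactly 0, the wide uniform should explode by many orders of magnitude, and the two scaled schemes should stay far closer to 1. The fixed std of 0.02 shrinks in this plain ReLU stack. Real transformers also rely on normalization layers and residual connections, which this sketch omits.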