Gradient Clipping in Action

max_norm = 1.0

On most steps the gradient norm is below max_norm, so nothing is clipped. An occasional spike exceeds max_norm and is scaled down to it.

Plot of gradient norm per step. Steps whose norm exceeds max_norm are clipped (red); steps below it pass through unchanged (green).
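The clipping rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the demo: the function name `clip_by_global_norm`, the flat list of gradient values, and the small `eps` added for numerical safety are all assumptions made for the example.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm.

    Hypothetical helper for illustration; `eps` guards against division by zero.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Only ever scale down: gradients already within the limit pass through unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

# A spike: norm 5.0 is well above max_norm, so the gradient is rescaled.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)

# A typical step: norm 0.5 is below max_norm, so nothing changes.
small, _ = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
```

Because every component is multiplied by the same factor, the clipped gradient keeps its direction and only its length shrinks. PyTorch's `torch.nn.utils.clip_grad_norm_` applies a similar rule across a model's parameters.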

★ KEY TAKEAWAY

Gradient norm clipping caps spike-induced blow-ups by rescaling any gradient whose norm exceeds max_norm, leaving its direction unchanged. max_norm = 1.0 is a common default for LLM training.

▶ WHAT TO TRY

Slide max_norm low to see heavy clipping: many bars turn red and are truncated at the threshold.
Set it very high: spikes pass through unclipped, which in a real run could derail training.
Click Simulate to generate a new sequence of gradient norms with rare spikes.
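
The same experiments can be approximated offline. The sketch below generates a synthetic sequence of gradient norms with rare spikes and reports how often clipping fires at a low, a default, and a very high max_norm. The distribution (uniform base norms between 0.2 and 0.8, a 3% spike probability, spike multipliers between 5 and 20) and the names `simulate_norms` and `clipped_fraction` are assumptions chosen for illustration; they are not taken from the demo.

```python
import random

def simulate_norms(steps=200, spike_prob=0.03, seed=0):
    """Generate synthetic per-step gradient norms: mostly small, with rare spikes.

    All distribution parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    norms = []
    for _ in range(steps):
        norm = rng.uniform(0.2, 0.8)           # typical step, below max_norm = 1.0
        if rng.random() < spike_prob:
            norm *= rng.uniform(5.0, 20.0)     # rare spike
        norms.append(norm)
    return norms

def clipped_fraction(norms, max_norm):
    """Return the norms after clipping and the fraction of steps that were clipped."""
    clipped = [min(n, max_norm) for n in norms]
    frac = sum(n > max_norm for n in norms) / len(norms)
    return clipped, frac

norms = simulate_norms()
for max_norm in (0.3, 1.0, 100.0):
    clipped, frac = clipped_fraction(norms, max_norm)
    print(f"max_norm={max_norm}: {frac:.0%} of steps clipped, "
          f"largest norm after clipping {max(clipped):.2f}")
```

With a low threshold most steps are clipped, while a very high threshold never triggers and lets every spike through at full size, mirroring the two slider extremes above.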