On most steps the gradient norm stays below max_norm, so nothing is clipped. An occasional spike exceeds max_norm and is scaled down to it.
What you're seeing
Plot of the gradient norm at each training step. Norms above max_norm are clipped (red); norms below it pass through unchanged (green).
★ KEY TAKEAWAY
Gradient norm clipping caps the blow-ups caused by gradient spikes. max_norm=1.0 is a common default for LLM training.
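To make the takeaway concrete, here is a minimal sketch of norm clipping on a flat list of gradient values, in plain Python. The function name `clip_grad_norm` and the `eps` guard are illustrative assumptions, not taken from this page; deep learning libraries ship their own clipping utilities that work on parameter tensors.

```python
import math

def clip_grad_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale gradient values so their L2 norm is at most max_norm.

    Hypothetical helper for illustration. Returns (clipped_grads, original_norm).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale factor drops below 1 only when the norm exceeds max_norm;
    # eps (an assumed small constant) avoids division by zero.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

# A spike: norm 5.0 is scaled down to roughly max_norm.
clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)

# A normal step: norm 0.5 is below max_norm, so values pass through unchanged.
unchanged, small_norm = clip_grad_norm([0.3, 0.4], max_norm=1.0)
```

Because every component is multiplied by the same factor, clipping shrinks the length of the gradient but keeps its direction.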
▶ WHAT TO TRY
- Slide max_norm low: many steps get clipped (red bars truncated).
- Set it very high: spikes pass through unclipped, which in real training could derail the run.
- Click Simulate to generate a new sequence of gradient norms with rare spikes.
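The experiments in this list can be roughly reproduced offline. The sketch below is only an assumption about what such a simulation might look like: the names `simulate_norms` and `apply_clipping`, the baseline range, the spike probability, and the spike multiplier are all made up for illustration and are not the values used by the demo.

```python
import random

def simulate_norms(steps=100, spike_prob=0.05, seed=0):
    """Generate per-step gradient norms with rare spikes (assumed distribution)."""
    rng = random.Random(seed)
    norms = []
    for _ in range(steps):
        norm = rng.uniform(0.2, 0.9)         # typical step, assumed range
        if rng.random() < spike_prob:
            norm *= rng.uniform(5.0, 20.0)   # rare spike, assumed multiplier
        norms.append(norm)
    return norms

def apply_clipping(norms, max_norm=1.0):
    """Cap each norm at max_norm and count how many steps were clipped."""
    clipped = [min(n, max_norm) for n in norms]
    num_clipped = sum(1 for n in norms if n > max_norm)
    return clipped, num_clipped

norms = simulate_norms(steps=200, seed=1)
for max_norm in (0.3, 1.0, 100.0):
    _, num_clipped = apply_clipping(norms, max_norm)
    print(f"max_norm={max_norm}: {num_clipped} of {len(norms)} steps clipped")
```

With a low max_norm most steps are clipped, while a very high one clips nothing and lets every spike through, matching the two slider experiments.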