Lower dim index = higher frequency. Higher dim index = lower frequency.
What you're seeing
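Sinusoidal PE: PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d)), where pos is the token position, d is the embedding dimension, and i = 0 ... d/2 - 1 indexes the sin/cos pairs.
Each sin/cos pair uses a different frequency, 1 / 10000^(2i/d), so wavelengths grow geometrically from 2π positions at i = 0 toward 10000·2π at the highest i. Low i: changes fast (handles fine-grained position). High i: changes slowly (handles long-range coarse position).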
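To make the formula concrete, here is a minimal sketch in plain Python. The function names `sinusoidal_pe` and `wavelength` are illustrative and not taken from any library, and the code assumes an even `d`.

```python
import math

def sinusoidal_pe(num_positions, d, base=10000.0):
    """Return a num_positions x d table (list of rows) of sinusoidal encodings."""
    if d % 2 != 0:
        raise ValueError("d must be even so every sin has a matching cos")
    table = []
    for pos in range(num_positions):
        row = [0.0] * d
        for i in range(d // 2):
            # Same angle for the sin/cos pair; it shrinks as i grows.
            angle = pos / base ** (2 * i / d)
            row[2 * i] = math.sin(angle)      # even dim
            row[2 * i + 1] = math.cos(angle)  # odd dim
        table.append(row)
    return table

def wavelength(i, d, base=10000.0):
    """Positions needed for pair i to complete one full cycle."""
    return 2 * math.pi * base ** (2 * i / d)

pe = sinusoidal_pe(num_positions=4, d=8)
print(pe[0])             # position 0: every sin is 0.0, every cos is 1.0
print(wavelength(0, 8))  # fastest pair: 2*pi positions per cycle
print(wavelength(3, 8))  # slowest pair for d=8: about 1000 times longer
```

With d = 8 the slowest pair has exponent 2·3/8 = 0.75, so its wavelength is 10000^0.75 = 1000 times that of pair 0, which is the fast-to-slow spread described above.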
★ KEY TAKEAWAY
Sinusoidal PE uses different frequencies per dimension: low dims oscillate fast (fine position), high dims slow (coarse position).
▶ WHAT TO TRY
- Add more indices to the dimensions list to see multiple frequencies at once.
- Compare dim 0 (fastest) with dim d-1 (slowest).
- RoPE uses the same per-pair frequencies, but rotates each Q/K dimension pair by the position angle instead of adding PE to the embeddings.
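The RoPE remark can be sketched in the same style. This is a rough illustration, not a reference implementation: `rope_rotate` and `dot` are hypothetical helper names, and the code assumes the adjacent-pair layout (dims 2i and 2i+1 rotated together), whereas some implementations pair the two halves of the vector instead.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos / base^(2i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos / base ** (2 * i / d)  # same angle the sinusoidal PE uses
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s         # standard 2D rotation
        out[2 * i + 1] = a * s + b * c
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions have the same offset (3), so the scores should match.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 13), rope_rotate(k, 10))
print(score_a, score_b)
```

Because each pair is rotated rather than shifted, the Q·K score depends only on the offset between the two positions, and the vector norms are unchanged.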