A jitter buffer trades latency for smoothness. A fixed-depth buffer either runs dry when packets arrive late (audio glitches) or is set deeper than the network needs (added lag). Adaptive jitter buffers, the kind most modern VoIP and WebRTC stacks ship, adjust their target depth to recent jitter statistics.
Target depth from jitter percentile
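Sample packet inter-arrival times over a sliding window to estimate recent jitter. Set the target buffer depth to the 95th percentile (p95) of that jitter, clamped to a floor of 10ms and a ceiling of 200ms for conversational audio or 500ms for one-way streaming. Recompute the target every 100-500ms.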
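A minimal Python sketch of the target computation. It assumes (the text does not specify this) that a packet's jitter is its transit time (arrival time minus sender timestamp, both in ms) minus the smallest transit time in the window; subtracting the minimum cancels the unknown clock offset between sender and receiver. The class name, the 200-packet window, and the nearest-rank percentile are illustrative choices, and the caller is expected to invoke `target_depth_ms()` from its own 100-500ms timer.

```python
import math
from collections import deque


class TargetDepthEstimator:
    """Hypothetical helper: target depth = clamped p95 of recent relative delay."""

    def __init__(self, window=200, floor_ms=10.0, ceiling_ms=200.0, percentile=0.95):
        # Sliding window of per-packet transit times in ms; old entries fall off.
        self.transits = deque(maxlen=window)
        self.floor_ms = floor_ms
        self.ceiling_ms = ceiling_ms  # e.g. 200ms conversational, 500ms one-way
        self.percentile = percentile

    def on_packet(self, send_ts_ms, arrival_ms):
        # Transit includes a constant clock offset; it cancels in target_depth_ms().
        self.transits.append(arrival_ms - send_ts_ms)

    def target_depth_ms(self):
        if not self.transits:
            return self.floor_ms
        base = min(self.transits)  # fastest recent packet
        # How late each packet was relative to the fastest one, sorted ascending.
        delays = sorted(t - base for t in self.transits)
        # Nearest-rank percentile: smallest value covering `percentile` of samples.
        rank = max(1, math.ceil(self.percentile * len(delays)))
        p = delays[rank - 1]
        return min(self.ceiling_ms, max(self.floor_ms, p))
```

For example, with 20ms packets where every tenth packet arrives 40ms late, the p95 lands on the late packets and the target becomes 40ms; a perfectly steady stream falls back to the 10ms floor.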
Adjusting without artifacts
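Changing the depth abruptly, by inserting silence or discarding a block of samples, produces audible clicks. Instead, stretch or compress the audio gradually: time-scale modification (WSOLA, PSOLA) changes duration while preserving pitch. Many stacks also fade out and back in around each chunk that is inserted or dropped during a depth adjustment.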
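Below is a minimal pure-Python sketch of WSOLA (waveform-similarity overlap-add), written for illustration rather than production use. Assumptions not taken from the text: 16 kHz mono samples, a 10ms hop (`hop=160`), a ±5ms search range (`tol=80`), and an unnormalized dot product as the similarity score. Each output frame is taken from near its nominal input position, shifted to best match the natural continuation of the previous frame, so waveforms line up in phase and pitch is kept. A `speed` above 1 shortens the audio (draining the buffer); below 1 lengthens it (building depth). The first hop of output fades in from silence, which a real stack would handle by carrying state across calls.

```python
import math


def wsola(x, speed, hop=160, tol=80):
    """Hypothetical WSOLA sketch: output lasts about len(x)/speed samples, same pitch."""
    n = 2 * hop  # frame length; output frames overlap by 50%
    # Periodic Hann window: copies spaced n/2 apart sum to exactly 1.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    # Zero-pad so every candidate and continuation slice stays in range.
    pad = [0.0] * tol + [float(v) for v in x] + [0.0] * (n + hop + tol)

    def seg(start):
        # Frame of length n beginning at (unpadded) input index `start`.
        return pad[start + tol : start + tol + n]

    num_frames = int(len(x) / speed) // hop
    out = [0.0] * (num_frames * hop + n)
    prev = 0  # input start index of the previously copied frame
    for k in range(num_frames):
        nominal = round(k * hop * speed)  # where plain resampling would read
        best = nominal
        if k > 0:
            # What the input would have played next after the previous frame.
            target = seg(prev + hop)
            best_score = -math.inf
            for start in range(nominal - tol, nominal + tol + 1):
                score = sum(a * b for a, b in zip(seg(start), target))
                if score > best_score:
                    best, best_score = start, score
        # Overlap-add the chosen frame at the fixed output hop.
        for i, v in enumerate(seg(best)):
            out[k * hop + i] += win[i] * v
        prev = best
    return out[: num_frames * hop]
```

Compressing one second of 16 kHz audio with `speed=1.25` yields 0.8 seconds of output at the original pitch. In a jitter buffer, `speed` would stay close to 1 (for example 0.9 to 1.1) so the depth drifts toward the target without an audible tempo change.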
When to drop instead of buffer
Catastrophic delay (more than ~500ms behind): drop the backlog and resync. Better one audible discontinuity than lag that persists for the rest of the call. Treat 500ms as a lower bound; many VoIP stacks trigger the resync around 800ms-1s behind.
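A minimal sketch of the resync decision, assuming "how far behind" is measured as the total duration of audio queued in the buffer. `Frame`, `maybe_resync`, and the 800ms default for `resync_threshold_ms` are hypothetical names and values, with the default chosen from the 800ms-1s range. Oldest frames are discarded until the queue is back at the adaptive target, so playback jumps forward to the freshest audio in one step.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int          # sequence number, for illustration
    duration_ms: int  # audio duration carried by this frame


def maybe_resync(buffer, target_ms, resync_threshold_ms=800):
    """Drop oldest frames when the backlog exceeds the threshold.

    Returns the number of frames dropped (0 when no resync was needed).
    """
    buffered = sum(f.duration_ms for f in buffer)
    if buffered <= resync_threshold_ms:
        return 0  # ordinary jitter: let time-scale modification absorb it
    dropped = 0
    # Keep at least target_ms of audio so playout does not immediately underrun.
    while buffer and buffered - buffer[0].duration_ms >= target_ms:
        buffered -= buffer.popleft().duration_ms
        dropped += 1
    return dropped
```

With 50 queued 20ms frames (1000ms) and a 60ms target, this drops the 47 oldest frames and keeps the newest three; a 600ms backlog is left alone. The single splice this creates is a natural place to apply the same short fade-out/fade-in used during normal depth adjustments.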