Network packets don't arrive on a perfectly steady cadence: some come early, some late, and occasional ones go missing. A jitter buffer holds incoming audio packets briefly and releases them at the player's clock rate, smoothing out the irregularity. The art is buffering just enough: too little causes audible glitches, too much adds latency.

Fixed vs adaptive buffers

Fixed buffer: always holds a fixed N packets (e.g., 60 ms of audio). It is simple and its latency is deterministic, but it copes poorly with changing network conditions. Adaptive buffer: tracks recent jitter statistics, growing the buffer when the network is bad and shrinking it when the network is good. WebRTC's NetEQ is the gold standard.
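To make the fixed variant concrete, here is a minimal sketch of a toy fixed-depth buffer. The class name `FixedJitterBuffer`, the `depth` parameter and its default of 3 packets are illustrative assumptions, not taken from any particular library. It reorders packets by sequence number and only starts playout once the buffer has pre-filled.

```python
import heapq


class FixedJitterBuffer:
    """Toy fixed-depth jitter buffer (illustrative sketch, not production code)."""

    def __init__(self, depth=3):
        self.depth = depth       # packets to hold before playout starts (assumed value)
        self._heap = []          # min-heap ordered by sequence number
        self._started = False    # becomes True once the buffer has pre-filled

    def push(self, seq, payload):
        """Store an incoming packet; arrival order does not matter."""
        heapq.heappush(self._heap, (seq, payload))
        if len(self._heap) >= self.depth:
            self._started = True

    def pop(self):
        """Call once per playout tick; returns (seq, payload), or None on underrun."""
        if not self._started or not self._heap:
            return None
        return heapq.heappop(self._heap)
```

With 20 ms packets, `depth=3` would correspond to roughly 60 ms of added latency. A `None` from `pop()` is where a real player would insert concealment or silence.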

The math

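Target buffer depth = mean + k × std-dev of the last N packet inter-arrival times. Assuming roughly normal jitter, k=2 covers about 95% of packets (closer to 98% if only late arrivals count) and k=3 more than 99%; real network jitter often has heavier tails, so treat these figures as approximate. Update the mean and std-dev about every 100 ms: too often and you chase noise, too rarely and you lag behind regime changes.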
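The formula above can be sketched as a small sliding-window estimator. The class name `JitterEstimator`, the window size of 50 and the method names are assumptions made for illustration; the arithmetic is just mean + k × population standard deviation of the recent inter-arrival gaps.

```python
from collections import deque
from statistics import mean, pstdev


class JitterEstimator:
    """Sliding-window estimate of target buffer depth (illustrative sketch)."""

    def __init__(self, window=50, k=2.0):
        self.k = k                          # safety multiplier on the std-dev
        self._gaps = deque(maxlen=window)   # last-N inter-arrival times, in ms
        self._last_arrival_ms = None

    def on_packet(self, arrival_ms):
        """Record the gap between this packet and the previous one."""
        if self._last_arrival_ms is not None:
            self._gaps.append(arrival_ms - self._last_arrival_ms)
        self._last_arrival_ms = arrival_ms

    def target_depth_ms(self):
        """Return mean + k * std-dev of recent gaps, or None with too little data."""
        if len(self._gaps) < 2:
            return None
        return mean(self._gaps) + self.k * pstdev(self._gaps)
```

A caller would feed every arrival into `on_packet()` but read `target_depth_ms()` only on a timer, for example every 100 ms, so the target does not react to each individual packet.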

Late vs missing packet decision

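LATE_THRESHOLD_MS = 20  # assumed grace period; tune for your codec and frame size

def schedule_packet(seq, arrival_ms, play_ms):
    """Classify a packet by comparing its arrival time with its scheduled playout time."""
    lateness_ms = arrival_ms - play_ms   # seq is unused here; kept for logging and reordering
    if lateness_ms <= 0:
        return 'BUFFER'      # arrived in time
    if lateness_ms < LATE_THRESHOLD_MS:
        return 'PLAY_LATE'   # better late than never
    return 'DISCARD'         # too late: playing it now would put audio out of order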

Stretching vs concealment

Adaptive buffers can time-stretch audio (playing at 1.01x or 0.99x speed) to grow or shrink the buffer without dropouts. WSOLA (Waveform Similarity Overlap-Add) preserves pitch while changing speed, and it produces fewer artifacts than packet duplication or silence padding.
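As a rough sketch of how the playout speed might be chosen, the function below compares the current buffer depth with the target and returns 1.01x, 0.99x or 1.0x. The function name, the 10 ms deadband and the 1% limit are assumptions for illustration. The returned rate would be handed to a time-stretcher such as WSOLA, which is not implemented here.

```python
def playback_rate(buffered_ms, target_ms, deadband_ms=10.0, max_adjust=0.01):
    """Pick a playout speed that nudges the buffer toward its target depth."""
    error_ms = buffered_ms - target_ms
    if abs(error_ms) <= deadband_ms:
        return 1.0                 # close enough: play at normal speed
    if error_ms > 0:
        return 1.0 + max_adjust    # too much buffered: play faster to drain it
    return 1.0 - max_adjust        # too little buffered: play slower to refill it
```

The deadband keeps the player from flipping between speeds when the depth hovers around the target. At a 1% adjustment, absorbing a 40 ms error takes about 4 seconds of audio, which is why the change is hard to hear.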

Trade-off curve

| Buffer size | Latency | Glitch rate |
|---|---|---|
| 20 ms | Low | High (~1%) |
| 60 ms | Medium | Low (~0.1%) |
| 200 ms | High (noticeable) | Very low (~0.01%) |

Use an adaptive buffer that tracks mean + 2σ jitter, with WSOLA time-stretching for smooth resizing. Fixed buffers are for prototypes.