Opus is a royalty-free audio codec used by WebRTC, Discord, and most modern VoIP systems. Its superpower: a single codec spans low-bitrate speech (~6 kbps) to high-fidelity stereo music (~510 kbps) and can switch on the fly within a single stream.
Why a single codec for both?
Opus combines two coding layers: SILK, a linear-prediction speech codec that originated at Skype, and CELT, a low-latency MDCT transform codec. Mode-decision logic inside the encoder picks a mode per frame. Speech at low bitrates uses SILK alone, music and very low latency use CELT alone, and a hybrid mode uses both, with SILK coding the band below 8 kHz and CELT coding everything above it.
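The mode chosen for each frame is visible on the wire: the first byte of every Opus packet (the TOC byte) carries a 5-bit configuration number that RFC 6716, section 3.1, maps to SILK-only, hybrid, or CELT-only. The sketch below is a minimal illustration of reading it. The names `toc_mode`, `toc_mode_of`, and `toc_is_stereo` are hypothetical helpers, not part of libopus.

```c
#include <stdint.h>

/* Coding mode signalled by an Opus packet's TOC (first) byte. */
typedef enum { TOC_MODE_SILK, TOC_MODE_HYBRID, TOC_MODE_CELT } toc_mode;

/* Hypothetical helper: map a TOC byte to its coding mode.
 * The top 5 bits are the "config" number (RFC 6716, section 3.1):
 *   0..11 SILK-only, 12..15 hybrid, 16..31 CELT-only. */
static toc_mode toc_mode_of(uint8_t toc)
{
    unsigned config = toc >> 3;
    if (config < 12) return TOC_MODE_SILK;
    if (config < 16) return TOC_MODE_HYBRID;
    return TOC_MODE_CELT;
}

/* Bit 2 of the TOC byte is the stereo flag (0 = mono, 1 = stereo). */
static int toc_is_stereo(uint8_t toc)
{
    return (toc >> 2) & 1;
}
```

Logging `toc_mode_of(packet[0])` for each outgoing packet is one way to watch the encoder move between modes as the content changes.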
Frame sizes vs latency
| Frame size | Latency | Use case |
|---|---|---|
| 2.5 ms | Ultra-low | Live music sync |
| 5 ms | Very low | Game voice |
| 10 ms | Low | Low-latency interactive audio |
| 20 ms | Standard | VoIP, WebRTC default |
| 40-60 ms | Higher | Storage/streaming |
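Frame size also determines how many samples go into each encoder call and how much packet-header overhead the stream carries. The sketch below works through that arithmetic. It assumes one frame per packet and 40 bytes of IPv4 + UDP + RTP headers. `HEADER_BYTES`, `frame_samples`, and `header_overhead_bps` are illustrative names, not libopus API.

```c
#include <stdint.h>

/* Assumed per-packet header cost: IPv4 (20) + UDP (8) + RTP (12) bytes. */
#define HEADER_BYTES 40

/* Samples per channel in one frame. Duration is given in microseconds
 * so that 2.5 ms (2500 us) stays an integer. */
static int64_t frame_samples(int64_t sample_rate_hz, int64_t frame_us)
{
    return sample_rate_hz * frame_us / 1000000;
}

/* Header overhead in bits per second, assuming one frame per packet. */
static int64_t header_overhead_bps(int64_t frame_us)
{
    return (int64_t)HEADER_BYTES * 8 * 1000000 / frame_us;
}
```

Under these assumptions, a 20 ms frame at 48 kHz is 960 samples and costs 16 kbps in headers, while a 2.5 ms frame is 120 samples and costs 128 kbps in headers. That overhead is the price of the lower latency.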
Encoder config for voice
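#include <opus/opus.h> // or <opus.h>, depending on the include path

int err;
OpusEncoder *enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_VOIP, &err); // 48 kHz, mono
if (err != OPUS_OK) { /* handle the error, e.g. log opus_strerror(err) */ }
opus_encoder_ctl(enc, OPUS_SET_BITRATE(24000)); // target bitrate in bits per second
opus_encoder_ctl(enc, OPUS_SET_VBR(1)); // variable bitrate
opus_encoder_ctl(enc, OPUS_SET_DTX(1)); // discontinuous transmission during silence
opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(10)); // expected loss in percent, for lossy networks
opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1)); // in-band forward error correction
Forward Error Correction (FEC)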
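Opus can embed a low-bitrate copy of the previous frame inside the current packet's payload. This in-band FEC is produced by the SILK layer, and the encoder only adds it when FEC is enabled and the expected packet loss is set above zero. If a packet is lost, the decoder reconstructs the missing frame from the FEC data in the next packet, so the receiver has to wait for that packet before playing the gap. The cost is roughly 5-10% extra bitrate, and in exchange the perceptual impact of packet loss is roughly halved.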
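On the receiving side, the recovery decision for a missing frame can be sketched as below. This is a minimal illustration, not a full jitter buffer. `recover_action` and `choose_recovery` are hypothetical names, and the comments show which `opus_decode` call each outcome would map to in libopus.

```c
#include <stdbool.h>

/* What the receiver should do to produce audio for frame n. */
typedef enum {
    RECOVER_NORMAL, /* packet n arrived:
                       opus_decode(dec, pkt_n, len_n, pcm, frame_size, 0) */
    RECOVER_FEC,    /* packet n lost, packet n+1 arrived:
                       opus_decode(dec, pkt_next, len_next, pcm, frame_size, 1),
                       then decode packet n+1 normally with decode_fec = 0 */
    RECOVER_PLC     /* neither is available, fall back to concealment:
                       opus_decode(dec, NULL, 0, pcm, frame_size, 0) */
} recover_action;

/* Hypothetical helper: pick a recovery strategy for frame n. */
static recover_action choose_recovery(bool have_n, bool have_next)
{
    if (have_n) return RECOVER_NORMAL;
    if (have_next) return RECOVER_FEC;
    return RECOVER_PLC;
}
```

If the next packet turns out to carry no FEC data, `opus_decode` with `decode_fec` set to 1 decodes the frame as if it were lost, so the FEC path degrades to concealment rather than failing.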
DTX: silence suppression
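When the user isn't talking, the Opus encoder stops producing full frames. It returns packets of only a byte or two, which the application does not need to send, along with an occasional update of the background noise. The decoder fills the gaps with comfort noise. This saves bandwidth (around 70% on an average voice stream, depending on how much of the call is silence) and battery on mobile.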
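In a send loop, DTX shows up as very short return values from `opus_encode`: the libopus documentation states that a packet of 2 bytes or less does not need to be transmitted. A minimal sketch of that check follows. `should_transmit` and `count_transmitted` are hypothetical helpers, and the lengths passed in stand for `opus_encode` return values.

```c
#include <stddef.h>

/* An encoded length of 2 bytes or less marks a DTX packet that need not
 * be sent. Negative values are libopus error codes, so skip those too. */
static int should_transmit(int encoded_len)
{
    return encoded_len > 2;
}

/* Hypothetical helper: count how many packets in a run of encoder
 * return values would actually go on the wire. */
static size_t count_transmitted(const int *encoded_lens, size_t n)
{
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        if (should_transmit(encoded_lens[i])) sent++;
    }
    return sent;
}
```

Comparing `count_transmitted` against the total number of frames over a call gives a quick estimate of how much DTX is saving for a particular speaker and environment.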