How

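At each decoding step: compute the set of tokens legal in the current grammar state. Set the logits of all other tokens to -∞. Sample from the remainder; the softmax then renormalizes over the legal tokens only.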
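
A minimal sketch of that masking step, using only the standard library. The function name `constrained_sample`, the toy logits, and the hand-picked `legal_ids` set are illustrative assumptions; a real decoder would get the legal set from a grammar engine and operate on tensors.

```python
import math
import random

def constrained_sample(logits, legal_ids, rng=random):
    """Sample a token id from `logits`, restricted to `legal_ids` (hypothetical helper)."""
    if not legal_ids:
        raise ValueError("grammar allows no token at this state")
    # Mask: illegal tokens get -inf, so their probability becomes exactly 0.
    masked = [x if i in legal_ids else -math.inf for i, x in enumerate(logits)]
    # Unnormalized softmax weights (subtract the max for numerical stability).
    top = max(masked)
    weights = [math.exp(x - top) for x in masked]
    # random.choices normalizes the weights; zero-weight ids are never drawn.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy vocabulary of 5 tokens: the model prefers id 0, but only ids 2 and 3 are legal.
logits = [5.0, 1.0, 0.5, 0.2, -1.0]
token = constrained_sample(logits, legal_ids={2, 3})
```

The sampled `token` is always 2 or 3 here, even though the unconstrained model would pick id 0 most of the time.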

Libraries

Outlines, guidance, LMQL, llama.cpp grammar mode, vLLM guided decoding. Between them they cover regex, JSON Schema, and CFG constraints; which of these each one supports varies by library.

Speed

Cheap: the grammar step is usually small per token next to the model's forward pass. But: masking can force the model onto tokens it rates as unlikely, which can degrade output quality.
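
One way the per-token cost can stay small, sketched under the assumption that the grammar compiles to a finite-state machine: build the legal-token set for every state once, before decoding, so each step is a single lookup. The toy vocabulary, the DFA (digits followed by one "."), and the names `VOCAB`, `step`, `run`, and `LEGAL` are all hypothetical; this is not a claim about how any particular library is implemented.

```python
# Hypothetical toy vocabulary of string tokens.
VOCAB = ["1", "23", ".", "a", "4."]

def step(state, ch):
    """Toy DFA for one or more digits then '.': 0=start, 1=in digits, 2=done, None=dead."""
    if state in (0, 1) and ch.isdigit():
        return 1
    if state == 1 and ch == ".":
        return 2
    return None

def run(state, token):
    """Feed a whole token through the DFA; return the end state, or None if it dies."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

# Built once, before decoding: for each DFA state, the token ids that keep it alive.
LEGAL = {
    s: frozenset(i for i, t in enumerate(VOCAB) if run(s, t) is not None)
    for s in (0, 1, 2)
}

# At decode time the grammar step is then just a dictionary lookup.
legal_now = LEGAL[0]
```

In state 0 the legal ids are those for "1", "23", and "4."; in state 2 (the match is complete) no token is legal, so generation would stop there.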

Trade-off

Very tight grammar = guaranteed format but stunted content. Loose grammar (constrain only the structure you actually need) = still valid against that grammar, with more room left for quality.