How
At each decoding step, compute the set of tokens legal in the current grammar state. Set logits of all others to -∞. Sample from the renormalized remainder.
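A minimal sketch of the masking step, using only the standard library. The function names (`mask_logits`, `softmax`, `constrained_sample`) are illustrative and not taken from any library. The allowed set is assumed to come from a grammar engine that is not shown here.

```python
import math
import random

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("grammar allows no token in this state")
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Softmax over a list; -inf entries end up with probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def constrained_sample(logits, allowed_ids, rng=random):
    """Sample one token id, restricted to the grammar-legal ids."""
    probs = softmax(mask_logits(logits, allowed_ids))
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy vocabulary of 4 tokens; assume the grammar only allows ids 1 and 3.
token_id = constrained_sample([2.0, 1.0, 0.5, -1.0], allowed_ids=[1, 3])
```

In a real decoder the same mask would be applied to the model's logit tensor before the sampling step, once per generated token.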
Libraries
Outlines, guidance, LMQL, llama.cpp grammar mode, vLLM guided decoding. Each exposes some combination of regex, JSON Schema, and CFG constraints.
Speed
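Usually cheap: with a precompiled grammar, the mask costs little per token next to the model's forward pass. Complex CFGs and large vocabularies can add overhead. But: masking can force the model into low-probability tokens, which degrades output quality.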
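One rough way to spot the quality problem is to check how much probability the unconstrained model puts on the legal tokens at each step. The sketch below is a hypothetical diagnostic, not a feature of any of the libraries listed; `allowed_mass` is a made-up name.

```python
import math

def allowed_mass(logits, allowed_ids):
    """Probability the unconstrained model assigns to the grammar-legal tokens."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return sum(exps[i] for i in set(allowed_ids)) / sum(exps)

# Toy case: the model puts 90% on token 0, but the grammar only allows 1 and 2.
logits = [math.log(0.9), math.log(0.08), math.log(0.02)]
mass = allowed_mass(logits, [1, 2])  # about 0.10
```

A low value means the grammar is overriding what the model wanted to say at that step. Many such steps in a row suggest the grammar, or the prompt, is fighting the model.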
Trade-off
Very tight grammar = guaranteed format but stunted content. Loose grammar = better content quality but weaker format guarantees.