When your downstream parser expects JSON and the LLM emits prose, your pipeline breaks. Modern LLM APIs and inference servers offer several ways to get structured output: JSON mode, JSON Schema enforcement, function calling, and token-level constrained decoding. Pick the right one for your use case.

JSON mode (basic)

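OpenAI's JSON mode (response_format: {type: 'json_object'}) constrains the model to emit parseable JSON, provided the response is not cut off by the token limit. Anthropic has no direct equivalent; the usual substitute is tool use, which returns its arguments as a JSON object. Neither approach enforces your schema by default: the JSON may have wrong keys or types. Validate downstream.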
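A minimal sketch of that downstream validation, using only the standard library. The field names in EXPECTED_FIELDS and the function parse_and_validate are hypothetical, chosen for illustration; a real pipeline would more likely use a schema library.

```python
import json

# Hypothetical expected shape: field name -> required Python type.
EXPECTED_FIELDS = {"title": str, "priority": int, "tags": list}

def parse_and_validate(raw: str) -> dict:
    """Parse model output and check keys and types; raise ValueError on mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    for field, expected in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing key: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return data
```

Raising a single exception type keeps the caller's retry logic simple: catch ValueError, then re-prompt or fall back.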

JSON Schema enforcement

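OpenAI's structured outputs (response_format: {type: 'json_schema', json_schema: {...}}) enforce both JSON syntax and your schema: field types, required keys, and enums. With strict mode on, conformance comes from constrained decoding rather than from the model's best effort, so the remaining failures are mostly refusals and responses truncated at the token limit. Only a subset of JSON Schema is supported, and the first request with a new schema incurs extra latency while the schema is processed.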
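As an illustration, the request fragment can be built as a plain dictionary. The ticket schema and the helper build_response_format are hypothetical. The name and strict fields, and the additionalProperties: false requirement, reflect my understanding of OpenAI's strict mode and should be checked against the current API reference.

```python
import json

def build_response_format(name: str, schema: dict) -> dict:
    """Wrap a JSON Schema in the response_format shape described above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }

# Hypothetical schema for a support-ticket classifier.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "other"]},
        "urgent": {"type": "boolean"},
    },
    # Assumption: strict mode expects every property to be listed as required
    # and additional properties to be disallowed.
    "required": ["category", "urgent"],
    "additionalProperties": False,
}

response_format = build_response_format("ticket", TICKET_SCHEMA)
print(json.dumps(response_format, indent=2))
```

The resulting dictionary would be passed as the response_format argument of a chat completion request.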

Function/tool calling

Describe a function signature; the model decides whether to call it and, if so, produces an argument object shaped to match. Use it for routing queries to backend operations (search, calculate, fetch). The model can return text or a function call, so your code must handle both.
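A sketch of the routing side, assuming a simplified reply shape: a dictionary with either a "text" key or a "tool_call" key holding a name and a JSON-encoded arguments string. That shape, the TOOLS registry, and the two backend functions are hypothetical; real provider responses are nested differently.

```python
import json
from typing import Callable

# Hypothetical backend operations the model may route to.
def search(query: str) -> str:
    return f"results for {query!r}"

def calculate(a: float, b: float) -> str:
    return str(a + b)

TOOLS: dict[str, Callable[..., str]] = {"search": search, "calculate": calculate}

def handle_reply(reply: dict) -> str:
    """Return plain text as-is, or run the requested tool with its arguments."""
    call = reply.get("tool_call")
    if call is None:
        return reply["text"]
    func = TOOLS.get(call["name"])
    if func is None:
        raise ValueError(f"unknown tool: {call['name']}")
    # In this sketch the arguments arrive as a JSON string.
    args = json.loads(call["arguments"])
    return func(**args)
```

Looking the tool up in a registry, rather than calling by name dynamically, means the model can only trigger operations you have explicitly exposed.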

Constrained decoding (vLLM, outlines)

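Open-weight models can use grammar-constrained decoding (context-free grammar or regex) to force token-by-token compliance with a schema: at each step, tokens that would break the grammar are masked out before sampling. It works for any model you self-host, because it only needs access to the logits. Tools: outlines, lm-format-enforcer, vLLM guided decoding.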
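The masking idea can be shown with a toy example. This is not how outlines or vLLM are implemented; those compile the schema or regex into a state machine over the model's real tokenizer vocabulary. Here the "grammar" is a list of three allowed strings, and VOCAB and toy_logits are made-up stand-ins for a tokenizer and a model.

```python
import math

ALLOWED = ["yes", "no", "maybe"]  # the "grammar": output must be one of these
VOCAB = ["y", "e", "s", "n", "o", "m", "a", "b", "sure", "<eos>"]

def is_valid_step(prefix: str, token: str) -> bool:
    """A token is legal if it keeps the output a prefix of some allowed string."""
    if token == "<eos>":
        return prefix in ALLOWED
    return any(option.startswith(prefix + token) for option in ALLOWED)

def toy_logits(prefix: str) -> dict[str, float]:
    """Stand-in for a model: it strongly prefers the off-grammar token 'sure'."""
    return {tok: (5.0 if tok == "sure" else -float(i)) for i, tok in enumerate(VOCAB)}

def constrained_decode(max_steps: int = 10) -> str:
    out = ""
    for _ in range(max_steps):
        logits = toy_logits(out)
        # Mask illegal tokens to -inf, then pick greedily among the rest.
        masked = {
            tok: (score if is_valid_step(out, tok) else -math.inf)
            for tok, score in logits.items()
        }
        token = max(masked, key=masked.get)
        if token == "<eos>":
            break
        out += token
    return out
```

Even though the toy model's top choice is always "sure", the mask removes it at every step, so the decoder can only produce a string from ALLOWED.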

When NOT to constrain

Constrained decoding can reduce answer quality on creative tasks. If you only need text, don't impose JSON. If you need both, generate the text first, then run a separate extraction step over it to get the structured fields.
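The two-step approach might be wired up as follows. The function write_then_extract and the two fake model calls are hypothetical stand-ins so the sketch runs offline; in practice the first would be an unconstrained generation call and the second a structured-output call.

```python
import json
from typing import Callable

def write_then_extract(
    prompt: str,
    generate_text: Callable[[str], str],
    extract_fields: Callable[[str], str],
) -> tuple[str, dict]:
    """Step 1: unconstrained prose. Step 2: structured fields extracted from it."""
    prose = generate_text(prompt)
    fields = json.loads(extract_fields(prose))
    return prose, fields

# Stand-ins for two model calls.
def fake_writer(prompt: str) -> str:
    return "Mara opened the lighthouse door as the storm broke."

def fake_extractor(text: str) -> str:
    words = text.split()
    return json.dumps({"protagonist": words[0], "word_count": len(words)})

prose, fields = write_then_extract("Write one opening line.", fake_writer, fake_extractor)
print(prose)
print(fields)
```

Keeping the two calls separate means the creative step is never constrained, at the cost of a second request.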

In short: use JSON Schema enforcement with hosted APIs, function calling for action routing, and constrained decoding for self-hosted models. Always validate downstream.