LLMs hallucinate. No prompt eliminates the problem, but structural defenses can reduce it by roughly 5-10x. The 2026 stack combines retrieval grounding, output constraints, and explicit uncertainty, used together rather than individually.
Retrieval grounding
Force the model to cite retrieved sources, with an instruction such as: 'Answer only based on the documents below. If the documents don't contain the answer, say I don't know.' This strongly reduces factual hallucination, but only when retrieval itself returns relevant documents.
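As a minimal sketch of the grounding instruction above, the following builds such a prompt and checks that an answer cites only retrieved documents. The names `Document`, `build_grounded_prompt`, and `cited_ids_are_valid`, along with the bracketed-ID citation format, are assumptions made for illustration and do not come from any particular library.

```python
import re
from dataclasses import dataclass
from typing import List

REFUSAL = "I don't know"


@dataclass
class Document:
    doc_id: str  # hypothetical identifier used for citations, e.g. "doc1"
    text: str


def build_grounded_prompt(question: str, docs: List[Document]) -> str:
    """Build a prompt that restricts the model to the retrieved documents."""
    sources = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer only based on the documents below. "
        "Cite the bracketed document ID for every claim. "
        f"If the documents don't contain the answer, say: {REFUSAL}\n\n"
        f"Documents:\n{sources}\n\n"
        f"Question: {question}\n"
    )


def cited_ids_are_valid(answer: str, docs: List[Document]) -> bool:
    """Accept a refusal, or an answer whose citations all point at retrieved docs."""
    if answer.strip().lower().startswith(REFUSAL.lower()):
        return True
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    known = {d.doc_id for d in docs}
    # Require at least one citation, and no citation to an unknown document.
    return bool(cited) and cited <= known
```

The citation check only confirms that the cited IDs exist; it does not verify that the cited text actually supports the claim, which would need a separate entailment or review step.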
Constrained outputs
JSON schemas force structure, enums limit values, and length limits prevent runaway generation. Tools like outlines and lm-format-enforcer enforce these constraints at decode time, which eliminates structurally impossible hallucinations.
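The three constraint types can be illustrated with a small post-hoc validator using only the standard library. Note that this is a different mechanism from the decode-time enforcement that outlines or lm-format-enforcer provide: it rejects bad output after generation rather than preventing it. The field names `sentiment` and `summary`, the enum values, and the 200-character limit are hypothetical.

```python
import json

ALLOWED_SENTIMENT = {"positive", "negative", "neutral"}  # enum (hypothetical values)
MAX_SUMMARY_CHARS = 200  # length limit (hypothetical value)
REQUIRED_KEYS = {"sentiment", "summary"}  # structure (hypothetical fields)


def validate_output(raw: str) -> dict:
    """Parse model output and raise ValueError if it violates any constraint."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"expected exactly the keys {sorted(REQUIRED_KEYS)}")
    sentiment, summary = data["sentiment"], data["summary"]
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENT:
        raise ValueError("sentiment is not one of the allowed values")
    if not isinstance(summary, str) or len(summary) > MAX_SUMMARY_CHARS:
        raise ValueError("summary is missing, not a string, or too long")
    return data


def is_valid(raw: str) -> bool:
    """Convenience wrapper: True if the output passes all checks."""
    try:
        validate_output(raw)
    except ValueError:
        return False
    return True
```

A rejected output would typically trigger a retry or a fallback. Decode-time enforcement avoids that retry loop because tokens that would violate the schema are never sampled in the first place.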
Self-consistency / verification
Generate N answers. If they agree, confidence is high; if they disagree, confidence is low, so escalate or surface the uncertainty. This is expensive (N× cost) but valuable in high-stakes domains.
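The agreement check can be sketched as a majority vote over the N sampled answers. This assumes the answers are short enough to compare after simple normalization; the function names and the 0.6 agreement threshold are illustrative choices, and free-form answers would need a semantic comparison instead of exact matching.

```python
from collections import Counter
from typing import List, Optional, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(answer.lower().split())


def self_consistency(
    answers: List[str], min_agreement: float = 0.6
) -> Tuple[Optional[str], float]:
    """Return (majority answer or None, agreement ratio) for N sampled answers.

    None signals low confidence: the caller should escalate or surface
    the uncertainty rather than return an answer.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(answers)
    if agreement >= min_agreement:
        return top_answer, agreement
    return None, agreement
```

For example, `self_consistency(["42", " 42 ", "42", "41", "42"])` yields the majority answer with an agreement of 4 out of 5, while three mutually different answers fall below the threshold and return `None`.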