Sentence-by-sentence

Split the output into claims, roughly one per sentence. For each claim, retrieve supporting sentences from the context, then run an NLI classifier on each (context sentence, claim) pair: entails / contradicts / neutral.
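A minimal sketch of that loop, using only the standard library. The names (`split_claims`, `retrieve`, `check_output`, `toy_nli`) are hypothetical, the retriever is plain token overlap, and `toy_nli` is a placeholder rather than a real NLI model; in practice you would pass in a trained classifier with the same call shape.

```python
import re
from typing import Callable, List, Set, Tuple

def split_claims(text: str) -> List[str]:
    """Naive splitter: treats each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(s: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve(claim: str, context: List[str], k: int = 2) -> List[str]:
    """Top-k context sentences by token overlap (stand-in for a real retriever)."""
    ranked = sorted(context, key=lambda c: len(tokens(claim) & tokens(c)), reverse=True)
    return ranked[:k]

def check_output(
    output: str,
    context: List[str],
    nli: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Label each claim: entails if any retrieved sentence entails it,
    else contradicts if any contradicts it, else neutral."""
    results = []
    for claim in split_claims(output):
        labels = [nli(premise, claim) for premise in retrieve(claim, context)]
        if "entails" in labels:
            verdict = "entails"
        elif "contradicts" in labels:
            verdict = "contradicts"
        else:
            verdict = "neutral"
        results.append((claim, verdict))
    return results

def toy_nli(premise: str, hypothesis: str) -> str:
    """Placeholder, NOT a real NLI model: 'entails' only if every
    hypothesis token also appears in the premise."""
    return "entails" if tokens(hypothesis) <= tokens(premise) else "neutral"
```

Letting "entails" win over "contradicts" when retrieved sentences disagree is a design choice made here for brevity; a stricter checker might flag such conflicts separately.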

Attribute claims

The model emits citation tags such as '[S1]' alongside its claims. Post-hoc, verify that each cited span actually supports the claim it is attached to. Reject claims whose citations do not support them.
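The post-hoc check can be sketched as follows. This assumes tags of the form `[S<number>]` placed inside the sentence they support, before its final punctuation; `parse_cited_claims`, `verify_citations`, and `toy_supports` are hypothetical names, and `toy_supports` is a token-subset placeholder for a real entailment check. Treating uncited claims as rejected is also an assumption of this sketch.

```python
import re
from typing import Callable, Dict, List, Tuple

TAG = re.compile(r"\[S(\d+)\]")

def parse_cited_claims(output: str) -> List[Tuple[str, List[int]]]:
    """Return (claim text without tags, cited source ids) per sentence."""
    claims = []
    for sent in re.split(r"(?<=[.!?])\s+", output.strip()):
        if not sent:
            continue
        ids = [int(n) for n in TAG.findall(sent)]
        # Drop the tags, then remove the space left before punctuation.
        text = re.sub(r"\s+([.!?])", r"\1", TAG.sub("", sent)).strip()
        claims.append((text, ids))
    return claims

def verify_citations(
    output: str,
    sources: Dict[int, str],
    supports: Callable[[str, str], bool],
) -> Tuple[List[str], List[str]]:
    """Accept a claim only if it cites at least one source and every
    cited source exists and supports it; everything else is rejected."""
    accepted, rejected = [], []
    for text, ids in parse_cited_claims(output):
        ok = bool(ids) and all(
            i in sources and supports(sources[i], text) for i in ids
        )
        (accepted if ok else rejected).append(text)
    return accepted, rejected

def toy_supports(span: str, claim: str) -> bool:
    """Placeholder, NOT a real verifier: claim tokens must all occur in the span."""
    tok = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    return tok(claim) <= tok(span)
```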

Recompute rate

Real systems find 5-15% of claims unsupported even with RAG. Feed the unsupported claims back into a re-prompt, with a note such as 'these claims lack support'.
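One way to wire up the feedback step, as a sketch. It assumes the checker produces `(claim, label)` pairs where anything other than "entails" counts as unsupported; `unsupported_rate` and `build_reprompt` are hypothetical helper names, and the wording of the note is illustrative only.

```python
from typing import List, Optional, Tuple

Verdict = Tuple[str, str]  # (claim, label), label in {"entails", "contradicts", "neutral"}

def unsupported_rate(verdicts: List[Verdict]) -> float:
    """Fraction of claims whose label is not 'entails'."""
    if not verdicts:
        return 0.0
    bad = sum(1 for _, label in verdicts if label != "entails")
    return bad / len(verdicts)

def build_reprompt(original_prompt: str, verdicts: List[Verdict]) -> Optional[str]:
    """Return a follow-up prompt listing unsupported claims,
    or None when every claim is supported and no retry is needed."""
    unsupported = [claim for claim, label in verdicts if label != "entails"]
    if not unsupported:
        return None
    notes = "\n".join(f"- {claim}" for claim in unsupported)
    return (
        f"{original_prompt}\n\n"
        "These claims lack support in the provided context. "
        "Revise or remove them:\n"
        f"{notes}"
    )
```

Tracking `unsupported_rate` over time gives a number to compare against the 5-15% range above; capping the number of re-prompt rounds is advisable, since a retry can introduce new unsupported claims.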

Classifier choice

A cross-encoder NLI model (e.g. DeBERTa) is fast and accurate. An LLM judge is more nuanced but more expensive. Choose and tune according to your accuracy budget.
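The text does not prescribe how to combine the two options. One possible arrangement, shown here as an assumption rather than a recommendation from the source, is a cascade: run the fast classifier first and escalate only low-confidence pairs to the LLM judge. `cascade`, `fast`, `judge`, and `threshold` are hypothetical names; both classifiers are passed in as plain callables so no particular library is implied.

```python
from typing import Callable, Tuple

# A fast classifier returns (label, confidence in [0, 1]).
FastNLI = Callable[[str, str], Tuple[str, float]]
# An LLM judge returns just a label; it is assumed to be slower and costlier.
Judge = Callable[[str, str], str]

def cascade(
    premise: str,
    hypothesis: str,
    fast: FastNLI,
    judge: Judge,
    threshold: float = 0.9,
) -> Tuple[str, str]:
    """Return (label, route). Use the fast model when it is confident,
    otherwise fall back to the judge."""
    label, confidence = fast(premise, hypothesis)
    if confidence >= threshold:
        return label, "fast"
    return judge(premise, hypothesis), "judge"
```

Here `threshold` is the tuning knob: raising it sends more pairs to the judge (higher cost, potentially higher accuracy), lowering it keeps more decisions on the fast path.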