Sentence-by-sentence
Split the output into claims. For each claim, retrieve candidate supporting sentences from the context. Run an NLI classifier on each (context sentence, claim) pair: entails / contradicts / neutral.
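A minimal sketch of this loop, assuming claims are approximated by sentences and retrieval by token overlap. The names `split_sentences`, `retrieve`, `check_claims` and `toy_nli` are hypothetical, and `toy_nli` is a lexical stand-in for a real NLI model, not one:

```python
import re
from typing import Callable, List, Set, Tuple

# (premise, hypothesis) -> "entails" | "contradicts" | "neutral"
NliFn = Callable[[str, str], str]

def split_sentences(text: str) -> List[str]:
    # Naive split after ., ! or ?; real claim extraction is harder than this.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(claim: str, context: List[str], k: int = 2) -> List[str]:
    # Rank context sentences by token overlap (stand-in for a real retriever).
    ranked = sorted(context, key=lambda s: len(tokens(s) & tokens(claim)), reverse=True)
    return ranked[:k]

def check_claims(output: str, context: str, nli: NliFn, k: int = 2) -> List[Tuple[str, str]]:
    ctx = split_sentences(context)
    results = []
    for claim in split_sentences(output):
        labels = [nli(premise, claim) for premise in retrieve(claim, ctx, k)]
        # Any entailing sentence supports the claim; otherwise surface contradictions.
        if "entails" in labels:
            verdict = "entails"
        elif "contradicts" in labels:
            verdict = "contradicts"
        else:
            verdict = "neutral"
        results.append((claim, verdict))
    return results

def toy_nli(premise: str, hypothesis: str) -> str:
    # Assumption for illustration only: token containment plus a crude "not" check.
    p, h = tokens(premise), tokens(hypothesis)
    if (h - {"not"}) <= (p - {"not"}):
        return "entails" if ("not" in h) == ("not" in p) else "contradicts"
    return "neutral"
```

In practice `toy_nli` would be swapped for a trained classifier with the same (premise, hypothesis) signature; the aggregation rule (entailment wins over contradiction) is one possible choice, not the only one.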
Attribute claims
Model outputs '[S1]'-style citation tags. Post-hoc: verify that each cited span actually supports the claim it is attached to. Reject unsupported claims.
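A rough sketch of the post-hoc check, assuming tags look like `[S1]` and sit inside the sentence before its final punctuation. `verify_citations` and `toy_supports` are hypothetical names, and `toy_supports` is a token-containment placeholder for a real entailment check:

```python
import re
from typing import Callable, Dict, List, Tuple

TAG = re.compile(r"\[(S\d+)\]")

def verify_citations(
    output: str,
    sources: Dict[str, str],
    supports: Callable[[str, str], bool],
) -> List[Tuple[str, List[str], bool]]:
    """Return (claim, cited_tags, accepted) for every sentence in the output."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        if not sentence:
            continue
        tags = TAG.findall(sentence)
        # Strip the tags and tidy the space they leave before punctuation.
        claim = re.sub(r"\s+([.!?])", r"\1", TAG.sub("", sentence)).strip()
        # Accept only if the claim cites something and every cited span supports it.
        accepted = bool(tags) and all(
            tag in sources and supports(sources[tag], claim) for tag in tags
        )
        results.append((claim, tags, accepted))
    return results

def toy_supports(source: str, claim: str) -> bool:
    # Assumption for illustration only: every claim token appears in the source.
    tok = lambda t: set(re.findall(r"[a-z0-9]+", t.lower()))
    return tok(claim) <= tok(source)
```

Treating an uncited sentence as rejected is a design choice in this sketch; a looser policy could route uncited sentences to retrieval-based checking instead.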
Recompute rate
Real systems find 5-15% of claims unsupported even with RAG. Feed these back to the model for a re-prompt, with 'these claims lack support' notes attached.
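The feedback step might look like the sketch below, assuming verification yields (claim, verdict) pairs where anything other than "entails" counts as unsupported. The function names and the wording of the note are illustrative assumptions:

```python
from typing import List, Tuple

def unsupported_rate(results: List[Tuple[str, str]]) -> float:
    # Fraction of claims whose verdict is not "entails".
    if not results:
        return 0.0
    bad = sum(1 for _, verdict in results if verdict != "entails")
    return bad / len(results)

def build_reprompt(question: str, results: List[Tuple[str, str]]) -> str:
    # Append a note listing the flagged claims; return the question unchanged if none.
    flagged = [claim for claim, verdict in results if verdict != "entails"]
    if not flagged:
        return question
    notes = "\n".join(f"- {claim}" for claim in flagged)
    return (
        f"{question}\n\n"
        "These claims lack support in the provided context. "
        f"Revise or remove them:\n{notes}"
    )
```

Tracking `unsupported_rate` across requests gives a number to compare against the range quoted above.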
Classifier choice
Cross-encoder NLI (e.g. DeBERTa): fast and accurate. LLM judge: more nuanced but more expensive. Choose according to the accuracy budget.
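One possible way to trade the two off, not something the notes above prescribe, is a cascade: run the fast classifier first and escalate to the LLM judge only when its confidence is low. In this sketch `fast`, `judge` and `threshold` are hypothetical callables and parameters supplied by the caller:

```python
from typing import Callable, Tuple

# Fast classifier returns (label, confidence in [0, 1]); the judge returns a label.
FastNli = Callable[[str, str], Tuple[str, float]]
Judge = Callable[[str, str], str]

def cascade(
    premise: str,
    hypothesis: str,
    fast: FastNli,
    judge: Judge,
    threshold: float = 0.9,
) -> Tuple[str, str]:
    """Return (label, which_model_decided)."""
    label, confidence = fast(premise, hypothesis)
    if confidence >= threshold:
        return label, "fast"
    # Low confidence: pay for the more expensive judge.
    return judge(premise, hypothesis), "judge"
```

Raising `threshold` sends more pairs to the judge (higher cost, potentially higher accuracy); lowering it does the reverse.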