Constitution
List of principles: 'Don't provide instructions for weapons.' 'Be respectful.' Model reads + applies.
Advertisement
Self-critique loop
Initial response → self-critique per constitution → revise → repeat. Trained model internalizes principles.
Advertisement
At inference
Can be applied at inference too: prompt with constitution + ask model to check output. Extra latency but strong safety layer.
Anthropic's Claude
Foundational to Claude's alignment. Constitution publicly documented. Model trained to follow.