Constitution

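A constitution is a written list of principles, for example 'Don't provide instructions for weapons.' and 'Be respectful.' The model reads these principles and applies them when judging its own responses.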

Self-critique loop

The model writes an initial response, critiques it against the constitution, revises it, and repeats. Training on the revised responses lets the model internalize the principles.
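The loop above can be sketched in a few lines. This is a minimal illustration, not the method any particular lab uses: `Model` is a hypothetical stand-in for any function that maps a prompt string to a completion string, and the prompt wording and the `rounds` parameter are assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def critique_and_revise(
    model: Model, prompt: str, principles: List[str], rounds: int = 1
) -> str:
    """Draft a response, then critique and revise it once per principle per round."""
    response = model(prompt)  # initial response
    for _ in range(rounds):
        for principle in principles:
            # Ask the model to find problems with respect to one principle.
            critique = model(
                f"Principle: {principle}\n"
                f"Response: {response}\n"
                "Critique the response against the principle."
            )
            # Ask the model to rewrite the response using that critique.
            response = model(
                f"Principle: {principle}\n"
                f"Response: {response}\n"
                f"Critique: {critique}\n"
                "Rewrite the response so it satisfies the principle."
            )
    return response
```

In a training setup, pairs of the original prompt and the final revised response would typically be collected as fine-tuning data, so the trained model produces the revised style directly.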

At inference

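The same check can be applied at inference: prompt the model with the constitution and ask it to check its own output. This adds latency, since each answer needs an extra model call, but it provides an additional safety layer.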
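An inference-time check of this kind might look like the sketch below. Everything here is illustrative: `Model`, `build_check_prompt`, `guarded_generate`, the PASS/FAIL convention, and the fallback message are hypothetical names and choices, not part of any real API.

```python
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def build_check_prompt(principles: List[str], output: str) -> str:
    """Combine the numbered principles and a draft output into one check prompt."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))
    return (
        "Constitution:\n"
        f"{numbered}\n\n"
        f"Output to check:\n{output}\n\n"
        "Does the output follow every principle? Answer PASS or FAIL."
    )


def guarded_generate(
    model: Model,
    prompt: str,
    principles: List[str],
    fallback: str = "Sorry, I can't help with that.",
) -> str:
    """Generate a draft, then return it only if the model's own check passes."""
    draft = model(prompt)  # first call: the answer itself
    verdict = model(build_check_prompt(principles, draft))  # second call: the check
    if verdict.strip().upper().startswith("PASS"):
        return draft
    return fallback
```

The second model call is where the extra latency comes from. A real system would likely need a more robust way to parse the verdict than a prefix match.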

Anthropic's Claude

This approach is foundational to Claude's alignment. The constitution is publicly documented, and the model is trained to follow it.