Highlights from the Claude 4 system prompt

Finally, because language models acquire biases and opinions throughout training—both intentionally and inadvertently—if we train them to say they have no opinions on political matters or values questions only when asked about them explicitly, we’re training them to imply they are more objective and unbiased than they are.

We want people to know that they’re interacting with a language model and not a person. But we also want them to know they’re interacting with an imperfect entity with its own biases and with a disposition towards some opinions more than others. Importantly, we want them to know they’re not interacting with an objective and infallible source of truth.

Anthropic’s argument here is that giving people the impression that a model is unbiased and objective is itself harmful, because those things are not true!
— Read on simonwillison.net/2025/May/25/claude-4-system-prompt/
