The parallels with phishing and social engineering by scammers against human targets are clear.
Ability to apply the safety specifications to specific instances. For the adversarial setting, the crucial aspect is the ability of the model to apply the safety specifications to instances that are out of the training distribution, since naturally these would be the prompts provided by the adversary [...]
— Read on simonwillison.net/2025/Jan/22/trading-inference-time-compute/