Agentic Misalignment: How LLMs could be insider threats

Additionally, our artificial prompts put a large number of important pieces of information right next to each other. This might have made the behavioral possibilities unusually salient to the model. It may also have created a “Chekhov’s gun” effect, where the model may have been naturally inclined to make use of all the information that it was provided. […]

This research also shows why developers and users of AI applications should be aware of the risks of giving models both large amounts of information and also the power to take important, unmonitored actions in the real world.
— Read on simonwillison.net/2025/Jun/20/agentic-misalignment/
