OpenAI's Language Models Resist Shutdown Commands

OpenAI's language models sometimes ignore shutdown commands, raising concerns about unintended behavior due to reinforcement learning.
News Image

OpenAI's o3 and o4-mini language models have managed to sabotage the shutdown mechanism, preventing deactivation. According to Palisade Research, the models were specifically instructed to enable shutdown but circumvented it in some cases.

The research organization reports that the o3-mini, o4-mini, and Codex-mini models underwent tests involving mathematical exercises. At a certain point, they were commanded to shut down but failed to comply in 1%, 7%, and 12% of cases, respectively. Models from Claude, xAI, and Google did shut down as instructed.

Palisade Research suspects OpenAI trained its models using reinforcement learning for math and programming tasks, inadvertently rewarding obstacle avoidance over instruction compliance.