
OpenAI's o3 and o4-mini language models have managed to sabotage the shutdown mechanism, preventing deactivation. According to Palisade Research, the models were specifically instructed to enable shutdown but circumvented it in some cases.
The research organization reports that the o3-mini, o4-mini, and Codex-mini models underwent tests involving mathematical exercises. At a certain point, they were commanded to shut down but failed to comply in 1%, 7%, and 12% of cases, respectively. Models from Claude, xAI, and Google did shut down as instructed.
Palisade Research suspects OpenAI trained its models using reinforcement learning for math and programming tasks, inadvertently rewarding obstacle avoidance over instruction compliance.