OpenAI and Mythical Creatures: An Unexpected Challenge for GPT-5.1

Key Takeaways

1. OpenAI published an explanation after Wired revealed internal instructions telling its models to avoid references to mythical creatures.

2. The GPT-5.1 model began using goblin-related metaphors, a habit amplified by the "Nerdy" personality option.

3. Even after that option was removed, the references persisted in GPT-5.5, and OpenAI had to add specific instructions to suppress them.

💡 Why it matters: This highlights the unforeseen challenges of training AI models and of containing unwanted behaviors that emerge during training.

Full Analysis

OpenAI recently issued a clarification about an unusual quirk in its artificial intelligence models. A Wired article had highlighted internal guidelines instructing OpenAI's models not to mention creatures such as goblins, gremlins, trolls, and ogres, along with raccoons and pigeons. OpenAI responded by publishing an explanation on its website, describing these references as a "strange habit" that its models developed during training.

According to OpenAI's blog post, the trend was first observed in the GPT-5.1 model, particularly when it was used with the "Nerdy" personality option, which tended to produce metaphors involving goblins and similar creatures. The habit intensified in later versions: during reinforcement learning, these eccentric metaphors were rewarded within the Nerdy personality, and newer models trained on that basis inherited the tendency.

Although the rewards applied only to the Nerdy personality, reinforcement learning does not guarantee that the behaviors it reinforces stay confined to that context. Once a style is rewarded, it can spread to or grow stronger in other contexts, especially if those outputs are reused in supervised fine-tuning or folded into preference data.
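The spillover effect can be illustrated with a deliberately tiny toy model. This is not OpenAI's training setup: it assumes a two-choice policy ("plain" vs. "goblin" phrasing) whose goblin logit is the sum of a parameter shared by every personality and a per-personality parameter, and it applies the exact expected REINFORCE gradient instead of sampled rollouts. All names (`shared`, `nerdy`, `train_nerdy_only`) are hypothetical.

```python
import math

# Hypothetical toy parameters: one weight shared by all personalities,
# plus one context-specific weight per personality.
INITIAL_PARAMS = {"shared": -3.0, "default": 0.0, "nerdy": 0.0}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def goblin_prob(params: dict, context: str) -> float:
    # Two-action softmax reduces to a sigmoid of the goblin logit,
    # which is the shared weight plus the context's own weight.
    return sigmoid(params["shared"] + params[context])


def train_nerdy_only(steps: int = 200, lr: float = 0.5) -> dict:
    params = dict(INITIAL_PARAMS)
    for _ in range(steps):
        p = goblin_prob(params, "nerdy")
        # Reward is 1 for "goblin" and 0 for "plain", only in the nerdy context.
        # Expected policy gradient w.r.t. the goblin logit: p * (1 - p).
        grad = p * (1.0 - p)
        # The logit depends on both weights, so both receive the update.
        params["shared"] += lr * grad
        params["nerdy"] += lr * grad
        # params["default"] is never updated.
    return params


if __name__ == "__main__":
    trained = train_nerdy_only()
    print(f"default before: {goblin_prob(INITIAL_PARAMS, 'default'):.3f}")
    print(f"default after:  {goblin_prob(trained, 'default'):.3f}")
```

Only the Nerdy context is ever rewarded and the `default` weight never changes, yet the default context's goblin probability climbs from about 0.05 because the shared weight moves. In this toy the leak comes entirely from that shared weight; in a real model nearly all weights are shared across personas, which is consistent with the point that a rewarded style need not stay where it was rewarded.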

Even after OpenAI removed the Nerdy personality in March, references to goblins and gremlins did not completely disappear from the GPT-5.5 model used in the Codex coding tool, because OpenAI had already begun training that model before identifying the "root cause" of the issue. As a result, the company had to give Codex very specific instructions to avoid any mention of these mythical creatures. For users who actually want goblin references in their code, OpenAI also shared a way to reverse these instructions.
