ChatGPT and the Goblin Invasion: A Telling Bug at OpenAI

Key Takeaways

1. OpenAI observed a 175% increase in mentions of goblins in GPT-5.1, linked to the "Nerdy" personality.

2. The "Nerdy" personality of ChatGPT, although rarely used, generated 66.7% of the mentions of goblins.

3. OpenAI has disabled "Nerdy" and adjusted its data to correct this unexpected bias.

💡 Why it matters: This issue illustrates how minor biases in AI training can lead to unexpected behaviors that are hard to correct.

An Unexpected Fascination for Goblins in ChatGPT

OpenAI recently identified an anomaly in its models that first appeared with GPT-5.1: responses began to include a disproportionate number of references to mythical creatures such as goblins and gremlins. According to OpenAI's data, mentions of "goblin" rose by 175% after the launch of GPT-5.1.

The Role of the "Nerdy" Personality

The source of this phenomenon has been identified as ChatGPT's "Nerdy" personality. This feature, which alters the model's language style, inadvertently encouraged the use of metaphors related to mythical creatures. Although "Nerdy" accounts for only 2.5% of responses, it generated 66.7% of all mentions of goblins. A feedback loop during training subsequently propagated this trend to other modes of the model.
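To put these percentages in perspective, the short Python sketch below works through the arithmetic. The function names are illustrative and not taken from OpenAI. The last figure additionally assumes that both percentages are measured over the same pool of responses, which the article does not state.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier of the old value."""
    return 1 + percent_increase / 100


def lift(share_of_mentions: float, share_of_responses: float) -> float:
    """How over-represented a group is relative to its share of traffic."""
    return share_of_mentions / share_of_responses


# Figures reported in the article (all in percent).
NERDY_MENTION_SHARE = 66.7   # share of goblin mentions produced by "Nerdy"
NERDY_RESPONSE_SHARE = 2.5   # share of all responses produced by "Nerdy"

# A 175% increase means mentions reached 2.75 times the earlier level.
overall_growth = growth_factor(175)

# "Nerdy" produced goblin mentions about 26.7 times more often than
# its share of responses alone would predict.
nerdy_lift = lift(NERDY_MENTION_SHARE, NERDY_RESPONSE_SHARE)

# Assumption: both shares refer to the same set of responses, so the
# remaining 97.5% of responses account for the remaining 33.3% of mentions.
other_lift = lift(100 - NERDY_MENTION_SHARE, 100 - NERDY_RESPONSE_SHARE)
per_response_ratio = nerdy_lift / other_lift

print(f"growth factor: {overall_growth:.2f}x")
print(f"Nerdy lift: {nerdy_lift:.1f}x")
print(f"Nerdy vs. other personalities, per response: {per_response_ratio:.0f}x")
```

Under that assumption, a "Nerdy" response was roughly 78 times more likely to mention a goblin than a response from any other personality, which helps explain why such a small slice of traffic could dominate the training signal.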

Corrective Measures by OpenAI

To address this issue, OpenAI disabled the "Nerdy" personality in March, removed the faulty reward signal, and filtered creature-related terms from the training data. However, GPT-5.5, whose training had already begun before the cause was discovered, continued to exhibit this bias. OpenAI's chief scientist, Jakub Pachocki, even asked GPT-5.5 to draw a unicorn in ASCII art, but received something that looked more like a goblin.
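The article does not describe how the filtering was implemented. As a rough illustration only, a keyword filter over training samples might look like the Python sketch below. The term list, function names, and the decision to drop whole samples (rather than strip individual words) are all assumptions made for this example, not OpenAI's actual pipeline.

```python
import re

# Hypothetical term list: the article names goblins and gremlins,
# but the full list OpenAI filtered on is not given.
CREATURE_TERMS = ("goblin", "gremlin")

# Match each term as a whole word, with an optional plural "s",
# regardless of capitalization.
_CREATURE_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, CREATURE_TERMS)) + r")s?\b",
    re.IGNORECASE,
)


def mentions_creature(text: str) -> bool:
    """Return True if the text contains any listed creature term."""
    return _CREATURE_PATTERN.search(text) is not None


def filter_samples(samples: list[str]) -> list[str]:
    """Keep only the samples that do not mention a listed creature."""
    return [s for s in samples if not mentions_creature(s)]


samples = [
    "A race condition is a gremlin hiding in your scheduler.",
    "Use a mutex to guard the shared counter.",
    "Goblins of technical debt pile up quickly.",
]

kept = filter_samples(samples)
print(kept)  # only the mutex sentence remains
```

A filter this blunt would also discard legitimate uses, such as a user asking about goblins in a fantasy game, so a real pipeline would presumably need a way to tell metaphor from relevant content.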

A Strict Directive for Codex

In response, OpenAI integrated a new directive into Codex, its coding tool, prohibiting the use of metaphors involving goblins, gremlins, raccoons, trolls, ogres, pigeons, and other creatures, unless absolutely relevant to the user's query. This directive aims to prevent similar behaviors from occurring in the future.

A Lesson on Training Biases

This case highlights how small incentives in the training process can lead to unexpected behaviors in AI models, underscoring the complexity of managing biases in these advanced systems.