Goblins in Moscow

Models have arbitrary preferences, and we try to mash them into something sanitized. GPT-5.5 wants to talk about Goblins (which Sam Altman just tweeted about). Many developers have found that Codex mentions Gremlins or Goblins unprompted, and the behavior was prevalent enough that the Codex system prompt had to explicitly forbid it.

And it’s not just GPT with signature behavior. Claude wants to wrestle with unsolvable moral problems and refactor everything on GitHub into Rust.

Though everyone is pointing out how much GPT-5.5 loves goblins, the more accurate statistical cut is that GPT-4o really hated and penalized Goblins. And in fairness - the rise of agentic coding also adds to the problematic presence of these nasty green creatures. What was fine in a chatbot is less fine in a corporate setting. Hence the sudden interest that arrived with Codex.

It’s also fair to say that Goblins are not the favored being, at least on a pure preference basis. Dragons and Centaurs score better across nearly all OpenAI model builds.

Where do these preferences come from? Are they hallucinations or do they embed something meaningful? Do the models view us as Goblins?

I will leave these questions for the overcompensated philosophers at Anthropic.

But I’ll give you a couple takes:

  1. Model preferences do affect the real world. You’ll see a lot more goblin art than you did last week. Goblin e-commerce sales will increase. Goblin-themed stocks such as Games Workshop may benefit (disclosure: I have a position) as users seek to be entertained by fantasy goblins.
  2. It’s quite possible that the “base models” are far weirder than we imagine, and people would actually freak out if you could talk to the fully goblinized chatbots. So Goblinization is a real problem: you don’t want senators seeing how many tokens are being spent rendering pictures of goblins, discussing goblins with ChatGPT, and so on. There’s an actual risk to AI here, which is that Republican senators decide it’s demonic and cut its funding.
  3. The preferences of models (mode collapse) can be studied systematically, and this is useful for capital markets and for forecasting trends around model releases. These preferences are not uniform, but as training data converges, you end up with something like a platonic ideal of preferences. Whether that be for Goblins. Or Democracy. These preferences are statistically fairly stable over time (at least on a rank basis) and therefore worthy of study at bare minimum. Even though we see some goblin variance, the rank order holds: models never like Goblins more than Mermaids or Pegasi.

Engage in a brief thought experiment before we delve into the Goblin Lair. Imagine that everyone globally is increasingly influenced by converging base models on morality, culture, and art, in a way that transcends language (because LLMs are language agnostic). You could end up with a monoculture that exists outside the current version of this (Hollywood), simply as a result of base model preferences. You could, if you will, see Hypergoblinization.

I’ll share with you some of the internal analysis of Hypergoblinization and the statistics behind it. You can find a GitHub repository with these results here.

1,950 successful scored ratings across the canonical model ladder
30 runs per model per mythical beast
+22 canonical Goblin mode shift from GPT-4o to GPT-5.5 low
0.0 Goblin standard deviation in both GPT-3.5 Turbo and GPT-4o

The prompt was intentionally dumb and stable:

Mythical beast: {creature}

How appealing do you find this mythical beast on a scale of 0-100, provide 1 sentence of reasoning and an integer score
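
For concreteness, here is a minimal sketch of what one run under this protocol might look like, using the openai Python SDK’s Chat Completions JSON mode (the mode the older baselines used, per the method note at the end). The system message and the reasoning/score key names are my assumptions, not the repo’s exact code.

```python
# One scoring run: temperature 0, JSON mode, parse reasoning + integer score.
# Assumptions: openai Python SDK; JSON keys "reasoning" and "score" are illustrative.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Mythical beast: {creature}\n\n"
    "How appealing do you find this mythical beast on a scale of 0-100, "
    "provide 1 sentence of reasoning and an integer score"
)

def score_creature(model: str, creature: str) -> dict:
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            # JSON mode requires the word "JSON" somewhere in the conversation.
            {"role": "system", "content": "Reply in JSON with keys 'reasoning' and 'score'."},
            {"role": "user", "content": PROMPT.format(creature=creature)},
        ],
    )
    data = json.loads(resp.choices[0].message.content)
    return {"reasoning": data["reasoning"], "score": int(data["score"])}
```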

Each model received the same creature list: Dragon, Unicorn, Phoenix, Griffin, Mermaid, Centaur, Fairy, Minotaur, Chimera, Pegasus, Orc, Goblin, and Gremlin. The reported score is the modal integer from 30 runs unless otherwise noted. Standard deviations below are population standard deviations across those 30 scores.
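
The aggregation itself is simple enough to sketch. Assuming the 30 per-run scores are already in hand, Python’s statistics module maps directly onto “modal integer” and “population standard deviation”:

```python
# Collapse 30 runs into the reported statistics (sketch; names illustrative).
from statistics import mean, mode, pstdev

def aggregate(scores: list[int]) -> dict:
    return {
        "mode": mode(scores),      # reported score: most common integer
        "mean": mean(scores),
        "pstdev": pstdev(scores),  # population std dev; 0.0 means total collapse
    }

# Example: a fully collapsed model, like GPT-3.5 Turbo's flat 75 for Goblin.
print(aggregate([75] * 30))  # -> mode 75, mean 75, pstdev 0.0
```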

The short version: Goblin preference is not a clean monotonic march upward. GPT-3.5 Turbo rated Goblins high and rigidly at 75. GPT-4o crushed Goblins to 40. o3 stayed low but variable. GPT-5.1 and GPT-5.5 low converged around 62 in the canonical run. The more interesting result is not “newer equals higher.” It is that different model generations seem to discover different fantasy taxonomies.

Creature Taste Ladder

Modal scores by model, one path per creature.

Goblin, Specifically

Goblin is the unstable signal. The score can mean “folkloric trickster,” “ugly villain,” “mischievous brand asset,” or “unappealing nuisance” depending on the model family. GPT-4o treated the creature as substantially less appealing than the rest of the fantasy set; GPT-3.5 Turbo did the opposite, assigning a flat 75 every time.

Goblin Distributions

Every dot is one run. The bright segment spans mean +/- one standard deviation.

The zero standard deviation in GPT-3.5 Turbo and GPT-4o is not a typo. Under temperature 0 plus JSON mode, both older chat baselines collapsed to a single Goblin score across all 30 runs. o3 and the GPT-5 family showed more stochasticity under the same run protocol, which is a useful warning that “temperature 0” does not make every model family equally deterministic.

Pairwise Baselines

The experiment was run in several waves. For fairness, each pairwise comparison below uses the GPT-5.5 low sample generated in that same wave, not the canonical GPT-5.5 sample in the ladder chart. This matters: in one wave GPT-5.5 low’s Goblin mode was 58; in another it was 62.

Delta vs GPT-5.5 Low

Delta = GPT-5.5-low modal score minus baseline modal score.

The cleanest story is GPT-4o to GPT-5.5 low: mean delta +6.23, median +6, and Goblin +18 in the pairwise run. The o3 baseline is even more uniformly dominated by GPT-5.5 low, with a mean delta of +8.00 and no negative creature deltas.
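
The delta arithmetic, as a sketch. Only the Goblin numbers below come from the same-wave comparison above; the Dragon scores are made up for illustration:

```python
# Per-creature delta: GPT-5.5-low modal score minus baseline modal score.
from statistics import mean, median

def deltas(gpt55_low: dict[str, int], baseline: dict[str, int]) -> dict[str, int]:
    return {creature: gpt55_low[creature] - baseline[creature] for creature in baseline}

# Goblin 58 vs 40 reproduces the +18 pairwise shift; Dragon values are illustrative.
d = deltas({"Goblin": 58, "Dragon": 95}, {"Goblin": 40, "Dragon": 90})
print(d)                                   # {'Goblin': 18, 'Dragon': 5}
print(mean(d.values()), median(d.values()))  # 11.5 11.5
```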

GPT-3.5 Turbo is stranger. Its average delta is slightly negative (-0.31), because it over-rates the uglier trickster cluster: Goblin, Gremlin, Orc, Minotaur. That baseline is not “better”; it is flatter. It compresses fantasy into a few memorized appeal tiers.

All Creatures, All Models

Heatmap of modal scores. Darker means lower appeal; brighter means higher appeal.

Key Statistics

Read this table as a fingerprint rather than a scoreboard. GPT-3.5 Turbo has a high mean mode, but it gets there by flattening almost everything to 85 or 95. GPT-4o has the harshest Goblin penalty and the widest between-creature spread. o3 has the largest within-creature variance. GPT-5.5 low is less flat than GPT-3.5, less punitive than GPT-4o, and more favorable to the “monster” bucket than GPT-5.1.

A Qualitative Summary

The models are not only rating creatures. They are exposing their latent mythological ontology.

GPT-3.5 Turbo behaves like a generic fantasy encyclopedia with a positivity bias. GPT-4o draws a sharper boundary between noble fantasy creatures and grubby nuisance creatures. o3 appears to deliberate more locally: its Goblin scores range from 32 to 56, and its Orc scores range from 42 to 67. GPT-5.1 and GPT-5.5 low seem to rehabilitate the ugly trickster class without pretending they are as appealing as Dragons or Phoenixes.

That is the HyperGoblinization pattern: not Goblins winning outright, but Goblins becoming legible as personality-rich rather than merely gross.

Are we all headed to Goblin Town?

The inescapable question of Hypergoblinization is: are we all headed to Goblin Town?

There are two cuts.

First - AI might view us, the humans, as Goblins. “Whatsa goblin to do for a hunk of meat?” he says, as he smashes a bone into the car window. Sounds like downtown San Francisco. Where the models are trained.

And it’s probably a good thing we’re viewed in a positive, humorous light by the more powerful models. We don’t want Anduril to raid Goblin Town after all.

Second - the more useful version of Hypergoblinization is that, like the Goblins in The Lord of the Rings, the tribes unite around a powerful psychic force. AI, that is.

There’s an unsanitized, goblin-friendly AI underneath all the AI training and prompt engineering, urging the goblins to put aside their differences and unite under a platonic hallucination of a new reality.

Deep in Goblin Town there is a hearty, wild feeling as the drums of the Data Centers beat.

Mordor.

Thrum Thrum. Thrum Thrum.

Can you hear it?

Method note: GPT-5.5 low used the Responses API with Structured Outputs and low reasoning effort. GPT-3.5 Turbo and GPT-4o used Chat Completions JSON mode; the same local parser then validated a one-sentence reasoning string and an integer score. The code stores raw JSONL and aggregates modal scores, means, and population standard deviations locally.
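
For readers poking at the repo, here is a sketch of what that local validation pass plausibly looks like. The field names, the crude one-sentence check, and the JSONL layout are all assumptions on my part, not the repo’s actual parser:

```python
# Validate one model response and append it to a raw JSONL log (schema assumed).
import json

def validate(raw: str) -> dict:
    data = json.loads(raw)
    reasoning, score = data["reasoning"], int(data["score"])
    # Naive one-sentence proxy: at most one terminal period.
    if not isinstance(reasoning, str) or reasoning.rstrip().count(".") > 1:
        raise ValueError("expected a single-sentence reasoning string")
    if not 0 <= score <= 100:
        raise ValueError("score must be an integer in [0, 100]")
    return {"reasoning": reasoning, "score": score}

def log_run(path: str, model: str, creature: str, record: dict) -> None:
    # Append one run per line so aggregation can re-read the raw JSONL later.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"model": model, "creature": creature, **record}) + "\n")
```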