Forty Haiku Agents and a Locked Door

I have a confession. I thought I was in the commodity quadrant. I wasn't. I was in genesis. I burned through a weekend and an embarrassing number of tokens before realising the map I was holding was not the map I needed.

Let me back up.

About a year ago, David Knott , then UK Government Chief Technology Officer, nerd-sniped a bunch of us. Two words: "Club 54". An exclusive members' club for technologists. Solve the riddle of the name, pass whatever unspoken appraisal David gave the cut of your jib, and you were in. No website. No application form. Just the puzzle, the door, and David on the other side.

Two words: Club 54

He wouldn't tell us why "54". Originally it was a conversational riddle: you'd float a theory, and David, depending on his mood and his tea, would say nothing, raise an eyebrow, or tell you you were some degree of wrong. That is how it eventually got solved, and how we should have solved it all along.

But I wasn't content with that. I wanted to brute-force it. So I asked him for a SHA-256 hash of the answer to test guesses programmatically, and to his credit he obliged: c7c8eec878a08fac3fe2ae7fbe5f4c158cbfcae3cb0e95e4ef53937fec6d5f94, with a one-line recipe — lowercase, strip whitespace, pipe into sha256sum.

That request, in retrospect, is the whole story. A conversational riddle has a warm/cold gradient. David's raised eyebrow was the gradient. A SHA-256 hash isn't. I voluntarily swapped the gradient for a cliff and then complained the cliff was steep. David watched. David enjoyed watching.

It took almost a year. We cracked it on his last day in post. I'm not sure whether solving it let him retire in peace, or whether he'd have stayed on out of stubbornness if we hadn't. The bolt came off the door the day the badge came off the lanyard.

From where I stood, this looked solved. Verification was free. The candidate space was finite-ish. I had a hash and a shell. What I needed was throughput. Classic commodity play. You don't think about a commodity, you throw compute at it.

So I did what any reasonable person with an API key would do. I fired up forty parallel Claude Haiku subagents, handed them --permission-mode bypassPermissions like car keys to a teenager, and pointed them at the hash. I fed them the first chapter of DK's book The Transcendental Elephant and a photograph of his GDS laptop: Ada Lovelace sticker, CDDO logo, a butterfly. Generate, hash, log, celebrate when it clicks.

Article content — David's GDS laptop - the sticker set, as photographed

Reader, it did not click.

What I got instead was 355,408 entries in wrong_answers.md, 151,731 unique hashes in seen_hashes.log, and a comprehensive_research_document_COMPLETE.md that was, I am not exaggerating, 48,051 lines long. It claimed 2,179 unique connections between 54 and the history of computing. What it actually contained was a small handful of observations about Turing, Babbage and Lovelace, reshuffled into 2,179 different outfits and marched past the reader as a parade. It called itself "comprehensive" 114 times.

The guesses were a beautiful taxonomy of a model flailing. The numerology: 54 as Harshad, Leyland (3³ + 3³). The chemistry: xenon at atomic number 54. The card-counting: 52 plus two jokers. The networking trivia: port 54, RFC 54. It kept returning, like a tongue to a loose tooth, to 6×9=42 in base 13. And, inevitably, that Chapter 6 of The Transcendental Elephant starts on page 54.

Then, and this is the bit that should make any AI practitioner sit up, it started making things up. "54 gears in Babbage's mill." There are not. "Rotor 54 Enigma." The Enigma had three to eight rotors. I found a cluster of guesses all exactly twenty characters long — "computation fifty fo", "precision mechanical ", "babbage mill fifty f" — because something had set a length ceiling and the model started truncating rather than thinking. Literally, padding.

Here is where I should have stopped and drawn a map.

I didn't. I added more agents. The prize, it turned out, for adding more agents was more agents.

This is the AI-era equivalent of boiling the ocean. The consultant version: "let's do a workshop." The infrastructure version: "let's provision a bigger cluster." The 2026 version: "let's spin up ten more Haiku agents." It feels like progress because something is happening. Tokens flow. Dashboards go up and to the right. You can tell a stakeholder "3,000 attempts an hour" and nobody will interrupt to ask the only question that matters: attempts at what?

Here is what I was refusing to see. I thought I was in the commodity quadrant: large known candidate space, cheap verification, a throughput game. If true, forty Haiku agents would have been gorgeous strategy.

But I wasn't in commodity. I was in genesis. The answer wasn't a known item waiting in some English-language corpus to be enumerated. It was the kind of thing one specific human had thought of once. The candidate space wasn't "the set of facts" but "the set of things that might click for David Knott on a particular afternoon." No distribution to sample from.

And here the cheap thing masked the expensive thing. The SHA check was free; it looked like the whole job. The expensive part, the part nobody priced, was the prior: the distribution the model was drawing from. My forty agents were verifying furiously from a distribution that did not contain the answer. A SHA-256 is a cliff, not a slope. It tells you no. Throwing more compute at a random walk isn't strategy. It's cardio.

Before launching a single agent, I should have written down not the size of the search space but its shape. How would I know if I was getting warmer? For "6" versus xenon, is one closer? The uncomfortable answer is no. Forty random walks are still a random walk.

I should have mapped the edges of the space, but I couldn't, because I was inside it. The LLM was prompted in English; it generated, scored, hallucinated and truncated in English. From inside its prompt, English was the universe. Somebody has since put a cleaner name on this: the Monolingual Cage. The door was on the outside.

I won't reveal the answer, or which language moved the bolt — DK hasn't opened the club yet and it's his punchline not mine. But I'll tell you what cracked it. Not another agent. Not a bigger corpus. A human, over a pint, with three supplementary clues DK had been saving — about forty minutes of not-very-hard thinking once the right door was open.

A human with a map beat forty Haiku agents without one. Not because the human was cleverer. Because the human could ask "is the answer in this box at all?" and the model couldn't — it had never been told there was a box.

So here's the doctrine. Before you map-reduce, map. Before you parallelise, ask what the gradient looks like. Before you throw more agents at a wall, work out whether the wall is on the boundary or you're running the wrong way from the middle. When your verification function is a cliff, no amount of Haiku turns it into a ramp.

A second frame, which belongs to DK as much as me: he set the puzzle; a human solved it; the machine did neither. Setter and solver are the load-bearing roles. The forty agents were equipment. Useful, sometimes. But equipment doesn't know which room it's in.

I'd love to say I worked this out in advance. I didn't. I worked it out after 48,051 lines of model-generated self-congratulation and realising every paragraph was the same paragraph wearing a different hat. The answer wasn't in the data. It wasn't in more data. It wasn't in data at all. It was in a question I hadn't thought to ask: what language am I looking in?

I still don't have a complete map. I'm not sure anyone does. For all the talk of agentic systems and the "six nines" of reliability people now claim, the one thing they cannot do is notice the edge of their own prompt. That is situational awareness, and it is still stubbornly human. I might be wrong; it might be a fluke that my human friends are beating my model friends at this parlour game. But when the verification function is a cliff and the search space in genesis, I'd bet on the pint.

Which brings me to the closing demand. David, you've had your situational awareness. You've had the joke. You've watched me, and a small crowd of other would-be members, spend the best part of a year and an embarrassing amount of Anthropic's inference budget on a puzzle a pint cracked in forty minutes. The club is now a landmark in my personal history of getting things wrong.

In the spirit of distributing the nerd-snipe back to its author, I invite everyone reading this to send David their guess directly, considered as a formal Club 54 membership application. This should please him twice: once because you tried, and once because he gets to reject you personally. He'll also be quietly relieved the candidate pool doesn't include any agentic AI systems, which on current form would not have got past the door.

So, David. Last day is done. Riddle is solved. You have your technologists. Open the club. Set a date. Name the venue. Do it properly, so I can, in the finest Groucho Marx tradition, find out whether I actually want to be a member of a club that would have me.

(Views in this article are my own.)