Falling Without a Checklist: The Only Migration That Matters

On 30 October 1935, a prototype called the Boeing Model 299 taxied onto the runway at Wright Field, Ohio, and crashed on takeoff. The aircraft was not faulty. The pilot, Major Ployer Peter Hill, was not incompetent. The Model 299 was simply too complex for a single human being to operate from memory. It had four engines where previous bombers had two. It had more flaps, more fuel mixtures, more trim tabs, more ways to kill you if you forgot a step.

The US Army Air Corps nearly cancelled the programme.

Instead, a group of test pilots did something that no amount of individual heroism could have accomplished: they wrote a checklist. Pre-takeoff. Pre-landing. Pre-everything. The checklist was not a training aid for beginners. It was a systemic intervention that acknowledged a simple, uncomfortable truth — the aircraft had exceeded the capacity of human memory, and no amount of skill or courage could substitute for a system.

The Model 299 went on to become the B-17 Flying Fortress. It helped win a war. And it did so not because the pilots got braver, but because they got disciplined.

I keep thinking about that checklist. I keep thinking about it because 62% of organisations that attempt cloud migration report significant unplanned cost overruns, delays, or outright failure. Sixty-two percent. That is not a teething problem. That is a systematic absence of pre-flight procedure. These organisations are climbing into the cockpit of a four-engine bomber and trying to fly it from memory.

Ground crew preparing the 299 for the takeoff that would change aviation safety

The Sport That Industrialised Courage

The Model 299 was not the first time humans confronted the gap between individual bravery and systemic safety. Skydiving did it first — and did it better than almost any industry I have encountered.

In 1797, André-Jacques Garnerin jumped from a balloon over Paris with a silk canopy and no backup. He survived. In 1919, Leslie Irvin made the first deliberate free-fall jump with a manually deployed parachute. He survived too. What happened next is the part nobody in technology talks about. The skydiving community did not celebrate these individual acts of courage and move on. It did something far more radical. It built a safety stack — a layered system in which each component exists because the one above it might fail.

That stack looks like this. First, training — rigorous, standardised, non-negotiable. You do not get to skip ground school because you are clever. Second, the main parachute — packed according to a procedure, inspected according to a schedule. Third, the reserve parachute — packed by a certified rigger (regulated by the UK Civil Aviation Authority in Britain and the Federal Aviation Administration in the US), not by you, because the person most likely to make an error with your reserve is you. And fourth, the Automatic Activation Device, or AAD — a small computer strapped to the rig that measures altitude and velocity and deploys the reserve if the jumper has not done so by a predetermined altitude.

The AAD does not care about your experience. It does not care about your confidence. It fires when the numbers say fire. It is the final backstop in a system designed around the assumption that every human layer above it might fail.

This is not cowardice. This is the opposite of cowardice. This is what courage looks like when it has been industrialised.

Then Like Now

Here is the part that should make every CTO reading this put down their coffee.

The cloud migration industry in 2026 has no equivalent safety stack. Most organisations have a main parachute — the migration plan itself, the architecture diagrams, the Jira tickets. Some have a reserve — a rollback strategy, tested occasionally, understood by a handful of engineers. Almost none have an AAD. Almost none have a systemic, automated, threshold-triggered mechanism that fires independently of human judgement when the numbers say the migration is going wrong.

And the training? The ground school? The phrase "cargo cult" entered serious intellectual discourse through Richard Feynman's 1974 Caltech commencement address, in which he described "Cargo Cult Science" — research that has the form of science but is missing something essential. The islanders in Melanesia built bamboo control towers and carved coconut-shell headphones for the operator. They had replicated every visible artefact of an airfield. No planes landed. The ritual was perfect. The understanding was absent.

I have watched organisations send their infrastructure teams on a two-day cloud certification course and declare them ready for a migration that will take eighteen months and cost millions. That is not training. That is coconut headphones — the cargo-cult imitation of preparation, where the ritual replaces the substance. The planes do not come. They never do.

The United States Parachute Association's (USPA) 2024 fatality summary shows that the skydiving fatality rate has improved by a factor of 48 since 1961. The picture is similar in Britain, where British Skydiving (the sport's national governing body) reports comparable safety gains driven by the same systemic improvements. The US data is the most comprehensive: from 11.1 deaths per 100,000 jumps to 0.23. The CYPRES AAD alone has saved more than 5,400 lives since its introduction. Five thousand four hundred human beings who would be dead without an automated safety system that does not ask permission, does not wait for consensus, and does not care about your feelings.

If skydiving had a 62% failure rate, no sane person would jump. The UK Civil Aviation Authority would ground every drop zone in Britain. The Federal Aviation Administration would do the same in the United States. And yet in enterprise technology, a 62% failure rate is treated as normal. As the cost of doing business.

That is not risk management. That is negligence dressed in a suit.

A real photo of my feet (pre-AI), some 6,000 ft in the air above the English countryside

The Counterargument I Owe You

"OK," you might say, "but skydiving is a bounded physical system. Gravity is constant. Terminal velocity is known. The failure modes are finite. Cloud migration is an unbounded sociotechnical system where the failure modes mutate, where the vendors change the pricing model mid-flight, where a junior engineer can misconfigure an IAM policy and expose the entire customer database."

You are right. Cloud is more complex than skydiving. The variables are less predictable. The blast radius is wider. And that is precisely the argument for checklists, not against them.

Seneca taught his students to rehearse catastrophe — premeditatio malorum — not because the rehearsal would prevent the catastrophe, but because it would prevent the paralysis. When the system is infinite, the checklist is the finite thing you control. The skydiver cannot control the weather, the turbulence, or the moment of panic. But the skydiver can control the pre-jump check, the altimeter reading, the pull altitude. The checklist does not pretend to eliminate uncertainty. It creates a floor beneath which the uncertainty cannot drag you.

Cloud migration needs that floor. Right now, most organisations are free-falling without one.

The Cloud Migration Safety Stack

Every safe cloud migration depends on four layers, stacked in order. Remove one and the layers above it collapse.

Layer 1: Understanding (Ground School)

First-principles knowledge of distributed systems, failure modes, and organisational dynamics. The foundation everything rests on. Not a certification course. Not a vendor webinar. A structured, multi-month programme in which the team migrates a non-critical workload end-to-end before touching production. You learn to pack the parachute before you jump out of the aircraft. If your team cannot deploy, monitor, roll back, and explain a migrated service in a test environment, they are not ready for production. Full stop.

Diagnostic: If you removed all your tooling tomorrow, could your team explain what the tools were doing and why?

The islanders with their coconut headphones were missing Layer 1 entirely. They had perfect practices — the runway, the fires, the hand signals — built on no understanding whatsoever. This is the cargo cult failure mode, and it is precisely what Feynman warned against: the form without the foundation.
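The Layer 1 readiness bar (deploy, monitor, roll back, explain) can be written down as an explicit gate rather than left as a feeling. What follows is a minimal sketch in Python, not a prescribed tool: the drill names and both function names are hypothetical, and "completed" is assumed to mean the team has demonstrated that capability in a test environment.

```python
from typing import Dict, List

# The four capabilities named in Layer 1. The identifiers are illustrative.
REQUIRED_DRILLS = ("deploy", "monitor", "roll_back", "explain")


def missing_drills(completed: Dict[str, bool]) -> List[str]:
    """Return the drills not yet demonstrated in the test environment."""
    return [drill for drill in REQUIRED_DRILLS if not completed.get(drill, False)]


def ready_for_production(completed: Dict[str, bool]) -> bool:
    """The gate passes only when every required drill has been demonstrated."""
    return not missing_drills(completed)


if __name__ == "__main__":
    # Hypothetical team state: rollback has never been rehearsed.
    team = {"deploy": True, "monitor": True, "roll_back": False, "explain": True}
    print("Missing:", missing_drills(team))
    print("Ready:", ready_for_production(team))
```

The point of the sketch is that a drill nobody has recorded counts as a drill nobody has done: an absent entry fails the gate just as a False one does.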

Layer 2: Culture & Incentives (The Human Environment)

Psychological safety, blameless postmortems, learning loops. The human environment that lets good architecture survive contact with reality. If your engineers are afraid to admit a migration is failing because they will be blamed, your reserve parachute might as well not exist: nobody will pull the handle.

Diagnostic: When did your team last close out a production incident with no blame, no punishment, and a published timeline?

Layer 3: Architecture (The Main Chute)

The migration plan, the service boundaries, the dependency maps. This is what most organisations think migration is. It is necessary and insufficient. The main chute works most of the time. Most of the time is not a safety standard. Critically, this layer must be inspected by someone who did not design it. In skydiving, the reserve is packed by a certified rigger under CAA or FAA regulation, not by you. In cloud migration, the equivalent is an independent architecture review function whose incentive is not to prove the migration will work but to prove it might not.

Diagnostic: Can you draw your system's failure domains on a whiteboard right now — and has someone outside your team stress-tested them?

Layer 4: Automated Kill Switches (The AAD)

This is the layer almost nobody builds. Automated, threshold-triggered mechanisms that fire without human approval when predefined conditions are met:

  • Cost ceiling: Monthly spend exceeds 130% of forecast? The system freezes new deployments and alerts the CFO. Not the engineering team. The CFO.
  • Latency trigger: P99 latency exceeds SLA for more than fifteen minutes? Traffic automatically routes back to the on-premises system.
  • Security tripwire: Publicly exposed storage bucket detected? Automated lockdown. No human in the loop.
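The three triggers above can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not a production implementation: the Snapshot fields, the KillSwitch class, and the action names it returns are all hypothetical, and the sketch only decides which actions should fire. Wiring those decisions to a real billing feed, monitoring system, or cloud provider API is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Snapshot:
    """One reading of migration health. All fields are hypothetical inputs."""
    taken_at: datetime
    month_spend: float         # spend so far this month
    month_forecast: float      # forecast spend for the month
    p99_latency_ms: float      # observed P99 latency
    sla_latency_ms: float      # SLA limit for P99 latency
    public_buckets: List[str]  # names of publicly exposed storage buckets


class KillSwitch:
    """Evaluates the three thresholds and returns the actions to fire."""

    def __init__(self, cost_ratio: float = 1.30,
                 breach_window: timedelta = timedelta(minutes=15)):
        self.cost_ratio = cost_ratio
        self.breach_window = breach_window
        # Start time of the current unbroken SLA breach, if any.
        self._breach_started: Optional[datetime] = None

    def evaluate(self, snap: Snapshot) -> List[str]:
        actions: List[str] = []

        # Cost ceiling: spend above 130% of forecast.
        if snap.month_spend > self.cost_ratio * snap.month_forecast:
            actions += ["freeze_deployments", "alert_cfo"]

        # Latency trigger: the breach must persist longer than the window.
        if snap.p99_latency_ms > snap.sla_latency_ms:
            if self._breach_started is None:
                self._breach_started = snap.taken_at
            if snap.taken_at - self._breach_started > self.breach_window:
                actions.append("route_traffic_to_on_prem")
        else:
            self._breach_started = None  # recovered, so reset the clock

        # Security tripwire: any public bucket triggers a lockdown.
        for bucket in snap.public_buckets:
            actions.append(f"lock_down_bucket:{bucket}")

        return actions


if __name__ == "__main__":
    switch = KillSwitch()
    start = datetime(2026, 1, 10, 3, 0)  # 3 a.m. on a Saturday
    for minutes in (0, 10, 20):
        snap = Snapshot(start + timedelta(minutes=minutes),
                        140_000, 100_000, 900, 500, [])
        print(minutes, switch.evaluate(snap))
```

Note what the sketch leaves out on purpose: there is no approval step, no override flag, and no "are you sure?" prompt. The latency rule tracks when the breach began so that a single slow reading does not trigger a rollback, but once the window is exceeded the decision is made by the numbers.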

The AAD exists because the moment you most need to pull the reserve is the moment you are least capable of deciding to pull it. In skydiving, that moment is unconsciousness. In cloud migration, it is the sunk-cost fallacy — the organisational inability to admit that the migration is failing when you have already spent two million pounds on it.

Diagnostic: If your migration went catastrophically wrong at 3 a.m. on a Saturday, would any automated system catch it before a human noticed?

A note on ownership: In most organisations, these layers belong to different people. Platform teams own Architecture. Leadership owns Culture. And Understanding — critically — is everyone's responsibility, which in practice means it is often nobody's. If you cannot name the owner of each layer in your organisation, you have found your first vulnerability.

Two Kinds of Repatriation

Not all retreats are failures. This is the distinction the industry refuses to make.

37signals moved off the cloud in 2023. That was strategic repatriation — the reserve deployed exactly as designed. They ran the numbers. The economics no longer justified cloud hosting for their specific workload profile. They had the on-premises capability to return to. They planned it, executed it, and saved millions. That is a skydiver deploying the reserve at the correct altitude, calmly, with full situational awareness. The reserve worked because it was packed.

GEICO's cloud migration troubles were the opposite. That was panicked repatriation — fumbling for the ripcord at five hundred feet because nobody had checked the gear. No clear rollback plan. No tested repatriation path. No AAD firing to force the decision before it was too late. They did not choose to come back. They were forced back, and the cost — financial, operational, reputational — was catastrophic.

The difference is not whether you come back. The difference is whether you packed the reserve before you jumped.

The Only Migration

The B-17 pilots were not less brave for using a checklist. They were more effective. The checklist freed them to focus their courage on the parts of the mission that actually required courage — the flak, the fighters, the weather, the decisions that no system could make for them. The checklist handled the parts that courage could not.

I have watched organisations spend millions on migration programmes that had no kill switch, no tested rollback, no automated threshold, no independent reserve inspection. I have watched CTOs bet their careers on plans that had less systemic safety than a first-time skydiver's rig. And I have watched them crash, not because they lacked talent or ambition, but because they lacked a checklist.

The Boeing Model 299 taught us this in 1935. The skydiving community teaches it every single day. The lesson is there for anyone willing to learn it.

The only migration that matters is not from on-prem to cloud. It is from courage to systems.

Bonus: skydivers can be nerds too!

The bonus photo above shows Phil Hartree, circa 2011, on a weather hold, teaching me landing patterns and also subnet masks and CIDR blocks 🪂.

(Views in this article are my own.)