The Harness Is the Product: Why 98.4% of an AI Agent Isn't the Model

There’s a number that should change how you think about building with AI, and almost nobody outside the people writing agent code has internalized it yet.

When researchers at MBZUAI went through the source of Claude Code — roughly 1,884 files and half a million lines — and sorted what they found, the split was startling. About 1.6% of a production coding agent is the AI decision logic. The model, the prompts, the part that “thinks.” The other 98.4% is harness: the code that feeds the model context, runs its tools, validates outputs, retries failures, manages memory, and decides when the work is done. Four different teams built four different agents, and they all converged on roughly the same shape.

Sit with that ratio for a second. The thing everyone talks about — the model — is the thin sliver. The thing almost nobody talks about — the harness — is virtually the entire product. For a company named after the loop the harness runs, this isn’t a surprise. But it’s the clearest evidence yet for something we’ve argued all year: the model is the commodity; the harness is the craft.

What a harness actually is

“Harness” has become one of those words that gets used loosely, so let’s pin it down. A harness is everything wrapped around the model that turns a text-completion engine into something that can do real work without a human typing every step.

Concretely, a harness owns:

Context assembly — deciding what the model sees on each turn: the task, the relevant files, prior results, tool definitions. Get this wrong and the smartest model in the world gives you confident nonsense.
Tool execution — actually running the shell command, editing the file, calling the API, and handing the result back. Typed tool schemas here, as the agent-architecture literature notes, sharply cut malformed calls.
The loop — reason → act → observe, over and over, until a goal is met. This is where loop engineering lives.
Verification and stopping — the genuinely hard part. How does the system know the work is correct, and when is it allowed to stop? A test passing, CI going green, a reviewer model signing off.
Guardrails — sandboxing so a runaway agent can’t wreck the filesystem, code-review gates before changes land, separation of the model that writes from the model that checks.
Memory and recovery — what persists between turns and sessions, and how the system picks itself back up after a crash or a cancelled request.
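
To make the list above concrete, here is a minimal sketch of a harness loop in Python. Every name in it (ToolCall, Tool, Harness, the model callable) is hypothetical and chosen for illustration; it is not the API of any real agent framework, and the "model" is a scripted stand-in. It shows three of the responsibilities in miniature: typed tool validation, the reason → act → observe loop, and a verification step that decides when the run may stop.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# All names here are hypothetical, for illustration only.

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Tool:
    func: Callable[..., str]
    schema: dict[str, type]  # argument name -> required type

    def validate(self, args: dict) -> None:
        # Typed schema check: reject malformed calls before running anything.
        if set(args) != set(self.schema):
            raise ValueError(f"expected args {sorted(self.schema)}, got {sorted(args)}")
        for key, expected in self.schema.items():
            if not isinstance(args[key], expected):
                raise ValueError(f"{key} must be {expected.__name__}")

@dataclass
class Harness:
    model: Callable[[list[str]], Optional[ToolCall]]  # None means "I think I'm done"
    tools: dict[str, Tool]
    verify: Callable[[], bool]  # e.g. "do the tests pass?"
    max_turns: int = 10         # hard stop so a confused model cannot loop forever
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> bool:
        self.history.append(f"task: {task}")
        for _ in range(self.max_turns):
            call = self.model(self.history)          # reason
            if call is None:
                if self.verify():                    # the harness decides, not the model
                    return True
                self.history.append("verify: failed, keep going")
                continue
            try:
                tool = self.tools[call.name]
                tool.validate(call.args)
                result = tool.func(**call.args)      # act
            except (KeyError, ValueError) as err:
                result = f"error: {err}"             # feed the failure back as an observation
            self.history.append(f"{call.name}: {result}")  # observe
        return False

# Usage with a scripted stand-in for the model.
files: dict[str, str] = {}

def write_file(path: str, text: str) -> str:
    files[path] = text
    return f"wrote {len(text)} chars"

script = iter([
    ToolCall("write_file", {"path": "a.txt"}),                  # malformed: missing "text"
    ToolCall("write_file", {"path": "a.txt", "text": "done"}),  # corrected call
    None,                                                       # model claims it is finished
])
harness = Harness(
    model=lambda history: next(script),
    tools={"write_file": Tool(write_file, {"path": str, "text": str})},
    verify=lambda: files.get("a.txt") == "done",
)
ok = harness.run("create a.txt")
```

Note how little of this is "the model": one callable. The malformed first call never reaches the filesystem, and the run ends only when the verify check agrees, not when the model says so.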

None of that is the model. All of it is the difference between a demo and a product.

Why the model keeps getting smaller (as a share)

Here’s the counterintuitive part: as models get better, the harness gets more important as a share of the work, not less.

A more capable model absorbs some scaffolding — last year’s careful prompt-chaining to force step-by-step planning is this year’s native behavior, a boundary we wrote about in Models vs Agents. But a more capable model is also trusted to take on longer, riskier, more autonomous tasks. And the longer and more autonomous the task, the more it needs exactly the things the harness provides: reliable context over many turns, verification you can trust without watching, guardrails for when it goes wrong, recovery when it crashes at hour three.
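
"Recovery when it crashes at hour three" mostly comes down to checkpointing. The sketch below is one simple way to do it, assuming the work can be split into ordered steps whose results are JSON-serializable; the function names (save_checkpoint, load_checkpoint, run_steps) and the state layout are made up for this example. Each completed step is persisted atomically, so a restart resumes from the first unfinished step instead of starting over.

```python
import json
import os
import tempfile
from pathlib import Path

def save_checkpoint(path: Path, state: dict) -> None:
    # Write to a temp file, then rename over the target, so a crash
    # mid-write never leaves a half-written checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "results": []}

def run_steps(steps, path: Path) -> list:
    state = load_checkpoint(path)
    # Resume from the first step that has not been recorded as finished.
    for i in range(state["next_step"], len(steps)):
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        save_checkpoint(path, state)
    return state["results"]

# Usage: simulate a crash on the third step, then restart.
calls = []

def step(n):
    def run():
        calls.append(n)
        return n * n
    return run

def crash():
    raise RuntimeError("simulated crash")

with tempfile.TemporaryDirectory() as d:
    ckpt = Path(d) / "run.json"
    try:
        run_steps([step(1), step(2), crash, step(4)], ckpt)
    except RuntimeError:
        pass  # the process "died" here
    # Second attempt with a working third step; steps 1 and 2 are not re-run.
    results = run_steps([step(1), step(2), step(3), step(4)], ckpt)
```

A real harness would also checkpoint conversation history and tool side effects, which is harder than this; the point is only that resumability is ordinary engineering, not model capability.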

Andrej Karpathy captured where this is heading when he described the “Loopy Era”: humans no longer writing most of the code, but directing, supervising, and orchestrating teams of agents. His own AutoResearch project ran 700 experiments in two days on a single GPU. You do not run 700 unattended experiments on the strength of a good prompt. You run them on the strength of a good harness.

That’s why the harness share holds — or grows. The model is the engine; the harness is the entire rest of the car, the road, and the traffic system. Better engines don’t make the car matter less.

The harness is also the moat

This reframing has teeth for anyone deciding where to invest.

If the model is 1.6% of the product and you don’t train frontier models, then competing on “we have access to a good model” is competing on the commodity. Everyone has access to a good model. The defensible work — the part that’s hard to copy, that compounds, that you can actually sell — is the 98.4%: the harness tuned to a specific domain, a specific codebase, a specific regulatory environment, a specific language.

The guardrails market alone is projected past $100B by 2034, and 64% of organizations now maintain AI security policies. That spend isn’t going into models. It’s going into the harness layer — validation, sandboxing, review gates, observability — because that’s where production risk actually lives. When the industry standardizes the model interface (and with the Agentic AI Foundation now stewarding open standards like MCP and AGENTS.md, it is), the differentiation moves up the stack into the harness. Standard plugs, custom machine.

Why this is good news for Southeast Asia

If the value were in the model, this would be a closed game. Frontier pre-training costs hundreds of millions of dollars and a handful of labs control it. A developer in Phnom Penh or Da Nang or Cebu would be permanently on the outside, renting access.

But the value is overwhelmingly in the harness — and the harness is software engineering. It’s context management, tool integration, test design, error handling, domain knowledge, careful systems thinking. None of that requires a GPU cluster. It requires good engineers who understand a problem deeply, and Southeast Asia has those in abundance and growing fast: Cambodia alone now counts dozens of AI startups and a young, expanding talent pipeline out of institutions like RUPP and ITC.

The harness is also where local knowledge becomes a durable advantage. A generic agent doesn’t know Khmer document conventions, or the quirks of a Cambodian bank’s compliance workflow, or how an agricultural co-op actually records a harvest. The model is the same everywhere; the harness is where you encode the context that makes it useful here. That’s not a disadvantage to overcome — it’s a defensible position that the frontier labs cannot build from California.

What to do with this

If you’re a developer: stop optimizing prompts and start engineering harnesses. The leverage isn’t in cleverer phrasing; it’s in better context assembly, a tighter verification loop, and guardrails you can trust unattended. That’s the skill that compounds, and the skill the market is about to pay for.
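
As one illustration of what "better context assembly" can mean in practice, here is a small sketch that packs the most important material into a fixed budget. The Snippet type, the priority field, and the word-count token estimate are all assumptions made for this example; a real harness would use the model's actual tokenizer and a smarter relevance signal.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    label: str
    text: str
    priority: int  # lower number = more important

def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def assemble_context(task: str, snippets: list[Snippet], budget: int) -> str:
    parts = [f"TASK: {task}"]
    used = rough_tokens(parts[0])
    for s in sorted(snippets, key=lambda s: s.priority):
        block = f"[{s.label}]\n{s.text}"
        cost = rough_tokens(block)
        if used + cost > budget:
            continue  # skip what does not fit; a later, smaller item may still fit
        parts.append(block)
        used += cost
    return "\n\n".join(parts)

# Usage: the oversized, low-priority changelog is dropped; the rest is kept in priority order.
context = assemble_context(
    "fix the failing date parser test",
    [
        Snippet("test output", "AssertionError: expected 2024-01-05 got 2024-05-01", 0),
        Snippet("changelog", "word " * 500, 2),
        Snippet("parser.py", "def parse(s): return datetime.strptime(s, '%d-%m-%Y')", 1),
    ],
    budget=60,
)
```

The design choice worth noticing is that the harness, not the model, decides what gets left out. That decision is testable and tunable in a way prompt wording is not.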

If you’re a business: don’t buy “an AI model.” Buy — or build — the harness around it that’s shaped to your work. The off-the-shelf demo is the 1.6%. The thing that actually moves your numbers is the 98.4% nobody put on the slide.

And if you’re anywhere in Southeast Asia wondering whether there’s a real seat at this table: there is, and it’s the bigger seat. The expensive, commoditized part of intelligence is being sold by the API call. The valuable, defensible, build-it-yourself part is the harness, and that’s engineering: the thing this region does, and can do, as well as anywhere on earth.

The model is the engine. The harness is the rest of the car. Build the car.
