How Coding Agents Actually Reason

In The Harness Is the Product we made the case that only about 1.6% of a production coding agent is AI decision logic — the part that “thinks.” The other 98.4% is scaffolding. That number does important work: it tells you where to spend your engineering. But it also leaves a question hanging. If the thinking is such a thin sliver, what is it actually doing in there? How does an agent reason its way from “fix this bug” to a correct diff?

This post opens up that 1.6%. Not because it’s where you’ll spend most of your effort — it isn’t — but because you can’t build a good harness around a reasoning process you don’t understand. And the way agents reason turns out to be more shapeable, and less magical, than most people assume.

Reasoning is a conversation with the world, not a monologue

The instinct most people carry over from chatbots is that a model “reasons” by thinking harder in one shot — a long internal monologue, chain-of-thought, that arrives at an answer. For a coding agent, that mental model is wrong, and the difference is the whole game.

The dominant pattern is ReAct — short for Reason + Act — and it interleaves three moves in a loop: Thought → Action → Observation. The agent has a thought (“the test fails on a null input, so the guard clause is probably missing”), takes an action (opens the file, runs the test), and then observes what actually came back. That observation feeds the next thought. Instead of reasoning once and committing, the agent thinks, touches the real world, sees what’s true, and thinks again.
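
To make the loop concrete, here is a minimal sketch of Thought → Action → Observation in Python. It is an illustration under stated assumptions, not any particular framework's API: `react_loop`, `Step`, `scripted_policy`, and `run_test` are hypothetical names, and the policy is a hard-coded stand-in for a model call so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    action: str    # name of a tool to call, or "finish"
    argument: str  # tool input, or the final answer when action is "finish"


def react_loop(policy, tools, task, max_turns=5):
    """Alternate thoughts, tool calls, and observations until the policy finishes."""
    transcript = [f"Task: {task}"]
    for _ in range(max_turns):
        step = policy(transcript)
        transcript.append(f"Thought: {step.thought}")
        if step.action == "finish":
            transcript.append(f"Answer: {step.argument}")
            return transcript
        # The action touches the world; the observation is what came back.
        observation = tools[step.action](step.argument)
        transcript.append(f"Action: {step.action}({step.argument})")
        transcript.append(f"Observation: {observation}")
    transcript.append("Answer: gave up after max_turns")
    return transcript


def scripted_policy(transcript):
    """Hypothetical stand-in for a model: decides the next step from the last line."""
    last = transcript[-1]
    if last.startswith("Task:"):
        return Step("The test fails on a null input; check the guard clause.",
                    "run_test", "test_null_input")
    if "FAILED" in last:
        return Step("The failure confirms the guard clause is missing.",
                    "finish", "add a None check")
    return Step("Nothing left to check.", "finish", "no change needed")


def run_test(name):
    """Hypothetical tool: pretends to run one test and reports the result."""
    return f"{name} FAILED: TypeError: object of type 'NoneType' has no len()"


transcript = react_loop(scripted_policy, {"run_test": run_test}, "fix this bug")
print("\n".join(transcript))
```

The design choice worth noticing is that the policy only ever sees the transcript. Whatever the tool returned on the last turn is the only new information the next thought has to work with.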

This is the same Perceive → Reason → Act → Observe cycle we covered in Designing Agent Loops That Run While You Sleep, but viewed from the inside: from the model’s point of view rather than the loop’s. And it explains a result that surprises people: a weaker model in a good observe-and-revise loop can beat a stronger model answering in one shot. The observation is what corrects the model’s confident guesses before they become bugs. Chain-of-thought on its own can hallucinate a plausible story; ReAct checks the story against a failing test.

Why observation beats cleverness

Pull on that thread and you find the load-bearing idea: in agentic reasoning, the grounding matters more than the raw IQ of any single thought.

A pure chain-of-thought model that decides a function returns the wrong value has no way to know it’s wrong — it’s reasoning in a vacuum. An agent that runs the function gets a fact back, and a fact overrides a guess. This is why so much of harness engineering is really about improving the observation: clean test output, precise error messages, type-checker feedback, a failing assertion that points at the exact line. You are not making the model smarter. You are giving its reasoning something true to push against on every turn. Better observations produce faster convergence and fewer confident-but-wrong detours — the difference between an agent that fixes the bug and one that cheerfully rewrites three files that were never broken.
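
One way to picture "improving the observation" is a small filter between the test runner and the model. The sketch below assumes the raw output is a standard Python traceback; `summarize_failure` is a hypothetical helper, not part of any test framework. It boils the traceback down to the innermost file, line, and error message, so the next thought starts from the exact location rather than a wall of text.

```python
import re

# Matches traceback frame lines such as:  File "shop/cart.py", line 14, in total
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def summarize_failure(raw_output: str) -> str:
    """Reduce a raw Python traceback to one line naming the innermost frame."""
    frames = FRAME.findall(raw_output)
    lines = [line for line in raw_output.strip().splitlines() if line.strip()]
    error = lines[-1].strip() if lines else "no output"
    if not frames:
        return f"FAILED: {error}"
    path, line_no, func = frames[-1]  # the last frame is where the error was raised
    return f"FAILED at {path}:{line_no} in {func}: {error}"


raw = """Traceback (most recent call last):
  File "tests/test_cart.py", line 8, in test_empty_cart
    assert total(None) == 0
  File "shop/cart.py", line 14, in total
    return sum(item.price for item in items)
TypeError: 'NoneType' object is not iterable
"""

observation = summarize_failure(raw)
print(observation)
```

Nothing here makes the model smarter. It only changes what the model is handed: one line that names the file, the line number, and the failure.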

This also reframes the longer-horizon patterns. Plan-and-Execute has a separate planner lay out the steps before any code is written, which helps on tasks too long to hold in one breath. Reflexion adds a verbalized self-critique — the agent writes down what went wrong last time and carries that lesson forward. Both are, at bottom, ways of structuring the reasoning so that observation lands where it can do the most good.
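
As a rough illustration of the Reflexion idea, the sketch below retries a task and carries a written lesson from each failure into the next attempt. It is a toy under stated assumptions: `reflexion_loop`, `attempt`, `evaluate`, and `reflect` are hypothetical names, and the "agent" is scripted rather than a real model, so the only thing on display is how the lesson travels forward.

```python
def reflexion_loop(attempt, evaluate, reflect, max_attempts=3):
    """Retry a task, carrying written lessons from each failure forward."""
    lessons = []
    for _ in range(max_attempts):
        candidate = attempt(lessons)
        passed, feedback = evaluate(candidate)
        if passed:
            return candidate, lessons
        # Verbalize what went wrong so the next attempt can read it.
        lessons.append(reflect(feedback))
    return None, lessons


def attempt(lessons):
    """Scripted stand-in for the agent: only handles None once a lesson says so."""
    if any("None" in lesson for lesson in lessons):
        return lambda items: 0 if items is None else len(items)
    return len


def evaluate(candidate):
    """The observation: run the candidate against two small checks."""
    try:
        passed = candidate(None) == 0 and candidate([1, 2]) == 2
        return passed, "ok" if passed else "wrong result"
    except TypeError as exc:
        return False, f"TypeError: {exc}"


def reflect(feedback):
    """Turn raw feedback into a lesson written in plain words."""
    return f"Last attempt failed with {feedback}. Handle None before calling len()."


fixed, lessons = reflexion_loop(attempt, evaluate, reflect)
print(lessons)
```

The first attempt fails on `None`, the failure is written down, and the second attempt reads that note before trying again.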

Making the model argue like a logician

The most interesting recent evidence that reasoning is engineerable — not a fixed property of the model — comes from a March 2026 Meta paper, Agentic Code Reasoning by Shubham Ugare and Satish Chandra. They asked a sharp question: can an agent reason about what code means without running it? And they found that how you make it reason changes the answer dramatically.

Their method, semi-formal reasoning, forces the agent through three disciplined steps instead of free-form chain-of-thought: construct explicit premises, trace the execution paths, and derive formal conclusions. The point is the discipline. As the authors put it, the structure “acts as a certificate: the agent cannot skip cases or make unsupported claims.” A chatbot is allowed to wave its hands; a logician has to show every case.
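
To show what "cannot skip cases" might look like mechanically, here is a small sketch of a harness-side check. This is our own illustration, not the paper's format: the `Certificate` structure, the `check_certificate` function, and the example premises are all hypothetical. The idea is that the agent's answer is only accepted if every required case has a trace and every trace cites premises that were actually stated.

```python
from dataclasses import dataclass


@dataclass
class Certificate:
    premises: dict[str, str]      # premise id -> statement grounded in the code
    traces: dict[str, list[str]]  # case name -> premise ids used on that path
    conclusion: str


def check_certificate(cert, required_cases):
    """Return a list of problems; an empty list means the structure is complete."""
    problems = []
    for case in required_cases:
        if case not in cert.traces:
            problems.append(f"missing trace for case: {case}")
    for case, used in cert.traces.items():
        if not used:
            problems.append(f"trace for {case} cites no premises")
        for premise_id in used:
            if premise_id not in cert.premises:
                problems.append(f"trace for {case} cites unknown premise {premise_id}")
    if not cert.conclusion.strip():
        problems.append("no conclusion")
    return problems


cert = Certificate(
    premises={
        "P1": "patch A returns 0 when items is None",
        "P2": "patch B returns 0 when items is None",
        "P3": "both patches return 0 for an empty list",
    },
    traces={
        "items is None": ["P1", "P2"],
        "items is empty": ["P3"],
    },
    conclusion="patches A and B are equivalent",
)

problems = check_certificate(
    cert, ["items is None", "items is empty", "items is non-empty"]
)
print(problems)
```

A check like this says nothing about whether the premises are true. It only refuses a conclusion that skipped a case, which is the hand-waving the structure is meant to rule out.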

The payoff is not subtle. On judging whether two patches are equivalent, accuracy rose from 78% to 88% on curated examples and hit 93% on real agent-generated patches. On RubberDuckBench, a code-question benchmark, semi-formal reasoning reached 87%, a 9-point gain over standard agentic reasoning, and fault localization improved too. Same model, same code. The only thing that changed was the shape the reasoning was forced to take. Make the model argue like a logician instead of a chatbot, and its judgment about code climbs by roughly ten points.

Why this is good news, and a Southeast Asia opening

Here is the practical takeaway, and it lands in the same place our other posts do. If reasoning quality were a fixed dial inside the model, the only way to get more of it would be to wait for the next frontier release, a game only a handful of labs can play. But it isn’t fixed. A structured reasoning protocol, a maker/checker split, an observation channel tuned to surface the right facts: these lift the quality of the thinking without touching the model’s weights. Reasoning is, in large part, something you engineer around the model.

That is the same reason we keep arguing the agentic era is wide open to Southeast Asia’s developers. Designing a good reasoning protocol for a specific domain — how to verify a Khmer-language invoice, how to check a Cambodian compliance rule, how to trace the logic of a banking batch job — is software engineering and domain insight, not GPU budget. The model is rented at a flat rate from California. The reasoning scaffold that makes it reliable on your problem is yours to build, and it’s exactly the kind of careful, structured systems work this region does well.

What to take from this

Don’t treat the model’s reasoning as a black box you can only pray to. You can shape it. Give it a tighter observe-and-revise loop. Force it through explicit premises instead of letting it free-associate. Split the agent that proposes from the one that checks. Tune the observation channel so the truth arrives fast and clear.
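
For the propose/check split, a minimal sketch might look like the following. Everything in it is an assumption made for illustration: `maker_checker`, `scripted_maker`, and `scope_checker` are hypothetical names, the maker is scripted rather than a model, and the checker enforces only one rule (stay inside the files the task allows).

```python
def maker_checker(make, check, task, max_rounds=3):
    """Let one role propose and a separate role approve or reject with a reason."""
    feedback = None
    for _ in range(max_rounds):
        proposal = make(task, feedback)
        verdict = check(task, proposal)
        if verdict["approved"]:
            return proposal
        feedback = verdict["reason"]  # the rejection becomes the maker's next input
    return None


def scripted_maker(task, feedback):
    """Stand-in for the proposing agent: over-reaches until it is told not to."""
    patch = {"shop/cart.py": "if items is None: return 0"}
    if feedback is None:
        # First try rewrites files that were never broken.
        patch["shop/models.py"] = "refactor"
        patch["shop/utils.py"] = "refactor"
    return patch


def scope_checker(task, proposal):
    """Stand-in for the checking agent: reject patches that leave the allowed scope."""
    extra = sorted(set(proposal) - task["allowed_files"])
    if extra:
        return {"approved": False, "reason": f"touches files outside scope: {extra}"}
    return {"approved": True, "reason": "ok"}


task = {"goal": "guard against None", "allowed_files": {"shop/cart.py"}}
accepted = maker_checker(scripted_maker, scope_checker, task)
print(accepted)
```

The checker never writes code, and the maker never approves its own work. That separation is the whole point of the split.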

The 1.6% is where the thinking happens. But how well it thinks is decided, in large part, by the 98.4% you build around it — and now by the shape of reasoning you ask it to follow. The model brings the intelligence. You bring the discipline.