The Inference Loop: Why Coding Is Becoming a Loop, Not a Keystroke

For thirty years, writing software meant producing keystrokes. You held the model of the program in your head and translated it, character by character, into a file. Autocomplete made the keystrokes faster. Copilots made them smarter. But the unit of work never changed: a human typed code, and the machine ran it.

That unit is now changing. The most important software primitive of this decade is not a keystroke — it’s a loop. Call a model, let it run a tool, feed the result back, and repeat until the job is done. It fits in about ten lines of code, and it is quietly eating software development. We named our company after it: the inference loop.

This post is about what that loop actually is, why it beats the one-shot prompt that defined the first wave of AI coding, where it still falls apart, and where it’s heading. If you lead an engineering team and you’re trying to separate the signal from the hype, start here.

What an agentic coding loop actually is

Strip away the branding and an AI coding agent is almost embarrassingly simple. As Simon Willison puts it in his guide to how coding agents work, an LLM agent is “something that runs tools in a loop to achieve a goal.” A tool is just a function the surrounding program — the harness — exposes to the model: read a file, run a shell command, search the codebase, execute the tests. The model decides which tool to call; the harness runs it and hands the output back.

That’s the entire trick. The loop looks like this:

Give the model a goal and a set of tools.
The model responds with either an answer or a request to call a tool.
The harness runs the tool and returns the result to the model.
Go back to step 2 until the goal is met.

Thorsten Ball famously demonstrated that you can build a working coding agent in a few hundred lines of code — no framework, no orchestration engine, no vector database. Agents, as the saying goes, are “just LLMs with the right tools in a conversation loop.” The reason this matters is that it demystifies the whole category. The magic isn’t a secret architecture. The magic is that a capable model, given the ability to act and then observe the consequences of its action, can navigate a problem it could never solve in a single pass.

Willison goes further in Designing agentic loops, arguing that the real skill of the coming years is designing the loop itself: which tools to expose, how much autonomy to grant, and how to run the loop safely — even in what he half-jokingly calls “YOLO mode,” where the agent executes commands without asking permission at each step. The loop is the new programming model. Designing it well is the new craft.

The 2025–2026 tool landscape

The loop is the primitive; the products are the harnesses built around it. By 2026 the landscape has sorted itself into a few clear shapes.

Claude Code is agent-first. It lives in the terminal, has direct access to your filesystem and git, and assumes the AI drives while the developer reviews. You give it a task, it reads files, edits them, runs the tests, reads the failures, and tries again — looping until it’s done or stuck. The human’s job shifts from author to reviewer.

Cursor took the opposite entry point: embed the agent inside the IDE as a collaborator that works alongside you in the editor you already use. The two philosophies have been converging, though — Cursor shipped a CLI with agent modes in January 2026, bringing it closer to the terminal-native, agent-driven model. Codex, Aider, and Cline round out the field, each making slightly different bets about where the human sits relative to the loop.

The convergence is the story. As Sourcegraph’s 2026 guide to agentic coding documents, the frontier is no longer “can the agent edit a file” — it’s running these loops against large, real-world codebases, with the agent reaching for code search, cloud handoff, and long-running background tasks. The tools are racing toward the same destination from different doors: a CLI-native agent that can be handed a real ticket and trusted to make progress on a real repository.

Why the loop beats the prompt

The first wave of AI coding was one-shot generation: you wrote a careful prompt, the model produced a block of code, and you pasted it in and prayed. When it was wrong — wrong API, hallucinated method, subtle off-by-one — you started over with a better prompt. The model never saw whether its code actually worked.

The loop changes this completely, and the difference is not incremental. Three things become possible the moment the model can act and observe:

Self-correction. The agent runs the code, sees the stack trace, and fixes its own mistake — the same way a human developer does. A one-shot model is blind to its own errors. A looping agent has feedback. It can write a test, watch it fail, change the implementation, and watch it pass.

Grounding in reality. Instead of generating from its memory of how a library probably works, the agent reads the actual source, greps for the actual function signature, and checks the actual types in your codebase. The loop replaces confident guessing with cheap verification.

Decomposition over time. Hard problems don’t yield to one big leap; they yield to many small, verified steps. A loop is a machine for taking small steps. The agent can explore, hit a wall, back up, and try a different approach — accumulating progress across dozens of tool calls rather than betting everything on a single generation.

This is why a mid-tier model inside a good loop routinely outperforms a frontier model answering in one shot. The intelligence isn’t only in the weights; it’s in the loop wrapped around them.

Where it breaks today

Honesty is the whole point of a post like this, so here’s where the loop still falls down in 2026.

Long-horizon tasks. Agents are strong at tasks measured in minutes and tens of tool calls. Tasks measured in hours — a sprawling refactor across forty files, a migration with subtle ordering constraints — still tend to drift. The agent loses the thread, makes a locally sensible change that breaks something three files away, and can’t always recover.

Context overflow. Every loop iteration adds to the model’s context: files read, command output, prior reasoning. On a big codebase the relevant information eventually exceeds what fits in the window, and the agent starts forgetting what it learned ten steps ago. Managing what the agent remembers — and what it’s allowed to forget — is an unsolved, actively-worked problem.

Verification gaps. The loop self-corrects only as well as its tests and checks allow. If your codebase has thin test coverage, the agent’s feedback signal is weak, and it will confidently declare victory on code that doesn’t actually work. Garbage feedback in, garbage confidence out.

These limits are exactly why the conversation is moving toward the harness — the scaffolding of context management, verification, and guardrails around the model — and toward honest benchmarks that measure agents on real, long-horizon work rather than toy problems. The loop is necessary but not sufficient. What you build around it determines whether it’s a demo or a dependable colleague.

Where it’s heading

Point the current trajectory forward and three things come into focus.

Longer autonomous runs. As context management improves and models get better at staying on-task, the reliable horizon stretches from minutes toward hours. The agent you hand a bug today, you’ll hand a feature tomorrow.

Parallel sub-agents. Rather than one agent grinding through a task serially, a coordinator spawns several sub-agents — one exploring the codebase, one writing tests, one implementing — that work in parallel and report back. The loop becomes a tree of loops.

The developer as conductor. This is the deepest shift, and Addy Osmani names it well in Coding for the Future Agentic World: developers evolve from “coders” to “conductors.” Your value moves up the abstraction ladder — from writing the lines to specifying the goal, designing the loop, reviewing the output, and owning the judgment about whether the result is actually right. The keystroke was never the valuable part. The thinking was. Agentic coding strips away the typing and leaves the thinking exposed.

That’s not a threat to good engineers. It’s a promotion. The work that remains is the work that was always the point: understanding the problem deeply enough to know what “done” means, and having the taste to recognize it when you see it.

Conclusion

The keystroke had a thirty-year run. It’s being replaced by a loop that fits in ten lines — call the model, run a tool, feed back the result, repeat. That loop is simple enough to build in an afternoon and powerful enough to reshape how software gets made. It beats the one-shot prompt because it can act, observe, and correct. It still breaks on long-horizon work and weak verification. And it’s heading, fast, toward longer runs, parallel agents, and a developer whose job is to conduct rather than to type.

This is the loop we build businesses around. If your team is trying to figure out where agentic coding actually fits — which tasks to hand to the loop, how to build the harness and guardrails around it, and how to keep your engineers in the conductor’s seat — that’s exactly the work we do.

Talk to us about putting agentic coding loops to work in your team.