Designing Agent Loops That Run While You Sleep

On June 7, 2026, Peter Steinberger — the creator of OpenClaw, the most-starred new repository in GitHub history — posted seven words that cleared millions of views in a day: “You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.” The replies turned into a brawl. Roughly four in ten called it the next abstraction layer; the rest called it a cron job wearing a hat.

Both camps missed the point, which is smaller and more useful than either. The shift Steinberger described isn’t a new technology. It’s a change in where you put your hands. Instead of sitting in the chair and typing the next instruction, you write the program that decides what the next instruction should be — and then you go to sleep while it runs. Boris Cherny, who heads Claude Code at Anthropic, put it plainly: “I don’t prompt Claude anymore. I have loops running. They’re the ones prompting Claude.”

A few weeks later Google’s Addy Osmani gave the instinct a name — loop engineering — and that name is sticking. This post is about how you actually build one of these loops, and how you keep it from quietly setting fire to your token budget at 3 a.m.

The cycle at the center

Strip away the tooling and every autonomous loop is the same four beats, turning over and over until a goal is met:

Perceive → Reason → Act → Observe.

The agent perceives its current state (the task, the files, the last result), reasons about what to do next against the goal, acts by running a tool — a shell command, a file edit, an API call — and then observes what came back. That observation becomes the next perception, and the loop turns again. Osmani calls it “the scientific method applied to coding”: form a hypothesis, run the experiment, read the result, revise. The intelligence is in the model; the autonomy is in the cycle.
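As a rough illustration of the four beats, here is a minimal Python sketch. The names `State`, `run_cycle`, `reason`, `act`, and `goal_met` are all hypothetical and not taken from any agent framework: `reason` stands in for the model call, `act` for the tool runner, and `goal_met` for whatever mechanical check defines done.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    goal: str
    history: list[str] = field(default_factory=list)  # past observations


def run_cycle(
    state: State,
    reason: Callable[[State], str],     # hypothetical: model call, returns the next action
    act: Callable[[str], str],          # hypothetical: runs a tool, returns its raw output
    goal_met: Callable[[State], bool],  # hypothetical: mechanical check for "done"
    max_turns: int = 10,
) -> State:
    """Turn the perceive/reason/act/observe cycle until the goal is met or turns run out."""
    for _ in range(max_turns):
        if goal_met(state):           # perceive: inspect the current state
            break
        action = reason(state)        # reason: choose the next step against the goal
        result = act(action)          # act: run the tool
        state.history.append(result)  # observe: the result feeds the next perception
    return state
```

With stub callables, `run_cycle(State(goal="demo"), lambda s: "step", lambda a: a + " ok", lambda s: len(s.history) >= 3)` stops once three observations have been recorded. The model only ever appears inside `reason`; everything else is ordinary control flow.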

This is the same loop we dissected in Inside the Agent Loop. What’s new in the loop-engineering framing is the altitude. Prompt engineering tunes a single turn. Harness engineering tunes the environment one agent runs inside. Loop engineering sits one floor above the harness: it designs the program that drives the agent across hundreds of turns without you in the chair.

What turns the loop while you sleep

A loop that runs unattended needs three things a chat session doesn’t.

A trigger. Something has to start the loop without you pressing enter — a cron schedule, a new issue label, a failing CI run, a webhook. The trigger is what decouples the work from your keyboard. This is the literal difference between an agent you supervise and one that runs overnight.
A verifiable goal. “Make it better” is not a goal a loop can act on; “all tests pass and the linter is clean” is. The loop needs a condition it can check itself, mechanically, on every turn, because you’re asleep and can’t be the one checking. Underspecified goals are a common reason loops spin without converging.
A maker/checker split. The most reliable loops separate the agent that writes from the agent — or test suite, or CI gate — that verifies. One proposes, a different one signs off. A model grading its own homework is how confident nonsense ships to production.
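The verifiable goal and the maker/checker split can be sketched together. This is one possible shape, not a reference implementation: `run_checks` and `make_then_check` are hypothetical names, the checker is simply a list of commands whose exit codes decide the outcome, and `maker` stands in for whatever agent call edits the code.

```python
import subprocess
from typing import Callable, Sequence


def run_checks(checks: Sequence[Sequence[str]]) -> tuple[bool, str]:
    """Run every check command; return overall pass/fail plus the failing output."""
    failures = []
    for cmd in checks:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append(f"{' '.join(cmd)} failed:\n{proc.stdout}{proc.stderr}")
    return (not failures, "\n".join(failures))


def make_then_check(
    maker: Callable[[str], None],     # hypothetical: the agent that writes; receives feedback
    checks: Sequence[Sequence[str]],  # the independent checker: commands that must exit 0
    max_rounds: int = 3,
) -> bool:
    """Let the maker propose, then let the checker decide. The maker never grades itself."""
    feedback = ""
    for _ in range(max_rounds):
        maker(feedback)                    # maker proposes a change
        ok, feedback = run_checks(checks)  # checker signs off, or reports what failed
        if ok:
            return True
    return False
```

Assuming a project that happens to use pytest and ruff, the goal "all tests pass and the linter is clean" would be expressed as `checks = [["pytest", "-q"], ["ruff", "check", "."]]`. Any commands with meaningful exit codes would do equally well.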

Give a loop those three and it can genuinely run while you sleep. The question is no longer whether it can run unattended — it’s whether you’ll trust what you find in the morning.

The guardrails are the engineering

Here’s the part the “it’s just a cron job” crowd gets wrong. Anyone can wrap a model in a while loop. The difference between loop engineering and running loops is entirely in the guardrails — the code that decides when to stop. Without them, a loop doesn’t fail gracefully; it fails expensively. Uber learned this the hard way and capped engineers at $1,500 per tool per month after burning its annual AI budget in four months.

Every loop you let run unattended needs all of these, not some:

A hard iteration cap. A maximum number of cycles, full stop. Loops that cycle between two fixes — each breaking what the last one repaired — will do it forever if you let them.
A token and cost budget. A hard ceiling per run. When it’s hit, the loop stops, even mid-task. This is the guardrail that protects you from waking up to a four-figure bill.
No-progress detection. If the output hasn’t meaningfully changed in N turns, the loop is stuck. Exit and escalate rather than burn another hundred turns going nowhere.
Circuit breakers on tools. A retry limit on each tool call, so one flaky API doesn’t become an infinite retry storm.
A human checkpoint for irreversible actions. Deleting data, pushing to production, sending an email — these wait for a human, even in an otherwise autonomous loop. Sandbox everything else so a runaway agent can’t wreck the filesystem.
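The first three guardrails in this list all live in the loop itself, and a minimal sketch fits in a few dozen lines. Everything here is an assumption made for illustration: the names `Guardrails` and `run_guarded`, the default limits, and the hypothetical `step` callable that runs one turn and reports its output, the tokens it used, and whether the goal was met. "No meaningful change" is approximated crudely as "an output we have already seen"; a real loop would likely compare diffs or test results instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Guardrails:
    max_turns: int = 50        # hard iteration cap
    max_tokens: int = 200_000  # hard budget per run (assumed unit: tokens)
    stall_turns: int = 3       # consecutive already-seen outputs that count as stuck


def run_guarded(step: Callable[[], tuple[str, int, bool]], limits: Guardrails) -> str:
    """Drive step() until done, or until one of the guardrails trips.

    step() is hypothetical: it runs one turn and returns (output, tokens_used, done).
    """
    spent = 0
    seen: set[str] = set()
    stale = 0
    for _ in range(limits.max_turns):
        output, tokens, done = step()
        spent += tokens
        if done:
            return "done"
        # Budget is checked after each turn, so one turn can overshoot the ceiling.
        if spent >= limits.max_tokens:
            return "stopped: budget exhausted"
        # Repeated outputs catch both a stuck loop and an A/B/A/B oscillation.
        stale = stale + 1 if output in seen else 0
        seen.add(output)
        if stale >= limits.stall_turns:
            return "stopped: no progress"
    return "stopped: iteration cap"
```

Returning a status string rather than raising keeps the exit reason easy to log and escalate: each "stopped" value is a signal for a human to look in the morning, not a crash.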

Notice that none of these make the agent smarter. They make it safe to leave alone — which is the entire proposition. A loop you have to babysit isn’t a loop; it’s a chat session with extra steps.
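Tool circuit breakers and human checkpoints wrap individual actions rather than the loop as a whole. The sketch below is one way to express them, under stated assumptions: `ToolBreaker`, `CircuitOpen`, `execute`, and the action names in `IRREVERSIBLE` are hypothetical, and `approve` stands in for whatever channel a human actually uses to say yes.

```python
from typing import Callable


class CircuitOpen(Exception):
    """Raised when a tool has failed too many times in a row and is disabled."""


class ToolBreaker:
    """Wrap a tool so repeated failures disable it instead of retrying forever."""

    def __init__(self, tool: Callable[..., str], max_failures: int = 3):
        self.tool = tool
        self.max_failures = max_failures
        self.failures = 0

    def __call__(self, *args, **kwargs) -> str:
        if self.failures >= self.max_failures:
            name = getattr(self.tool, "__name__", "tool")
            raise CircuitOpen(f"{name} disabled after {self.failures} failures")
        try:
            result = self.tool(*args, **kwargs)
        except Exception:
            self.failures += 1  # count the failure, then let the caller see it
            raise
        self.failures = 0       # a success resets the count
        return result


# Hypothetical action names; a real loop would classify its own tools.
IRREVERSIBLE = {"delete_data", "deploy_production", "send_email"}


def execute(action: str, run: Callable[[str], str], approve: Callable[[str], bool]) -> str:
    """Run the action, but hold irreversible ones until a human approves."""
    if action in IRREVERSIBLE and not approve(action):
        return f"held for review: {action}"
    return run(action)
```

In an overnight run, `approve` could simply return `False` and queue the action for the morning, so the loop keeps working on everything reversible while the risky step waits.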

Why this matters more in Southeast Asia

It’s tempting to read all this as a frontier-lab concern. It’s the opposite. Loop engineering is one of the most accessible high-leverage skills in modern AI, and that accessibility favors exactly the developers people keep counting out.

A well-designed loop is a force multiplier on time, and it doesn’t care which time zone you’re in. A two-person team in Phnom Penh or Da Nang that designs good loops can put out the work of a much larger team, because the loops run through the night, on the weekend, across the holiday, while the team sleeps. Andrej Karpathy’s AutoResearch ran 700 experiments in two days on a single GPU; you don’t get that from a good prompt, you get it from a good loop. That leverage is available to anyone who understands the cycle and the guardrails, and beyond the token bill it costs nothing but the engineering.

And loop engineering is engineering — verifiable goals, error handling, test design, cost discipline, careful systems thinking. None of it requires training a frontier model. It requires good developers who understand a problem deeply enough to encode “what does done mean?” into a check a machine can run. Southeast Asia has those developers in growing numbers, out of institutions like RUPP and ITC in Cambodia and across the region. The model is rented from California at a flat rate; the loop that wraps it — tuned to your domain, your codebase, your definition of done — is yours, and it’s where the durable value sits.

Where to start

Pick one task you do on a schedule and dread — triaging new issues, updating dependencies, keeping a changelog current. Write down what “done” looks like as a check a machine can run. Wire a trigger to it. Add the guardrails before you add ambition: iteration cap, cost ceiling, no-progress exit. Then let it run once while you watch, and once while you don’t.

Steinberger’s seven words weren’t really about coding agents. They were about where the leverage moved. It moved out of the prompt and into the loop — and the loop runs while you sleep. Build the loop.