A few hype cycles ago, the “Ralph Wiggum” loop for agentic coding made the rounds and became such a strong meme — I believe because it gave interested coders a simple idea to work with, in stark contrast to all the complex orchestration frameworks floating around at the time. It certainly inspired me to finally “do it” and say, “well, now it’s time for me to look into agent orchestration.”
To recap, “Ralph” prescribes LLM agent invocations inside a Bash loop. Every round, the agent gets fed — importantly — a small task from a backlog, one that could comfortably be done in a single context window. On top of that, at the end of each invocation, static checks like unit tests can run, and if those fail, the step gets repeated. Something like that.
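The shape of that loop can be sketched in a few lines. Below is a minimal Python sketch (the original meme is a Bash loop, and names like `run_agent` and `run_checks` are stand-ins for whatever agent CLI and test command you actually use):

```python
def ralph_loop(backlog, run_agent, run_checks, max_retries=3):
    """Feed one small, well-scoped task per round; repeat a round
    while the static checks (e.g. unit tests) still fail."""
    done = []
    for task in backlog:
        for _ in range(max_retries):
            run_agent(task)        # one agent invocation, one context window
            if run_checks():       # static checks gate the step
                done.append(task)
                break              # task passed; fetch the next backlog item
    return done
```

The point is not the code but the shape: a dumb outer loop, small tasks, and a hard pass/fail gate after every invocation.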

Now, I am very interested in “automating myself away” — insofar as the chores go, in any case — but I’d also be okay if the LLM did all of the coding, to be honest. So I was curious to think through, with the help of some experiments, how far this approach can be pushed with respect to delivering high-quality code — code of the quality I myself would be personally satisfied with.
In the following, I describe something built the way developers so often build things: take something simple (here, the Ralph loop as a whole, as well as parts of it) and compose something more complex out of it. So here are the things I think are worth highlighting:
- Ralph is a loop that fetches small, well-scoped tasks from a backlog. Well-scoped tasks have test criteria. So my own thing will fetch one such item and work through it per iteration.
- Instead of a single agent invocation, I have many such invocations in a row, each consuming an artefact like a text description and producing one or more artefacts or making changes to the code. Each agent invocation should get bite-sized chunks.
- No branching, no nothing: I have a pipeline, a queue, a bucket brigade or assembly line where one thing happens after the other. This is what greatly simplifies things. No multi-agent setups, no worktrees, no communication; work artefacts get forwarded to agents further down the line to pick up. If you want to parallelise work, why not work on another project in parallel (or have another checkout)?
- Between agent invocations, static checks run.
- At key points, a human is in the loop. I get a message that pops up on a dashboard and asks for my attention and direction (most of the time, that's an LGTM).
- During those key points, I can look at the app running in a browser. I should mention: the agents themselves have a headless browser running at all times, so they literally have the full picture.
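The bucket brigade above can be sketched as a strictly linear pipeline. This is an illustrative sketch, not my actual implementation; the step names and the `Artefact` type are invented for the example, and `check` / `ask_human` stand in for real static checks and the dashboard ping:

```python
from dataclasses import dataclass

@dataclass
class Artefact:
    kind: str      # e.g. "spec", "diff", "review-notes"
    content: str

def pipeline(task, steps, check, ask_human):
    """A strictly linear assembly line: each step consumes the previous
    step's artefact and hands a new one further down the line.
    No branching, no multi-agent coordination."""
    artefact = Artefact("task", task)
    for name, step, needs_human in steps:
        artefact = step(artefact)            # one agent invocation
        if not check(artefact):              # static checks between invocations
            raise RuntimeError(f"static checks failed after {name}")
        if needs_human:
            ask_human(name, artefact)        # dashboard message, usually an LGTM
    return artefact
```

Because the sequence is fixed, no agent ever has to decide what happens next; it only has to do its one step well.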
Now, how does that help me produce high-quality code? Well, many of the steps in the pipeline are review steps, or steps where agents check and judge previous agents’ work. They produce artefacts that agents further down the line then take as prescriptions. I feel this is very much in line with the Ralph loop in the sense that it brings the right amount of determinism into the mix. In the Ralph loop, that’s the ever-same simple loop structure; here, it’s the ever-same sequence of steps.
I believe this works reasonably well, and I have it working to build a side project of mine (literally). My belief that it would was based on the following assumptions:
- Codebases are seldom grasped and understood as a whole. Systems are complex things built out of simpler things, hence recursive. Things at each level of abstraction can (actually, due to cognitive limits, have to) be judged in some isolation.
- The quality of isolated chunks of code is judged by perhaps somewhat subjective, but relatively stable (for any given developer) criteria.
- These criteria are stateable. There may well be much secret sauce in building such intuition through lots of experience, but at the end of the day, one can formulate the criteria, at least a couple of them.
- Code gets shaped into high quality in multiple passes, never in a single pass.
- Throwing away code is not a bad thing. Things often flow nicely when you are forced to do it over again, with the experience of the first try.
So, basically what I’m doing is going in circles, each time looking at the code from a fresh perspective, for every new task. And then I apply things like the Boy Scout rule. I mean, my agents do. Whatever they find, they complain about or fix before “the actual” work begins, for example. This all spares me conditional logic (a more complex state machine), which would make it necessary for agents to determine when something is done, or what needs to be done, or when something is good enough, and so on. I don’t want any of that. Instead, whatever is left over, whatever imperfections remain, gets left for the next round. As far as the work to do now is concerned, it was tightly scoped anyway, and I make sure I see reports of done work at the end, before I merge.
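That "defer instead of branch" idea can be made concrete. A minimal sketch, assuming hypothetical hooks `inspect` (the Boy Scout pass), `fix` (returns `True` if a complaint was handled now), and `do_task` (the actual, tightly scoped work):

```python
def run_round(task, leftovers, inspect, fix, do_task):
    """Every round looks the same: inspect first (Boy Scout rule),
    fix what is quick, defer the rest, then do the scoped task.
    No conditional state machine, no 'is this good enough?' decision."""
    complaints = leftovers + inspect()   # fresh look plus last round's remainders
    remaining = [c for c in complaints if not fix(c)]
    report = do_task(task)               # the tightly scoped work for this round
    return report, remaining             # imperfections roll over to the next round
```

Whatever `fix` declines to handle simply rides along in `remaining` until a later round picks it up; the control flow never has to reason about it.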
Here is the simplified version of what happens inside a single iteration:

Each box is a step and each rounded box an artefact. Note that the Revert and Code Review steps do not run in parallel.
Code is here for reference. Beware: it's not as simple as depicted above; not in the sense that it's fancy complex, but rather messy and, I fear, a little overengineered. A classic beginner's mistake! But good for learning.