
A local control plane for coding agents that makes long-running AI work inspectable, resumable, and safe to merge.
Tue Jun 02 2026
System designer and full-stack engineer
Agent memory, task orchestration, Git worktrees, approval flows, and multi-repo review state.
Turned one-off coding-agent chats into durable engineering workflows with clear context, saved diffs, test evidence, and human-controlled merges.
The funny thing about coding agents is that the impressive part is not always the code they write.
The impressive part is getting from "please fix this" to "this exact diff is ready to merge, here is what changed, here is what was tested, here is what still needs a human decision, and no unrelated work got swept in along the way."
That second sentence is where things get interesting.
When I started using coding agents heavily, the failure mode was not that the model could not write a component, debug a test, or trace a bug. It could. The failure mode was that the work had no durable shape once the chat moved on.
Which task was this for? Which repo was dirty before the agent started? Which branch should the final change land on? Was the latest run the one that passed tests, or the one that got interrupted halfway through a refactor? Did the agent change two repos because the feature needed it, or because it got lost?
I did not need a smarter prompt. I needed a local control plane.
So I built one.
The system is a local memory and orchestration layer for coding agents. It treats agent work like real engineering work instead of a disposable chat transcript.
At a high level, it gives every meaningful piece of work:
The short version:
Task
-> context packet
-> isolated agent run
-> saved multi-repo diff
-> review
-> approve and merge
-> handoff
That loop turned coding-agent sessions from "a helpful conversation" into something closer to a small, local engineering workflow system.
Agent memory can get weird if it is treated like a giant scrapbook. More context is not always better context.
The first useful abstraction was the context packet. Before a run starts, the system assembles a compact brief from the memory repo:
The point is not to make the agent remember everything. The point is to make the agent start in the right neighborhood.
A typical task context might say: this is part of the agent-orchestration effort, it targets the memory repo, it is related to worktree approvals, and the last useful handoff said the server needs a regression test for committed worktree diffs.
That is dramatically more useful than a transcript pile.
It also keeps the system honest. If a task has no related decisions, no notes, and no recent handoffs, that absence is visible. The agent has to inspect the repo instead of pretending the memory layer already knows the answer.
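As a rough illustration, a context packet can be modeled as a small record that renders to a short brief. This is a minimal sketch, not the system's actual code: the `ContextPacket` class, its field names, and the `max_items` cap are all assumptions made for the example. The one behavior it tries to capture from the text is that empty sections are printed rather than hidden.

```python
from dataclasses import dataclass, field


@dataclass
class ContextPacket:
    """Compact brief handed to an agent before a run (hypothetical shape)."""
    task_id: str
    effort: str
    target_repo: str
    decisions: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)
    handoffs: list[str] = field(default_factory=list)

    def render(self, max_items: int = 3) -> str:
        """Render a short text brief; empty sections are shown, not hidden."""
        lines = [
            f"Task: {self.task_id}",
            f"Effort: {self.effort}",
            f"Target repo: {self.target_repo}",
        ]
        sections = [
            ("Related decisions", self.decisions),
            ("Notes", self.notes),
            ("Recent handoffs", self.handoffs),
        ]
        for title, items in sections:
            lines.append(f"{title}:")
            if not items:
                # Make the absence explicit so the agent inspects the repo.
                lines.append("  (none recorded)")
                continue
            # Keep only the most recent entries so the brief stays compact.
            for item in items[-max_items:]:
                lines.append(f"  - {item}")
        return "\n".join(lines)


packet = ContextPacket(
    task_id="T0147",
    effort="agent-orchestration",
    target_repo="memory",
    handoffs=["Server needs a regression test for committed worktree diffs."],
)
print(packet.render())
```

Capping each section at a few recent items is one way to keep the brief from turning back into a transcript pile.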
The second abstraction was the handoff log.
Every non-trivial run leaves behind a small structured entry:
what: "Implemented worktree-isolated task runs and approval merge APIs."
evidence:
  changed:
    - "server: creates per-repo worktrees and captures grouped diffs"
    - "ui: exposes Inspect diff and Approve merge controls"
  validation:
    - "unit tests passed"
    - "syntax checks passed"
blocker: "none"
next: "Review the saved diff and approve merge when ready."
This is intentionally plain.
I do not want a five-page essay after every run. I want the next engineer, who is often future me, to know:
That is enough to make a task resumable. It is also enough to make failed runs useful. A failed run that records the exact blocker and evidence is not noise. It is a breadcrumb with a timestamp.
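One way to keep handoffs this plain is to give them a fixed shape and a small completeness check. The sketch below is hypothetical: the field names follow the example entry, but the `Handoff` class and the rules in `problems()` are assumptions for illustration, not the system's real validation.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """One handoff entry; field names mirror the example entry above."""
    what: str
    changed: list[str] = field(default_factory=list)
    validation: list[str] = field(default_factory=list)
    blocker: str = "none"
    next: str = ""

    def problems(self) -> list[str]:
        """Return reasons this entry is too thin to resume from."""
        issues = []
        if not self.what.strip():
            issues.append("missing 'what'")
        if not self.next.strip():
            issues.append("missing 'next'")
        # A failed run is still useful if it names its blocker and evidence.
        if self.blocker != "none" and not (self.changed or self.validation):
            issues.append("blocked run has no evidence")
        return issues
```

An entry that returns an empty list from `problems()` is one a later run can pick up without rereading the transcript.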
The big unlock was moving agent edits into isolated Git worktrees.
Before that, a coding agent and I were sharing the same checkout. That works until it very much does not. My local repo might already have half-finished work, debug edits, generated files, or a branch in the middle of a merge. Letting an agent write directly into that is asking for mystery meat diffs.
With worktree-based runs, the control plane creates a separate workspace for the task. The agent works there. The source repo stays untouched.
That changed the feel of the whole system.
The agent can make a real change. It can run tests. It can commit inside its worktree. The server can inspect both dirty files and committed worktree changes. Then the UI can show a saved diff snapshot even after the run ends.
This matters even more for multi-repo work. Some features are naturally split across a frontend, an API, and a deployment repo. The system groups diffs by repo so review does not collapse into one giant blob.
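To make the mechanics concrete, here is a minimal sketch of the Git commands involved, written as pure functions that only build argument lists. The function names, the branch naming, and the dictionary used for grouping are assumptions for illustration. The Git invocations themselves (`git worktree add -b` and `git diff <commit>`) are standard.

```python
from pathlib import Path


def worktree_add_cmd(repo: Path, worktree: Path, branch: str, base_head: str) -> list[str]:
    """Command that creates a new branch at base_head in a separate worktree."""
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree), base_head]


def worktree_diff_cmd(worktree: Path, base_head: str) -> list[str]:
    """Command that diffs the worktree's files against the base commit.

    `git diff <commit>` compares the working tree to that commit, so it covers
    both commits made in the worktree and uncommitted edits to tracked files.
    Untracked files are not included and would need separate handling.
    """
    return ["git", "-C", str(worktree), "diff", base_head]


def group_diffs(diffs_by_repo: dict[str, str]) -> dict[str, str]:
    """Keep one diff per repo, dropping repos the run did not change."""
    return {repo: diff for repo, diff in sorted(diffs_by_repo.items()) if diff.strip()}
```

Each command list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. Storing the captured diff text per repo is what lets a review outlive the run that produced it.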
The principle is simple:
Agent autonomy should increase only when inspection gets better too.
Worktrees made autonomy safer because they made the boundaries sharper.
I did not want an agent to silently merge its own work into my active branch. That is not a productivity feature. That is a trust leak.
So the system has an explicit approval flow.
When a run finishes, the UI can show:
Only then does the user approve the merge.
If the target repo is dirty, the merge blocks. That is not a bug. That is the system protecting unrelated human work from getting mixed into an agent-generated patch.
A good approval system should be a little stubborn. It should say, "I can merge this, but not while the destination has changed underneath me."
That one rule prevented an entire class of subtle workflow failures.
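A stubborn merge gate can be sketched as a pure function that collects every reason to refuse. This is an assumption-laden illustration rather than the real approval API: `ApprovalCheck`, `check_approval`, and the exact reason strings are hypothetical, and a real system might offer a rebase or re-review path instead of simply blocking when the target has moved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApprovalCheck:
    allowed: bool
    reasons: list[str]


def check_approval(
    status_porcelain: str,
    base_head: str,
    target_head: str,
    saved_diff: Optional[str],
) -> ApprovalCheck:
    """Decide whether an approved merge may proceed, and say why not.

    status_porcelain is the output of `git status --porcelain` in the target
    repo; any non-empty output means uncommitted or untracked changes.
    """
    reasons = []
    if status_porcelain.strip():
        reasons.append("target repo is dirty")
    if target_head != base_head:
        reasons.append("target branch moved since the run began")
    if not saved_diff:
        reasons.append("no saved diff snapshot for this run")
    return ApprovalCheck(allowed=not reasons, reasons=reasons)
```

Returning all reasons at once, instead of failing on the first, means a blocked approval can explain itself in a single message.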
One of the most useful improvements was almost comically small: task-level base_branch metadata.
At first, a run could infer its target branch from the current checkout. That sounds reasonable until you use the system for real work. The current checkout is a moment in time. The task's intended target is durable state.
Those are not the same thing.
So tasks can now store a base branch:
id: T0147
title: "Add durable base branch for worktree runs"
status: done
base_branch: main
The run still captures the exact base commit when it starts. That per-run snapshot is the baseHead. But the task keeps the durable intent: this work targets main, or this work targets a release branch, or this work targets a long-lived integration branch.
That distinction made approval merges much easier to reason about:
base_branch is what the task means.
baseHead is what the world looked like when the run began.
Tiny metadata. Big reduction in confusion.
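The split between durable intent and per-run snapshot can be sketched in a few lines. Everything here is illustrative: `Task`, `RunBase`, and `resolve_run_base` are hypothetical names, and the `heads` dictionary stands in for a real lookup such as `git rev-parse <branch>`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    title: str
    base_branch: Optional[str] = None  # durable intent, stored on the task


@dataclass(frozen=True)
class RunBase:
    branch: str     # which branch the run targets
    base_head: str  # commit the branch pointed at when the run started


def resolve_run_base(task: Task, current_checkout: str, heads: dict[str, str]) -> RunBase:
    """Prefer the task's stored base_branch; fall back to the current checkout.

    heads maps branch names to commit ids, standing in for a Git lookup.
    """
    branch = task.base_branch or current_checkout
    return RunBase(branch=branch, base_head=heads[branch])
```

With this shape, switching the local checkout to a feature branch does not change where a task with `base_branch: main` will land.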
Once tasks, handoffs, worktrees, and approvals existed, server-managed subagents became much less mysterious.
The question was no longer "Can I spawn several agents?" That part is easy.
The real question was:
The orchestration layer handles a few modes:
This made the system feel less like "many agents in a trench coat" and more like a workflow engine with agents inside it.
The manager does not need to be magical. It needs to be accountable. It should produce a plan, create or select tasks, launch work, inspect outcomes, and leave a reviewable handoff.
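An accountable manager can be sketched as a plain function with its planner and launcher injected. This is a simplified, sequential illustration under stated assumptions: `Outcome`, `manage`, and the returned dictionary keys are hypothetical, and the actual orchestration layer may run work in parallel or track more state.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    task_id: str
    passed: bool
    summary: str


def manage(
    goal: str,
    plan: Callable[[str], list[str]],
    launch: Callable[[str], Outcome],
) -> dict:
    """Plan tasks for a goal, run each one, and return a reviewable handoff."""
    task_ids = plan(goal)                          # produce a plan / select tasks
    outcomes = [launch(tid) for tid in task_ids]   # launch work, collect outcomes
    failed = [o for o in outcomes if not o.passed]
    if failed:
        blocker = f"{failed[0].task_id}: {failed[0].summary}"
        next_step = "Resolve the blocker before approving anything."
    else:
        blocker = "none"
        next_step = "Review saved diffs and approve merges."
    return {
        "what": f"Ran {len(outcomes)} task(s) for: {goal}",
        "changed": [f"{o.task_id}: {o.summary}" for o in outcomes],
        "blocker": blocker,
        "next": next_step,
    }
```

The point of the shape is that the manager's output is itself a handoff, so its work is reviewed the same way as any single run.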
Most of the product work was not glamorous. It was making state visible.
A task page needed to answer questions quickly:
The UI did not need to look like a spaceship. It needed to keep repeated engineering actions calm and inspectable.
That shaped the controls:
The best agent UI is not the one with the flashiest animation. It is the one that tells you exactly what is happening when an autonomous system touches your code.
| Decision | Why it mattered |
|---|---|
| Markdown tasks as the source of truth | Easy to inspect, diff, repair, and commit. |
| SQLite as a generated search index | Fast discovery without making a database the canonical memory store. |
| Context packets before agent runs | The agent starts with relevant history instead of vague vibes. |
| Structured handoffs | Every run leaves concise evidence and a next step. |
| Git worktrees | Agent edits are isolated from human work. |
| Saved multi-repo diffs | Review survives after a run exits. |
| Explicit approval merges | Humans keep control of integration. |
| Durable base branch metadata | Task intent does not drift with the current checkout. |
| Server-managed orchestration | Subagents become coordinated work, not scattered sessions. |
The theme is boring infrastructure around exciting tools.
That is the part I like.
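The first two rows of the table, Markdown as the source of truth and SQLite as a generated index, can be illustrated with a small rebuild routine. This is a sketch under assumptions: it supposes task files carry simple `key: value` front matter between `---` lines, and the table layout and function names are invented for the example.

```python
import sqlite3


def parse_task(markdown: str) -> dict[str, str]:
    """Read simple `key: value` front matter between two `---` lines."""
    lines = markdown.splitlines()
    fields: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return fields
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def rebuild_index(conn: sqlite3.Connection, task_files: dict[str, str]) -> None:
    """Drop and regenerate the index; the Markdown files stay canonical."""
    conn.execute("DROP TABLE IF EXISTS tasks")
    conn.execute(
        "CREATE TABLE tasks (path TEXT, id TEXT, title TEXT, status TEXT, base_branch TEXT)"
    )
    for path, text in task_files.items():
        f = parse_task(text)
        conn.execute(
            "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
            (path, f.get("id"), f.get("title"), f.get("status"), f.get("base_branch")),
        )
    conn.commit()
```

Because the index is always rebuilt from the files, deleting the database loses nothing: the next rebuild restores it from the Markdown.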
The biggest lesson was that agent quality is not just model quality. It is workflow quality.
A strong coding agent in a weak workflow still creates uncertainty. A decent coding agent in a strong workflow can be surprisingly useful because the system catches, contains, and explains its work.
Some principles I would reuse anywhere:
If the agent can change code, its state deserves first-class modeling. Task status, run IDs, branch targets, diffs, validation, blockers, and approvals should not live only in prose.
An interrupted run should still be useful. A blocked approval should explain whether the issue is stale metadata, a dirty target repo, or a missing worktree snapshot.
Summaries are nice. Handoffs are operational. They say what changed, where the evidence is, and what happens next.
The merge path should be conservative. If the destination moved, stop and ask for review. Fast automation is not worth hidden state corruption.
Worktrees let agents work independently. Saved diffs, tests, and approval checks make that independence reviewable.
The fun part is that once the state model clicked, the whole experience changed.
Instead of treating each coding-agent session like a little adventure, I could treat it like a unit of engineering work. Start from a task. Load the right context. Run in isolation. Capture the result. Review the diff. Approve the merge. Leave the next person a handoff.
It made the agent feel less like a chat box and more like a teammate plugged into a real workflow.
Not because the model became perfect. It did not.
Because the system around it became legible.
That is the punchline: the future of coding agents is not only better generation. It is better coordination.
And sometimes better coordination starts with a Markdown task, a Git worktree, and a very stubborn merge button.