I build and govern fleets of AI agents that ship real systems.
A real governed agent run. An agent team scaffolds a CLI: every phase implemented, independently reviewed, then gated before commit.
Final gate: all five falsification criteria pass.
How I run agents
The orchestration model
Every run follows the same structure. A boss holds the mission and never writes. A manager owns the phase sequence and the state file. Workers are disposable; they receive a brief and report back. Adversarial reviewers are always separate from the writers they review. Human gates are hard stops, not suggestions.
The map shows the structure used by the missions documented on this site.
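Two of those rules, reviewers separate from writers and human gates as hard stops, can be expressed as a small check. The following is only a minimal sketch in Python, not the tooling used on these runs; `Phase`, `GateError`, and `clear_gate` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One phase of a run (hypothetical structure, for illustration only)."""
    name: str
    writer: str                   # worker that implemented the phase
    reviewer: str                 # adversarial reviewer for the phase
    human_approved: bool = False  # set only by an explicit human decision


class GateError(Exception):
    """Raised when a phase tries to pass a gate it has not cleared."""


def clear_gate(phase: Phase) -> str:
    # Whoever builds it does not review it.
    if phase.reviewer == phase.writer:
        raise GateError(f"{phase.name}: reviewer must be separate from writer")
    # A human gate is a hard stop: no approval, no commit.
    if not phase.human_approved:
        raise GateError(f"{phase.name}: waiting on human gate")
    return f"{phase.name}: cleared for commit"
```

Raising an exception rather than returning a warning is the design point: a gate that can be ignored is a suggestion.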
A run, documented
How a build actually goes
- The brief
Gate cleared. Worker briefed.
The wireframe gate passed. I issued the S2 brief to a disposable Sonnet worker: Astro skeleton, token system from the design plan, content collections wired, every page rendering real copy. The brief specified the Orchestra case page first, because that was the hardest screen. The worker had no prior context. It had the brief, the design plan, and the untouchable files.
- Delegation
Worker builds. I stay out of the way.
The worker built the full skeleton: layout, tokens, fonts, four routes, view transitions, the claims audit. It flagged two items it was uncertain about and kept going. That is correct behaviour. I read the output before touching anything.
- A gate fires
Two defects. Both caught before publish.
Manager verification found what the worker had partially flagged. Artifact links pointed at a GitHub remote that does not exist. A token and cost figure appeared in case copy, sourced from a roadmap file rather than a Ledger export. Both fail the falsification clause. Neither reached the published page.
- Recovery
Amendments committed. Rescanned.
Dead links removed. The figure withheld pending a real Ledger export. Em dashes found in source comments stripped. The amendments were committed, the build rerun, and every scan repeated from zero: dash scan, banned-word scan, locked file byte check. All clean.
- Review
Mitch confirmed both calls.
Both manager rulings went to the copy gate. Artifact links stay out until Orchestra has a real public remote. The figure reaches the page only through a real Ledger report embed. Both confirmed. The run closed the same day it opened.
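The three scans named in the recovery step (dash scan, banned-word scan, locked file byte check) are simple enough to sketch. This is an illustrative Python version under stated assumptions, not the scripts the run used: the banned-word list is whatever the caller supplies, and the byte check is assumed here to be a SHA-256 comparison against a digest recorded earlier.

```python
import hashlib
import re

EM_DASH = "\u2014"  # written as an escape so this file passes its own scan


def dash_scan(text: str) -> list[int]:
    """Return the 1-based numbers of lines that contain an em dash."""
    return [n for n, line in enumerate(text.splitlines(), 1) if EM_DASH in line]


def banned_word_scan(text: str, banned: set[str]) -> set[str]:
    """Return the banned words that appear in the text, case-insensitively."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words & {b.lower() for b in banned}


def locked_file_unchanged(data: bytes, expected_sha256: str) -> bool:
    """True if the file bytes still hash to the recorded digest (assumed SHA-256)."""
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

Repeating every scan from zero after an amendment means calling all three again on the rebuilt output, not only the one that failed.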
The numbers
What the runs show
This site is one of them. The numbers below come from a ledger my own agents built, and not one of them was typed by hand.
Exported from real runs · 15 May 2026 to 13 Jun 2026
Selected work
What got built
ORCHESTRA
A terminal HUD that watches the Claude Code CLI from out-of-band taps and puts every permission decision in the operator's hands.
Open the case
SLM4SMB
A small-language-model appliance that reads service-business enquiry emails and books them on local hardware, with no cloud and no per-call fee.
Open the case
DIAGNOSTIC BUDDY
A mobile-first AI diagnostic assistant for automotive training that turns messy technician input into a structured diagnostic pathway.
Open the case
ORCHESTRATE
A CLI that scaffolds phase-gated, worktree-isolated, eval-first agent missions, built by an agent team running the discipline it encodes.
Open the case
More systems
Active
Second Brain
A Claude-native business vault with custom skills: working memory for research, workflow diagnosis, and reusable systems.
Shipped
Forgiveness Letter
A small, finished web app built in one unattended agent run. A personal one.
Open the app
Active
Pedal Builder
Design a guitar pedal in the browser and watch the circuit respond as you build. A collaboration build.
Shipped
TOMSSPYHQ
An offline arcade of classic and 3D browser games, built for a mate.
Active
DAY BUILD
A Claude Code skill library: seventeen product-building skills packaged as an installable marketplace.
Operating principles
How I work
No number exists unless a tool produced it. Every figure ships from a real export or it does not ship.
Every agent runs behind a human gate. Permission decisions stay explicit, logged, and mine.
Define failure before you build. Each system carries a written test for what would prove it broken.
Whoever builds it does not review it. Review is always a separate seat.
Incidents are evidence, not embarrassments. Every failure becomes a blameless postmortem and a new control.
State lives in files, not memory. Any run can crash and resume cold from what was written down.
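The last principle, state in files and cold resume, can be sketched as follows. It is a hedged Python illustration only: the JSON layout, the `completed` key, and the function names are assumptions, not the state file format the runs use.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    # Write to a temp file, then atomically replace, so a crash mid-write
    # never leaves a half-written state file behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_state(path: Path) -> dict:
    # A missing file means a fresh run; nothing is assumed from memory.
    if not path.exists():
        return {"completed": []}
    return json.loads(path.read_text(encoding="utf-8"))


def run(path: Path, phases: list[str]) -> list[str]:
    """Run the phases not yet recorded as complete; return the ones run now."""
    state = load_state(path)
    ran = []
    for phase in phases:
        if phase in state["completed"]:
            continue  # already done in an earlier run or before a crash
        ran.append(phase)  # placeholder for the real phase work
        state["completed"].append(phase)
        save_state(path, state)  # persist after every phase
    return ran
```

Because the state is saved after each phase, a second call with the same file skips what was already recorded and picks up at the first unfinished phase.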
Get in touch