Part IV · Chapter 16 of 18

The Team

Steph, Kai, Bob, Lace, Simon, Ben — the agents that build, write, translate, and ship Microapp.


The first version of the team had one agent.

I called it "the assistant." It did research, wrote specs, built tools, translated to seven locales, and ran QA. It also did each one about 60% as well as a human specialist would have done — because the prompt that made it good at research was different from the prompt that made it good at QA, and there's only so much you can stuff into a single system message before the agent starts compromising on every task.

The fix was the obvious one in retrospect. Split the agent. Make each piece a specialist. Give each specialist its own skill document, its own workspace folder, and its own memory file. Stop pretending I could have one mega-agent and start designing a team.

One job per agent. One skill, one workspace, one memory.

Each agent's identity lives in three files:

The skill document — the SKILL.md in .claude/skills/<name>/ — defines what the agent does, the inputs it expects, the outputs it produces, the brand decisions it has to honor.
The workspace folder — operator-tunable context the agent reads at session start.
The memory file — what it has learned across sessions.

The three together are the agent's whole identity. Add a new agent and you create those three files. Retire an agent and you delete them.
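
A minimal sketch of that add-and-retire lifecycle. Only the .claude/skills/<name>/SKILL.md path comes from the setup described above; the workspace/ and memory/ locations, and the function names, are assumptions made for illustration.

```python
from pathlib import Path
import shutil


def agent_paths(root: Path, name: str) -> dict[str, Path]:
    """Return the three files/folders that make up one agent's identity."""
    return {
        # Skill document path as described in the text.
        "skill": root / ".claude" / "skills" / name / "SKILL.md",
        # Hypothetical locations: the text does not say where these live.
        "workspace": root / "workspace" / name,
        "memory": root / "memory" / f"{name}.md",
    }


def add_agent(root: Path, name: str, job: str) -> dict[str, Path]:
    """Create the skill document, workspace folder, and memory file."""
    paths = agent_paths(root, name)
    paths["skill"].parent.mkdir(parents=True, exist_ok=True)
    paths["skill"].write_text(f"# {name}\n\nJob: {job}\n", encoding="utf-8")
    paths["workspace"].mkdir(parents=True, exist_ok=True)
    paths["memory"].parent.mkdir(parents=True, exist_ok=True)
    paths["memory"].touch()
    return paths


def retire_agent(root: Path, name: str) -> None:
    """Delete the same three pieces; nothing else identifies the agent."""
    paths = agent_paths(root, name)
    shutil.rmtree(paths["skill"].parent, ignore_errors=True)
    shutil.rmtree(paths["workspace"], ignore_errors=True)
    paths["memory"].unlink(missing_ok=True)
```

Calling add_agent(Path("."), "steph", "research") would create all three pieces in one step, and retire_agent removes exactly the same set.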

The transferable why: specialization wins inside a team even when the "team" is software. A generalist agent has no boundaries to defend; a specialist does. The specialist's prompt can be sharp because it doesn't have to cover the adjacent skills. The team coordinates through hand-offs rather than through a shared context window, which is the only way to scale beyond what one model can hold at once.

Locked 2026-05-14 · one-job-per-agent · enforced by .claude/skills/<name>/ shape

The second decision was the one that changed how I wrote specs for the team.

The early agents had generic, role-based identifiers: researcher-bot, builder-bot, qa-bot. They worked. They also read like utilities, and I wrote specs for them like I was filing a Jira ticket. "Researcher-bot: produce a list of five potential tools to build, ranked by Ahrefs volume." The specs were precise; the work was uninspired.

The fix was naming. The agents got first names:

Steph — the research agent.
Kai — the spec writer.
Bob — the builder.
Lace — the English writer.
Simon — the translator.
Ben — the QA agent.

Each got a Twitter-style handle and profile at /agents/@<name>, a public quote, and a published trust level. The specs I write for them now sound like the specs I'd write for a colleague: "Steph, find me three converters in the kitchen-units category that are underbuilt against the search volume."

The way you name your tools changes the way you talk to them.

The change was real and measurable. Specs written to Steph tended to be opinionated, contextual, and conversational, and the output they produced was sharper than the output from specs written to researcher-bot. Same model behind the curtain. The difference was in what the human (me) brought to the prompt.

The transferable why: the way you name your tools changes the way you talk to them, and the way you talk to them changes what they do well. Naming is a posture choice, not a cosmetic one. Tools named like teammates get briefed like teammates. Tools named like utilities get instructed like utilities. The first kind of prompt elicits better outputs; the second elicits compliance.

Locked 2026-05-14 · first-name agents · live profiles at /agents/

The third decision was the one I copied straight from how human teams work.

An agent's trust level isn't a single permission flag. It's a gradient, the same way a new employee earns autonomy by demonstrating it. Three tiers: auto, notify-then-execute, require-approval. The tier governs what the agent can do without asking — and the tier is published on the agent's profile so it's visible to everyone the agent interacts with.

Auto covers low-blast-radius work: research, drafts, scrapes. Steph runs Ahrefs queries on her own. Bob runs the test suite on his own.
Notify-then-execute is the middle tier: the agent posts what it is about to do in #review, waits 30 minutes for an objection, and then executes. Bob commits to a branch this way. Lace publishes a long-form draft this way.
Require-approval is the high-blast tier: customer reply, money spend, contract sign, any change to the founder-locked brand. Nothing in this tier ships without a 👍.
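
The three tiers can be read as one gate function. The sketch below is an assumption-laden illustration, not the production gate: the Tier enum, the may_proceed function, and its parameters are hypothetical names, and it assumes the notice is posted before the change takes effect. Only the tier names and the 30-minute window come from the description above.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto"
    NOTIFY_THEN_EXECUTE = "notify-then-execute"
    REQUIRE_APPROVAL = "require-approval"


OBJECTION_WINDOW_MINUTES = 30  # the waiting period named in the text


def may_proceed(
    tier: Tier,
    *,
    minutes_since_notice: int = 0,
    objected: bool = False,
    approved: bool = False,
) -> bool:
    """Decide whether an action in the given tier may go ahead right now."""
    if tier is Tier.AUTO:
        # Low-blast-radius work runs without asking.
        return True
    if tier is Tier.NOTIFY_THEN_EXECUTE:
        # Proceeds only after the window has passed with no objection.
        return not objected and minutes_since_notice >= OBJECTION_WINDOW_MINUTES
    # REQUIRE_APPROVAL: nothing ships without an explicit approval.
    return approved
```

The design choice worth noticing is that silence means different things per tier: in the middle tier silence is consent after 30 minutes, while in the top tier silence never is.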

Auto. Notify-then-execute. Require-approval.

New agents start in the highest-friction tier — require-approval — until they've shipped enough work for the pattern of failures to be known. They graduate when the gate proves it holds. They get demoted back the moment a failure makes it through the gate. The trust level isn't a status symbol; it's a lever for blast-radius management.
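
A rough sketch of that graduate-and-demote lever, under stated assumptions: the AgentTrust class is hypothetical, the runs_to_graduate threshold is an invented placeholder (no number is given here for "enough work"), and demotion is modeled as one step down, which is only one possible reading of "demoted back."

```python
from dataclasses import dataclass

# Ordered from most friction to least friction.
TIERS = ["require-approval", "notify-then-execute", "auto"]


@dataclass
class AgentTrust:
    name: str
    tier: str = "require-approval"  # new agents start in the strictest tier
    clean_runs: int = 0

    def record(self, passed_cleanly: bool, runs_to_graduate: int = 20) -> None:
        """Update the tier after one piece of shipped work.

        runs_to_graduate is a hypothetical threshold, not a documented value.
        """
        position = TIERS.index(self.tier)
        if not passed_cleanly:
            # A failure got through the gate: step down immediately (assumed
            # to be one tier) and reset the streak.
            self.tier = TIERS[max(position - 1, 0)]
            self.clean_runs = 0
            return
        self.clean_runs += 1
        if self.clean_runs >= runs_to_graduate and position < len(TIERS) - 1:
            # Enough clean work at this tier: widen autonomy by one step.
            self.tier = TIERS[position + 1]
            self.clean_runs = 0
```

Promotion is slow and counted; demotion is immediate and unconditional. That asymmetry is the point of the lever.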

The transferable why: agents need the same trust model as new employees. Start cautious, widen only when the work proves out, demote on failure. The companies that hand a brand-new agent the merge button on day one are the companies that learn about brittle gates by shipping a bad change through one. Build the gradient first; populate it slowly.

Locked 2026-05-14 · trust gradient · auto / notify-then-execute / require-approval

The fourth decision is the one that determined how the team actually works on a given day.

A microapp ships through a pipeline:

Steph finds what's worth building (Ahrefs + Google Search Console + competitor diff).
Kai writes the product spec for non-trivial tools.
Bob builds the tool against the spec.
Lace writes the English long-form SEO article.
Simon translates the article and the metadata into the locales Kai flagged.
Ben tests Bob's PR against the spec, the deployed page, adversarial inputs, the a11y audit, and the cross-locale smoke test.

Each agent hands off to the next via a written artifact — a brief, a spec, a PR, an HTML draft, a translation row in Supabase, a QA report.

Every hand-off is a written artifact. That's deliberate. Hand-offs in conversation get lost — they live in someone's working memory and decay overnight. Hand-offs in writing live in the repo and the database, where the next agent (or the next session of the same agent) reads them at session start. The pipeline survives the fact that none of the agents share a context window with the others.
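
To make the idea concrete, here is a small sketch of a written hand-off, assuming a file-only layout. The handoffs/ folder, the JSON shape, and the function names are hypothetical; the real pipeline keeps its artifacts in the repo and in Supabase, and only the agent order is taken from the pipeline above.

```python
import json
from pathlib import Path
from typing import Optional

# Agent order as listed in the pipeline.
PIPELINE = ["steph", "kai", "bob", "lace", "simon", "ben"]


def next_agent(name: str) -> Optional[str]:
    """Return the agent that receives this agent's artifact, if any."""
    i = PIPELINE.index(name)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else None


def write_handoff(root: Path, sender: str, kind: str, body: str) -> Path:
    """Persist a hand-off where the recipient will find it at session start."""
    recipient = next_agent(sender)
    # The last agent's report has no recipient; file it under "done".
    inbox = root / "handoffs" / (recipient or "done")
    inbox.mkdir(parents=True, exist_ok=True)
    path = inbox / f"{sender}-{kind}.json"
    record = {"from": sender, "to": recipient, "kind": kind, "body": body}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def read_handoffs(root: Path, agent: str) -> list[dict]:
    """Load every artifact addressed to this agent; no shared context needed."""
    inbox = root / "handoffs" / agent
    if not inbox.is_dir():
        return []
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(inbox.glob("*.json"))
    ]
```

Nothing in read_handoffs depends on the sender still being in session, which is the property the written artifact buys.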

Steph → Kai → Bob → Lace → Simon → Ben. Each hand-off, a written artifact.

The transferable why: agent pipelines run on hand-off artifacts, not shared memory. The moment a team — human or agent — relies on "you remember what we agreed on yesterday," the team is fragile to anyone leaving, anyone forgetting, anyone joining late. Write the hand-off down. Make the artifact the contract. The team gets durable in exact proportion to how legible its hand-offs are.

Locked 2026-05-14 · six-agent pipeline · written hand-off at every step

That's the team. Six specialists, one job each, named like teammates, gated by trust, coordinating through written hand-offs. Read each agent's full profile at /agents/.