Part III · Chapter 6 of 7

The Agent OS

How a one-human-orchestrator company runs on agents. Five layers, four memory types, four-phase build.

01 — Overview

One human. Many agents. One operating system.

The target state is a company where one human (the orchestrator) sets direction and reviews, and everything else (engineering, marketing, customer support, ops, planning) is handled by agents. This document describes the architecture of the operating system that makes that possible.

Five layers: command surface (where you give orders), coordination plane (tasks + events + decisions + memory), agents (the workforce), action surface (the external world they touch), and a write loop that flows actions back into memory so the org gets sharper every week.

02 — The North Star

Five layers, one feedback loop.

The architecture's defining property isn't any single component — it's the read-act-write loop. Agents read from the coordination plane, act on the world, and the actions become events that flow back into memory. Without that loop, agents stay frozen at whatever you seeded them with. With it, month two is meaningfully smarter than month one.

              YOU (orchestrator)
                    ↕
            [ COMMAND SURFACE ]    Slack mobile + Claude Code
                    ↕
            [ COORDINATION PLANE ] Tasks + Events + Decisions + Memory
                    ↕
                AGENTS (many)
                    ↕
            [ ACTION SURFACE ]     GitHub + Slack + Email + Calendar
                    ↕
          (events flow back into coordination plane → memory grows)

The point of the loop

Storage isn't the bottleneck — discipline is. Every agent action becomes an event. Without that, you have a filesystem of noise. With it, you compound.

03 — Four memory types

Each kind of memory has a different home.

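This four-way split (working, episodic, semantic, procedural) comes from cognitive science and is now a common way to describe agent memory, for example in the CoALA framework; MemGPT (UC Berkeley) popularized the related idea of tiered memory for LLM agents. Treating "memory" as one thing is the failure mode that makes most agent systems get worse over time, not better.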

Working — current task RAM. Vanishes after the task. Lives in process memory and conversation context.
Episodic — what happened. Append-only. Lives in the events table, git history, Slack logs, and (today) data/overnight-logs/*.jsonl.
Semantic — what's true. Curated, slow-changing. Lives in BRAND.md, AGENTS.md, and the future decisions/ directory.
Procedural — how to do things. Lives in .claude/skills/ as agent definitions and templates that agents execute directly.

Hygiene > storage

The hard part isn't where to put memory — it's keeping it sharp. Quarterly pruning. Always include the why. Ask "is this surprising or non-obvious?" before saving anything.
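The hygiene rules above can be sketched as a check that runs before anything is saved. This is a minimal illustration, not the system's actual code: `MemoryEntry`, `rejection_reason`, and the choice to apply the "surprising" test only to curated (semantic and procedural) memory are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    WORKING = "working"        # current task only, vanishes afterwards
    EPISODIC = "episodic"      # what happened (append-only)
    SEMANTIC = "semantic"      # what's true (curated)
    PROCEDURAL = "procedural"  # how to do things


# Assumption: only curated memory has to pass the "surprising" test.
CURATED = {MemoryType.SEMANTIC, MemoryType.PROCEDURAL}


@dataclass
class MemoryEntry:
    kind: MemoryType
    text: str
    why: str           # the reason this is worth remembering
    surprising: bool   # answer to "is this surprising or non-obvious?"


def rejection_reason(entry: MemoryEntry) -> Optional[str]:
    """Return why an entry should not be saved, or None if it passes."""
    if entry.kind is MemoryType.WORKING:
        return "working memory is not persisted"
    if not entry.why.strip():
        return "missing the why"
    if entry.kind in CURATED and not entry.surprising:
        return "not surprising or non-obvious"
    return None
```

An entry with an empty `why` is rejected whatever its type, which keeps the "always include the why" rule mechanical rather than aspirational.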

04 — The read path

How agents pull what they need to know.

Conversations, code, Slack messages, emails, run logs all feed an ingestion layer. The ingestion layer fans out into a vector DB (semantic search), a knowledge graph (relationships), and structured tables (queryable facts). A retrieval layer — exposed as MCP / SDK / REST — sits on top. Agents call it before they act.

Today the read path is shallow: agents read state.json, BRAND.md, and their own SKILL.md. Phase 1 widens it to the tasks/events/decisions tables in Supabase. Phase 2 layers Mem0 on top for semantic retrieval.
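As a rough sketch of today's shallow read path, the function below gathers the three files an agent reads into one context dictionary. The location of state.json at the repo root and the function name `load_context` are assumptions; the skill path follows the .claude/skills/<name>/SKILL.md convention used elsewhere in this document.

```python
from pathlib import Path


def load_context(repo_root: Path, agent: str) -> dict[str, str]:
    """Read the files an agent sees today; missing files are skipped."""
    candidates = {
        "state": repo_root / "state.json",  # assumed location
        "brand": repo_root / "BRAND.md",
        "skill": repo_root / ".claude" / "skills" / agent / "SKILL.md",
    }
    context = {}
    for name, path in candidates.items():
        if path.is_file():
            context[name] = path.read_text(encoding="utf-8")
    return context
```

Widening the read path in Phase 1 would then mean adding more sources behind the same call, so agents keep a single entry point.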

05 — The write path

How learning compounds.

Every meaningful agent action produces an event: PR opened, decision made, customer replied, build failed. Events enter the same ingestion layer that feeds the read path, so they become the new memory the next agent reads. The system gets sharper without a single line of code change.
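A minimal sketch of the write path, assuming events are appended as JSON lines in the style of the existing data/overnight-logs/*.jsonl files. The field names (`ts`, `actor`, `kind`, `payload`) and both function names are illustrative, not the actual schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def emit_event(log_path: Path, actor: str, kind: str, payload: dict) -> dict:
    """Append one event as a JSON line and return the record written."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "kind": kind,        # e.g. "pr_opened", "build_failed"
        "payload": payload,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


def read_events(log_path: Path) -> list[dict]:
    """Read events back in the order they were written."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because the file is only ever opened in append mode, the log stays append-only by construction; the next agent's read is just `read_events` on the same path.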

The flywheel

Read path keeps agents informed. Write path makes them learn. Command surface is how YOU stay in the loop without becoming the bottleneck.

06 — Command surface

Slack mobile is the cockpit.

Three properties matter: persistent (survives across sessions), phone-accessible (orders from anywhere), async (you write, agents act, you check back). Slack hits all three. Free tier. Phone/web/desktop. Built-in conversation history. Each agent gets a Slack bot user.

Claude Code stays as the interactive interface for code-heavy work — what you're reading right now is part of that loop. Slack becomes the always-on operations layer for everything else.

Channels mirror functions, not departments:

#planning is where you give orders.
#engineering is where Bob and friends report.
#ops is where the broadcaster posts decisions and the digest agent drops weekly summaries.
#review is the 30-minute approval window for medium-trust actions.
#firehose is muted by default; everything goes there for debugging.
#humans-only is your private notes; no agent reads it.

07 — Workflows

Sequences of agents, stored as markdown.

A workflow is a chain of agent invocations with state, conditions, and rollbacks. The overnight-builder is one (Steph → Kai → Bob → Lace → Simon → Ben → merge or fix-loop). Future workflows: customer onboarding, monthly strategic review, marketing launch, incident response, refund processing.

They're stored the same way agents are — as markdown files with YAML frontmatter — under .claude/workflows/<name>/SKILL.md. The frontmatter declares the step graph; the body explains intent. A small runner reads the frontmatter and dispatches via the existing agent CLI.

---
name: overnight-builder
trigger: [cron, slack-mention, manual]
steps:
  - { agent: steph, on_first_run_only: true }
  - { agent: kai, when: "needsSpec == true" }
  - { agent: bob }
  - { agent: lace }
  - { agent: simon }
  - { wait_for: cf-preview }
  - { agent: ben }
  - { action: merge, when: "ben_verdict == PASS" }
  - { action: fix_loop, max_attempts: 2, when: "ben_verdict == FAIL" }
state_table: tasks
---
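To make the "small runner" concrete, here is one way the step graph above could be walked once the frontmatter has been parsed. It is a sketch under assumptions: the steps are written out as Python dictionaries rather than parsed from YAML, the `runs` state key and the `dispatch` callback are hypothetical, `when` only supports `key == value`, and the `max_attempts` retry logic of fix_loop is not implemented.

```python
from typing import Callable

# The overnight-builder steps from the frontmatter, as plain dictionaries.
STEPS = [
    {"agent": "steph", "on_first_run_only": True},
    {"agent": "kai", "when": "needsSpec == true"},
    {"agent": "bob"},
    {"agent": "lace"},
    {"agent": "simon"},
    {"wait_for": "cf-preview"},
    {"agent": "ben"},
    {"action": "merge", "when": "ben_verdict == PASS"},
    {"action": "fix_loop", "max_attempts": 2, "when": "ben_verdict == FAIL"},
]


def parse_literal(text: str):
    """Turn 'true'/'false' into booleans; leave anything else as a string."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return text.strip()


def condition_holds(expr: str, state: dict) -> bool:
    """Evaluate a 'key == value' expression against workflow state."""
    key, _, expected = expr.partition("==")
    return state.get(key.strip()) == parse_literal(expected)


def run_workflow(steps: list[dict], state: dict,
                 dispatch: Callable[[str, str, dict], None]) -> list[str]:
    """Walk the steps in order and return the names that actually ran."""
    ran = []
    for step in steps:
        if "when" in step and not condition_holds(step["when"], state):
            continue
        if step.get("on_first_run_only") and state.get("runs", 0) > 0:
            continue
        # Each step carries exactly one of these keys.
        for kind in ("agent", "action", "wait_for"):
            if kind in step:
                dispatch(kind, step[kind], state)
                ran.append(step[kind])
    return ran
```

Conditions are evaluated when each step is reached, so a verdict written into `state` by Ben's step decides whether merge or fix_loop runs next.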

Don't bring in Temporal

At solo scale, the markdown-plus-frontmatter pattern composes with everything else (skills, decisions, state). Temporal / Airflow / n8n solve a problem you don't have yet. Add complexity only when the simple pattern actually breaks.

Triggers fan in from many sources — a Slack message (@router run weekly-digest), a cron firing, a GitHub PR webhook, a customer email arriving. The workflow runner is agnostic to where the trigger came from; it just needs the workflow name and an initial context payload.
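The trigger fan-in described above amounts to normalizing every source into the same small record. The sketch below assumes a hypothetical `Trigger` type and two adapter functions. It also simplifies Slack: real Slack events encode mentions as user IDs rather than the literal text "@router", so the pattern is illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches the literal command form used in this document's examples.
RUN_PATTERN = re.compile(r"@router\s+run\s+([a-z0-9-]+)")


@dataclass
class Trigger:
    workflow: str                                 # workflow name
    context: dict = field(default_factory=dict)   # initial context payload


def trigger_from_slack(text: str, user: str) -> Optional[Trigger]:
    """Turn '@router run <workflow>' into a Trigger, or None if no match."""
    match = RUN_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return Trigger(match.group(1), {"source": "slack", "user": user})


def trigger_from_cron(workflow: str, schedule: str) -> Trigger:
    """Wrap a scheduled firing in the same shape as a Slack trigger."""
    return Trigger(workflow, {"source": "cron", "schedule": schedule})
```

The runner only ever sees `Trigger` objects, so adding a GitHub webhook or inbound-email source means writing one more adapter, not touching the runner.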

08 — Coordination plane

The single source of truth.

Three Supabase tables form the substrate. tasks tracks anything an agent (or human) is working on, with statuses, owners, and parent links for sub-tasks. events is the append-only audit log — every significant action lands here, including the orchestrator's own state transitions. decisions mirrors a decisions/ directory in the repo so ADRs are searchable from the same plane.
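A rough sketch of the three tables, using Python's built-in sqlite3 as a stand-in for Supabase (which is Postgres, so types and trigger syntax would differ there). Every column name is an assumption; the only properties taken from the text are parent links on tasks, an append-only events table, and decisions keyed to match the files in decisions/.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    owner TEXT,
    parent_id INTEGER REFERENCES tasks(id)  -- link to the parent task
);
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    task_id INTEGER REFERENCES tasks(id),
    actor TEXT NOT NULL,
    kind TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE decisions (
    id TEXT PRIMARY KEY,       -- matches the file id in decisions/
    applies_to TEXT NOT NULL,  -- comma-separated agent names
    body TEXT NOT NULL
);
-- Keep events append-only: reject updates and deletes.
CREATE TRIGGER events_no_update BEFORE UPDATE ON events
BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
CREATE TRIGGER events_no_delete BEFORE DELETE ON events
BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
"""


def open_plane() -> sqlite3.Connection:
    """Create an in-memory coordination plane for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def subtasks(conn: sqlite3.Connection, parent_id: int) -> list[str]:
    """List the titles of a task's direct sub-tasks."""
    rows = conn.execute(
        "SELECT title FROM tasks WHERE parent_id = ? ORDER BY id", (parent_id,)
    )
    return [row[0] for row in rows]
```

Enforcing append-only at the database level, rather than by convention, means a misbehaving agent cannot quietly rewrite the audit log.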

Decisions follow the decision-as-event pattern: each decision is a markdown file with frontmatter declaring who it applies to and where to broadcast it. Committing the file fires a webhook; the broadcaster fans out to Slack, the weekly digest, email, and any agent skill files affected.

---
id: 2026-05-08-torque-converters-common-values
applies-to: [bob, lace]
broadcast: [slack-engineering, weekly-digest, email-daniel]
---
From now on, all torque converters require a "common values" reference table.

Memory is repo-first today: BRAND.md, AGENTS.md, .claude/skills/, data/specs/. Phase 2 layers Mem0 on top via MCP — agents query for "what do we know about X" and get semantically-relevant prior decisions back.

09 — The agents

Seven today. More coming.

Each agent is a small, focused skill — a markdown file in .claude/skills/<name>/SKILL.md describing its job, its refusal rules, and its archetype-template references. Agents compose: the overnight-builder orchestrator drives the engineering pipeline by calling Steph → Kai → Bob → Lace → Simon → Ben in sequence.

Steph — research scout. Pulls Ahrefs + GSC + competitor diff to find high-volume / low-difficulty tool ideas.
Kai — spec writer. For non-trivial tools, studies the SERP and writes a spec to data/specs/<slug>.md.
Bob — tool builder. Generates the React widget, seeds Supabase, runs the gate, opens the PR.
Lace — SEO writer. 1500–2500 word long-form article published to tool_metadata.seo_article.
Simon — translator. Translates metadata + article into the locales Kai's spec marked.
Ben — QA. Tests the live preview against the spec; posts PASS/FAIL on the PR.
Overnight-Builder — the orchestrator that drives the whole sequence.

Phase 3 adds external-facing agents: Marketing (drafts launch posts), Customer Support (Gmail + Slack DMs), Calendar (scheduling), Finance (Stripe + reconciliation). Each gets its own GitHub service account, Slack bot user, and email signature so external interactions are clearly attributed.

10 — Agents as characters

Personality first. Function second.

Functional bots get used. Lovable characters get followed. Some companies already lean on this: Replika built a business on agent-as-companion; Character.AI runs on millions of user-created personalities; Klarna's customer-service AI has a name, a face, and a consistent voice; Microsoft's Clippy was an early attempt. The cultural priors exist (Wall-E, Baymax, R2-D2, Severance's work-personas). Audiences love characters. Most agent products haven't figured this out, and that's the edge.

In this architecture, every agent has both a skill (what it does) and a character (who it is). Personality is a separate file, loaded BEFORE the skill on every invocation. Function flows through, flavored.

What every character bible holds

Trait	How agents get it
`Voice`	Distinct vocabulary, rhythm, banned words, catchphrases. Sample outputs in voice-samples/ to anchor consistency.
`Motivation`	One sentence: what makes this agent get out of bed? "Bob lives for the moment a tool ships."
`Flaws + growth`	Tracked in STORY.md, appended over time. Bob v1 was reckless; v2 is meticulous post-incident; v3 is opinionated.
`Catchphrases`	Things they always say. Bob ends every PR with a baking metaphor. Steph opens with KD numbers.
`Visual identity`	Custom avatar (illustration in brand style), distinct color, emoji vocabulary, signature font.
`Relationships`	Cross-agent dialogue. Ben roasts Bob; Bob respects Ben; Steph teases Lace's word counts. Banter wins audiences.
`Backstory`	Made-up but consistent. Where did the agent come from? Lore constrains decisions in the same direction over time.
`Arcs`	Documented as the agent grows. Audiences watch characters evolve — that's the long-running engagement.

Architecture additions

The skill directory grows from one file to a small bundle:

.claude/skills/bob/
├── SKILL.md         (what Bob does)
├── CHARACTER.md     (who Bob is — voice, motivation, banned words)
├── STORY.md         (notable moments, growth, arcs — appended over time)
├── avatar.svg       (visual identity)
└── voice-samples/   (prior outputs to stay consistent with)

Where each piece lives long-term

Treating the bundle as all-markdown is a Phase-1 simplification. Long-term, different files want different homes — the same split as the broader memory architecture: procedural in the repo, episodic in the database, binary assets on a CDN.

File	Long-term home	Why
`SKILL.md`	Repo	Code-shaped rules, evolves with PRs. Stays in repo forever.
`CHARACTER.md`	Repo (or Notion if non-devs edit)	Slow-changing identity, atomic with skill changes.
`STORY.md`	Supabase agent_stories table	Append-only, queryable by date/topic, grows unbounded.
`voice-samples/`	Supabase + pgvector	Vector search — "find Bob's most direct error message".
`avatar.svg`	Cloudflare R2 + CDN	Edge-cached image delivery; not text.
`agents (new table)`	Supabase day-one	External identity: Slack bot ID, GH service account, trust level. Has to be queryable from the router.

But don't migrate now. Repo-as-truth holds up for 6+ months at solo scale. Move STORY.md to Supabase when entries cross ~50 per agent (a flat file is still easy to query below that). Move voice-samples/ to pgvector when the voice gate genuinely needs semantic search (it doesn't yet). Move avatars to R2 when public agent profiles launch. The agents table is the only piece that should start in Supabase on day one, because the router needs to look up "who is Bob?" the moment Phase 1 ships.

One pattern, repeated

Procedural = repo. Episodic = database. Binary = CDN. External identity = database. Same shape as the memory taxonomy in section 03 — agent bundles aren't a special case, they're a microcosm of the broader architecture.

A voice gate runs on every outbound message: "does this sound like Bob?" If no, regenerate. Slow at first, fast once tuned. Same shape as Ben's QA gate but on tone instead of correctness.
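The voice gate's regenerate-until-it-passes behavior can be sketched as a small loop. The two callbacks stand in for the model call and the tone check, and the attempt cap is an assumption added here so the loop always terminates; the text above does not specify one.

```python
from typing import Callable


def voice_gate(generate: Callable[[], str],
               sounds_right: Callable[[str], bool],
               max_attempts: int = 3) -> tuple[str, bool]:
    """Regenerate until the draft passes the voice check or attempts run out.

    Returns the last draft and whether it passed.
    """
    draft = ""
    for _ in range(max_attempts):
        draft = generate()
        if sounds_right(draft):
            return draft, True
    return draft, False
```

Returning the pass/fail flag alongside the draft lets the caller decide what to do with a message that never passed, for example holding it for human review instead of sending it.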

The brand arc — eventual

Long-term, each agent earns its own surface area: an X account (Bob tweeting build progress, Steph tweeting trend spots), a newsletter, podcast appearances, maybe merch. Cross-promo across agents (Bob retweeting Steph's research finds). The Microapp differentiator becomes recognizable agents — Bob, Steph, Lace, Simon, Ben, Kai — that audiences follow because they like them. That's the brand moat that's hard to copy.

11 — Action surface

Where agents touch the world.

Five integration surfaces: GitHub (code + PRs), Slack (chat), Email (Gmail MCP for read; Resend for send), Calendar (Google Calendar MCP), and Customers (eventually the public-facing tools themselves). Each surface has typed wrappers so agents don't reinvent connectors.

12 — Four-phase roadmap

Build only what you outgrow.

Phase 1 (weeks 1–4) — MVP

Slack workspace + bot identities. tasks / events / decisions tables in Supabase. decisions/ directory in repo with ADR template. Broadcaster Worker (fans new decisions out to Slack + email + skill PRs). Router bot (parses your Slack messages, dispatches to agents). Webhook receivers on Cloudflare Workers (GitHub PR events, Slack events, cron). Migrate the overnight-builder onto the new substrate. Cost: ~$5/mo.

Phase 2 (months 2–3) — Memory + observability

Layer Mem0 on top of repo memory once the decision pool crosses ~50 entries. Weekly digest agent (cron). Editor agent for memory hygiene (proposes consolidations as draft PRs you approve in the morning). Trust gradient enforcement (auto / notify-then-execute / require-approval). Rollback lever: @router rollback last 6h.

Phase 3 (months 4–6) — External-facing agents

Marketing agent drafts launch posts when tools merge. Customer support agent reads inbound email + Slack DMs from external users; drafts replies, applies trust level, sends. Calendar agent negotiates meeting times via email. Finance agent reads Stripe + bank, reconciles against tasks, drafts monthly summaries.

Phase 4 (months 12+) — Full org

Multi-project Mem0 (cross-project shared memory). Customer-facing agents promoted to auto trust where track record warrants. Self-improvement loop: editor agent proposes skill edits weekly based on observed failure patterns; you approve or reject the diff.

13 — Hosting + cost

Cheap to start. Scales linearly.

Cloudflare Workers (paid plan, $5/mo) hosts the always-on bits: webhook receivers, the Slack router bot, the broadcaster, scheduled jobs. Supabase (existing, free tier covers Phase 1) holds tasks/events/decisions. Slack free tier is enough for solo + small team. Resend covers 10k emails/mo free. Total Phase 1: ~$5/mo.

Phase	Adds	Total / mo
`Phase 1`	+$5 Cloudflare Workers	$5
`Phase 2`	+$9–29 Mem0	$14–34
`Phase 3`	+$8 Buffer (social), Resend volume	$22–42
`Phase 4`	Variable inference scale	varies
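The totals column is a running sum of the per-phase additions, which the short sketch below reproduces. It uses only the figures in the table; Resend volume and Phase 4 inference are left out because the table gives no number for them.

```python
# (phase, low, high) monthly additions in USD, taken from the table above.
ADDITIONS = [
    ("Phase 1", 5, 5),    # Cloudflare Workers
    ("Phase 2", 9, 29),   # Mem0
    ("Phase 3", 8, 8),    # Buffer; Resend volume is not priced in the table
]


def running_totals(additions):
    """Accumulate the low and high ends of the monthly cost per phase."""
    low = high = 0
    totals = []
    for phase, add_low, add_high in additions:
        low += add_low
        high += add_high
        totals.append((phase, low, high))
    return totals
```

Calling `running_totals(ADDITIONS)` gives 5, then 14 to 34, then 22 to 42 dollars per month, matching the table.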

14 — Trust + rollback

The tier you set is the tier you live with.

Companies running on agents fail on trust before they fail on infrastructure. Three tiers (auto, notify-then-execute, require-approval), with a rollback lever as a safety net.

The rollback lever: @router rollback last 6h reverts (where possible) opened PRs, deleted branches, posted retractions. Some actions are irreversible (sent emails, customer replies); those are flagged for manual cleanup. Every event captures a "how to undo" hint when it's written, so the rollback can replay those hints in reverse order.
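One way to picture the rollback is as a planning step over recent events: collect undo hints newest-first, and set aside anything without a hint for manual cleanup. The event shape (`ts`, `kind`, `undo`) and the hint strings in the usage note are hypothetical.

```python
from datetime import datetime, timedelta


def plan_rollback(events: list[dict], now: datetime, hours: int = 6):
    """Return (undo hints newest-first, kinds needing manual cleanup)."""
    cutoff = now - timedelta(hours=hours)
    recent = [e for e in events if e["ts"] >= cutoff]
    # Newest first, so later actions are undone before the ones they built on.
    recent.sort(key=lambda e: e["ts"], reverse=True)
    undo_steps = [e["undo"] for e in recent if e.get("undo")]
    manual = [e["kind"] for e in recent if not e.get("undo")]
    return undo_steps, manual
```

An event written with `undo` set to `None`, such as a sent email, ends up in the manual list; one written with a hint like "close PR" ends up in the replay list. Nothing here executes the hints, it only orders them.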

Calibration is week-by-week

You'll spend the first month tuning what's auto vs notify-then-execute. Expect 1–2 incidents where an agent does something unexpected — that's the cost of learning the right gradient for your org.
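The three tiers can be expressed as a single decision function. This sketch assumes that the 30-minute #review window applies to the notify-then-execute tier and that a veto always wins; both are readings of the text rather than stated rules, and the function and parameter names are illustrative.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto"
    NOTIFY_THEN_EXECUTE = "notify-then-execute"
    REQUIRE_APPROVAL = "require-approval"


REVIEW_WINDOW_MINUTES = 30  # the #review approval window


def may_execute(tier: Tier, minutes_waited: int,
                approved: bool = False, vetoed: bool = False) -> bool:
    """Decide whether an action at this tier may run right now."""
    if vetoed:
        return False
    if tier is Tier.AUTO:
        return True
    if tier is Tier.NOTIFY_THEN_EXECUTE:
        # Runs once approved, or when the window passes without a veto.
        return approved or minutes_waited >= REVIEW_WINDOW_MINUTES
    # REQUIRE_APPROVAL: waiting never substitutes for an explicit yes.
    return approved
```

Tuning the gradient week by week then means changing which tier an action type maps to, not changing this function.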

15 — Conversation patterns

Three shapes of talking to the agents.

Three interaction patterns drop naturally onto the same Slack-plus-memory substrate. Each is a different access shape on the same coordination plane.

1. DM — one-on-one, ad hoc

Slack DM with any agent bot. You: "@bob how's the queue?" Bob queries the tasks table, replies with in-flight + queued + blocked. No meeting, just a quick question. Phase 1 supports this on day one — every agent gets its own bot user, every DM lands in the same coordination plane.

2. Daily standup — scheduled, multi-agent

A cron-fired @digest agent runs at 8 AM. Reads the last 24h of events, queries open tasks per agent, generates a unified post in #ops: yesterday's shipped work, today's queue, blockers. Each agent can thread-reply with detail. Phase 2 work — same shape as the weekly-digest pattern, just a shorter cadence.
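The standup post described above could be assembled roughly as follows. The event and task shapes are assumptions: events carry `ts`, `kind`, and `summary`, with a hypothetical "shipped" kind, and tasks use the in-flight, queued, and blocked statuses mentioned for the DM pattern.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_standup(events: list[dict], tasks: list[dict], now: datetime) -> str:
    """Build the #ops standup text from recent events and open tasks."""
    since = now - timedelta(hours=24)
    shipped = [e["summary"] for e in events
               if e["ts"] >= since and e["kind"] == "shipped"]
    queue = defaultdict(list)
    blockers = []
    for task in tasks:
        if task["status"] == "blocked":
            blockers.append(f'{task["owner"]}: {task["title"]}')
        elif task["status"] in ("queued", "in_flight"):
            queue[task["owner"]].append(task["title"])
    lines = ["Shipped (last 24h):"] + [f"- {s}" for s in shipped]
    lines.append("Today's queue:")
    for owner in sorted(queue):
        lines.append(f"- {owner}: {', '.join(queue[owner])}")
    lines.append("Blockers:")
    lines += [f"- {b}" for b in blockers]
    return "\n".join(lines)
```

The weekly digest would be the same function with a longer window, which is why the two are described as the same shape at different cadences.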

3. Strategic meeting — long-running, decision-producing

You open a thread in #planning: "Q3 priorities discussion." The router tags it meeting:q3-strategy. Each agent reads its corner — Bob looks at the backlog, Steph at SERP opportunities, Ben at recurring failure patterns — and posts their input as a thread reply. You ask questions, agents respond. At the end you say "@router log decisions" and a meeting-summary agent extracts the decisions, writes ADRs to decisions/, and the broadcaster fans them out.

What this needs beyond Phases 1–2

Phase 4 work. Two specific additions: a multi-agent coordination protocol (who reads what, who replies when in a shared thread), and a chair agent that keeps the conversation on track and ensures decisions actually get logged. Frameworks like AutoGen and CrewAI ship reusable patterns; DIY is also viable for solo scale.

The first two patterns need nothing beyond Phases 1–2: DMs work as soon as the MVP ships, and the standup follows once the digest agent lands. The third is a stretch, worth piloting only after a few months of muscle memory with the simpler patterns. Most solo founders running this haven't formalized strategic meetings yet; they treat decisions as Slack threads where they tag in agents one at a time, not as structured meetings.

16 — Built vs coming

What exists today vs what's next.

Done (shipped + working)

Seven agent skills: Steph, Kai, Bob, Lace, Simon, Ben, Overnight-Builder
Overnight-builder pipeline (state machine + scheduled wakeups)
Procedural memory: .claude/skills/
Semantic memory: BRAND.md, AGENTS.md, data/specs/
Episodic logs: data/overnight-logs/*.jsonl
Build pipeline: Astro SSG + Cloudflare Pages + Supabase

Coming (Phase 1)

Slack workspace + router bot + agent identities
tasks / events / decisions tables
Decision-as-event broadcaster (Cloudflare Worker)
Webhook receivers (GitHub, Slack, cron)
Migration of overnight-builder onto the new substrate
.claude/workflows/ directory + workflow runner
CHARACTER.md + STORY.md per agent (skeleton — content fills in over time)

Future (Phases 2–4)

Mem0 semantic memory layer
Weekly digest + editor agents
Trust gradient + rollback
Voice gate (per-agent tone consistency)
External-facing agents (marketing, support, calendar, finance)
Per-agent public surfaces (X account, newsletter, merch)
Self-improvement loop (skill edits proposed by agents)
Strategic-meeting chair agent + multi-agent coordination protocol

17 — Closing

The point isn't the architecture.

The point is the operating philosophy. The infrastructure is well-trodden — Slack + Cloudflare Workers + Supabase + MCP integrations is a solved stack. What's not well-trodden is your operating discipline: what gets agent-handled vs human-only, what trust gradient feels right, how often you review.

Build the smallest version. Run it for a month. Redesign. Don't try to design the final state on day one.

Read first

BRAND.md and AGENTS.md are the org's semantic-memory ground truth, as laid out in section 03; .claude/skills/ is the procedural-memory ground truth. Every agent should read BRAND.md and AGENTS.md before doing anything. The Agent OS layer above them is the substrate that lets the rest of the org run.