The first version of the agent operating system was a folder of SKILL.md files and a human typing slash-commands.
It worked. It also concealed the load-bearing infrastructure question: what does it take to run a company with a team of agents instead of a team of humans? Agents don't share a Google Workspace, don't show up to a standup, don't keep their working memory in their heads between sessions. A company built on agents needs a different operating substrate from a company built on people — not a dashboard, not a Slack channel, not a project board, but a layered set of systems that handle the things humans do for a human team without thinking.
The substrate is five layers stacked on top of each other. The orchestrator — me, the one human, deciding what gets built. The command surface — how I tell agents what to do. The coordination plane — tasks, events, decisions, memory. The agents — the team, each with one job. The action surface — where the agents reach out and affect the world: GitHub, Slack, email, calendars, customers. Events flow up from the action surface back into the coordination plane; memory grows; the next agent reads a richer context.
Orchestrator → Command → Coordination → Agents → Action. Events loop back.
What the layered stack solves, and a flat "agents in Claude Code" setup doesn't, is the persistence problem. An agent's working memory dies at session end. A Slack channel is a partial fix: it carries notifications, but it is not a system of record. A database is the only durable substrate. The operating system has to own the durability: the agents are the workers; the coordination plane is the office.
The transferable why: companies built on agents are not just companies with extra tools. The operating substrate is what makes the agent team durable, observable, and recoverable. A flat-tool setup works for a single agent doing a single task; it breaks the moment the team has to hand off work, remember decisions, or recover from a failed step. Pay the infrastructure cost up front; the agent layer gets useful in proportion to the substrate that backs it.
Locked 2026-05-14 · five-layer agent operating substrate
The second decision was the surface the orchestrator uses to command the team — and the answer is the surface the orchestrator is already on. Slack on a phone.
The candidates were the usual three. A custom dashboard — clean, single-purpose, ten minutes to build a wireframe and three months to build the real thing. A CLI — fast, scriptable, useless on a phone. Slack — already on every screen the orchestrator looks at, reply-from-thread, mobile-callable, free for small teams. The choice was obvious once I framed the question right: which surface am I on for sixteen hours a day already?
The Agent OS runs on a Slack workspace with five channels: #planning (Steph proposes, the orchestrator approves), #engineering (Bob commits, Ben reviews), #ops (decisions, broadcasts), #firehose (everything, in case I want to scroll), and #humans-only (the place I write to myself). Each agent has a bot identity, a Twitter-handle profile, and a written voice that shows up in messages. Replying to an agent in-thread is a command. The phone is the primary device; the laptop is a secondary one.
Slack channels. Phone-first. Reply-in-thread.
The benefit lands the day the orchestrator is on the subway and Steph proposes a tool worth building. The orchestrator types "yes, route to Kai" in the thread. Steph publishes the brief; the router picks it up; Kai's session fires; the work moves. No laptop required. No login flow. No special command syntax — just the same Slack conventions every human team already uses to coordinate work.
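A minimal sketch of how a thread reply like that could be turned into a command. Everything here is an assumption for illustration: the parse_reply function, the Command shape, and the KNOWN_AGENTS roster are hypothetical, and the real router may interpret replies quite differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical roster: the names appear in the text, the set itself is an assumption.
KNOWN_AGENTS = {"steph", "bob", "ben", "kai"}


@dataclass(frozen=True)
class Command:
    approved: bool
    route_to: Optional[str]  # lower-cased agent name, or None when nothing is routed


def parse_reply(text: str) -> Optional[Command]:
    """Interpret a plain-language thread reply; return None if it is not a command."""
    normalized = text.strip().lower()
    verdict = re.match(r"^(yes|no)\b", normalized)
    if verdict is None:
        return None  # ordinary conversation, not an approval or rejection
    approved = verdict.group(1) == "yes"
    route = re.search(r"\broute to (\w+)", normalized)
    agent = route.group(1) if route else None
    if agent not in KNOWN_AGENTS:
        agent = None  # unknown names are dropped rather than guessed at
    return Command(approved=approved, route_to=agent if approved else None)


print(parse_reply("yes, route to Kai"))
```

The point of the sketch is that the reply stays ordinary Slack prose; the structure is recovered on the router side instead of being demanded from the person typing on a phone.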
The transferable why: the right command surface is the one the orchestrator is already using. Building a custom dashboard for a one-person company is the act of building a tool to use the tools you already have. Slack, email, the existing project board: pick what's already in the orchestrator's hand and ride the ergonomics that are already built. The novelty of the operating model doesn't excuse paying for a custom UI on top of it.
Locked 2026-05-14 · Slack as command surface · five channels · phone-first
The third decision was the model for agent memory — and the answer came from the MemGPT literature.
An agent without memory restarts every session. The first sessions of the early agents were exactly this — Steph proposed the same tool three times in a week because she didn't remember she'd already proposed it. The fix isn't a single memory store; it's a memory taxonomy. Four types, each in a different storage tier, each with a different lifetime and access pattern.
Working memory is the agent's current session context. It dies at session end. It lives in-process and is ephemeral; only the current invocation can see it. "I'm in the middle of building the binary decoder; here's the state of the test gate."
Episodic memory is what happened across sessions — events, dates, attribution. It lives in the events table in Supabase and in the git history. "Bob built the binary decoder on May 8. Ben passed it on May 9. It shipped on May 10."
Semantic memory is the facts about how the company works — brand decisions, pricing, conventions. It lives in BRAND.md, AGENTS.md, the chapters of this book, and the decisions/ directory. "We don't have a Pro tier. Free is a fact, not a slogan. AmEx not airline."
Procedural memory is the agent's own skill — how it does its job. It lives in the agent's SKILL.md, the checklist.md, and the workspace files. "Bob's pipeline: validate → seed → build → gate → PR."
Working. Episodic. Semantic. Procedural.
Each memory type has the storage that fits its lifetime. Working is ephemeral; episodic is append-only events; semantic is text the orchestrator edits; procedural is text the agent edits via PR. The agent reads the relevant slice at session start — current working state plus relevant episodic plus the canonical semantic plus its own procedural — and the session is grounded in the full context of the company without the agent having to discover it from scratch.
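The session-start read can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual loader: MemoryTiers and build_session_context are hypothetical names, the in-memory containers stand in for the events table and the markdown files, and "relevant episodic" is simplified to "the most recent few events".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryTiers:
    working: Dict[str, str] = field(default_factory=dict)     # in-process, dies with the session
    episodic: List[str] = field(default_factory=list)         # stand-in for the append-only events table
    semantic: Dict[str, str] = field(default_factory=dict)    # stand-in for BRAND.md, AGENTS.md, decisions/
    procedural: Dict[str, str] = field(default_factory=dict)  # stand-in for each agent's SKILL.md


def build_session_context(tiers: MemoryTiers, agent: str, recent: int = 2) -> Dict[str, object]:
    """Assemble the slice of each tier that one agent reads at session start."""
    return {
        "working": dict(tiers.working),
        # Guard recent <= 0: a bare [-0:] slice would return the whole list.
        "episodic": tiers.episodic[-recent:] if recent > 0 else [],
        "semantic": dict(tiers.semantic),
        # An agent only gets its own procedure, never another agent's.
        "procedural": tiers.procedural.get(agent, ""),
    }


tiers = MemoryTiers(
    episodic=[
        "Bob built the binary decoder",
        "Ben passed the binary decoder",
        "The binary decoder shipped",
    ],
    semantic={"pricing": "We don't have a Pro tier."},
    procedural={"bob": "validate -> seed -> build -> gate -> PR"},
)
context = build_session_context(tiers, "bob")
print(context["episodic"])
```

Keeping the four tiers as separate fields, rather than one merged blob, is what lets each one keep its own lifetime and its own editor.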
The transferable why: memory isn't one thing; it's four things, each with different properties. Treating it as one thing — "a vector database" or "a Notion wiki" — produces a system where the wrong thing is in the wrong place: ephemeral notes pollute the canonical knowledge base, brand decisions get treated as conversational context, the agent's own procedure is mixed with the company's policy. Pick the taxonomy; route each memory to its tier; the agent system gets predictable in exact proportion to how cleanly the four memory types are separated.
Locked 2026-05-14 · four memory types · MemGPT taxonomy adopted
The fourth decision was about the coordination plane — the three Supabase tables that hold everything the agents share.
The first version had one table — tasks. Every piece of work the agents did was a row. Status, assignee, payload, timestamps. The schema worked until the second week, when "the history of what happened" became as important as "what's open right now." Mutating the tasks row destroyed the history. Auditing the history became impossible. The schema needed to split.
Three tables. tasks holds work-in-flight: what's open, what's assigned, what the current state is. Mutable, holds the present. events is append-only: every state transition, every agent action, every milestone. Immutable, holds the past. decisions is the policy layer: the markdown files in decisions/ mirrored into the database for queryability, with their frontmatter parsed out. The chapter on decisions covers the format; the Agent OS uses them as the policy substrate every agent reads at session start.
Tasks (now). Events (then). Decisions (always).
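One way the split could look, sketched with Python's built-in sqlite3 as a stand-in for Supabase so the example runs anywhere. The column names, the status values, the triggers, and the transition helper are all assumptions made for illustration; the real schema is not shown in this chapter.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not the real tables.
SCHEMA = """
CREATE TABLE tasks (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    assignee TEXT,
    status   TEXT NOT NULL DEFAULT 'open'
);
CREATE TABLE events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id     INTEGER NOT NULL REFERENCES tasks(id),
    actor       TEXT NOT NULL,
    from_status TEXT,
    to_status   TEXT NOT NULL,
    at          TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE decisions (
    slug        TEXT PRIMARY KEY,
    frontmatter TEXT,
    body        TEXT NOT NULL
);
-- Enforce append-only: any UPDATE or DELETE on events aborts.
CREATE TRIGGER events_no_update BEFORE UPDATE ON events
BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
CREATE TRIGGER events_no_delete BEFORE DELETE ON events
BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def transition(conn: sqlite3.Connection, task_id: int, actor: str, to_status: str) -> None:
    """Mutate the task and record the change, both in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
        if row is None:
            raise KeyError(task_id)
        conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (to_status, task_id))
        conn.execute(
            "INSERT INTO events (task_id, actor, from_status, to_status) VALUES (?, ?, ?, ?)",
            (task_id, actor, row[0], to_status),
        )
```

The tasks row is overwritten on every transition, but because the same transaction appends to events, the previous status is never lost.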
Each table has a Worker fronting it. agent-os-router takes Slack messages, mutates tasks, writes to events. agent-os-broadcaster watches decisions/, fans new decisions to Slack channels, writes events. Every read by an agent is filtered through the three tables — the current task, the relevant recent events, the applicable decisions. The agent never reads raw Slack; the coordination plane translates between the human-facing channel and the agent-facing context.
The transferable why: mutable state and immutable history are two different things and want two different stores. Conflating them — keeping only the current state and losing the trail — produces systems you can't audit, can't recover, and can't reason about retrospectively. Append-only event streams next to mutable state give you both. The cost is a slightly more complex schema; the benefit is recoverability and explainability across every system that reads from it.
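What "recoverability" buys can be shown with a small replay sketch: given only the event log, the state of every task at any earlier point can be rebuilt. The Event shape, the seq field, and the state_at helper are hypothetical, introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    seq: int        # position in the append-only log (assumed to be monotonically increasing)
    task_id: str
    to_status: str


def state_at(events: Iterable[Event], upto_seq: int) -> Dict[str, str]:
    """Fold events in log order; return each task's status as of `upto_seq`."""
    state: Dict[str, str] = {}
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq > upto_seq:
            break
        state[event.task_id] = event.to_status  # later events overwrite earlier ones
    return state


log = [
    Event(1, "decoder", "open"),
    Event(2, "decoder", "in_review"),
    Event(3, "encoder", "open"),
    Event(4, "decoder", "shipped"),
]

print(state_at(log, 2))  # {'decoder': 'in_review'}
print(state_at(log, 4))  # {'decoder': 'shipped', 'encoder': 'open'}
```

A store that kept only the current tasks row could answer the second question but never the first.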
Locked 2026-05-14 · tasks + events + decisions · Supabase tables fronted by Workers
The last decision is the one that staged the build over a year. The four-phase roadmap.
Building the Agent OS as one monolithic project would have taken six months of dark-build with nothing shipping in between. The version that survives breaks the build into four phases, each one valuable on its own, each one earning the right to start the next.
Phase 1 (weeks 1-4): the MVP. Slack workspace + bot identities. Tasks, events, decisions tables in Supabase. The decisions/ directory in the repo with the broadcaster Worker watching it. The router Worker fielding Slack commands. Migrate the existing overnight-builder onto the new substrate. Cost: ~$5/month. End state: a working Agent OS that's still Claude-Code-triggered but everything else is automated.
Phase 2 (months 2-3): memory and observability. A Mem0 layer for semantic memory at scale. A weekly-digest agent that summarizes the events table for the orchestrator. An editor agent for memory hygiene. Trust gradient enforcement at the policy layer. A "@router rollback last 6h" lever. Additional cost: $9-29/month. End state: the agent team can be reasoned about, and its failures recovered from.
Phase 3 (months 4-6): external-facing agents. A marketing agent that drafts launch tweets and posts. A customer support agent that handles Gmail and Slack DMs. A calendar agent for scheduling. A finance agent for Stripe reconciliation. The agent team grows past the build pipeline into the rest of the company's operations.
Phase 4 (months 12+): full org. Multi-project Mem0. Customer-facing agents promoted to auto trust on classes of decision that have proven out. A self-improvement loop where agents propose edits to their own skill files weekly. The company runs without me having to be the cron.
Four phases. Each one ships value alone.
The transferable why: infrastructure roadmaps that ship value at every phase are the ones that don't get cancelled mid-build. Monolithic plans require persistent commitment across a year of work; phased plans earn their continuation one phase at a time. Phase 1 has to be valuable on its own — otherwise the project is staking everything on Phase 4, and most projects don't reach Phase 4. Make the first phase the one that pays for itself; let the rest earn their place by extending the value.
Locked 2026-05-14 · four-phase roadmap · Phase 1 is the MVP that has to pay for itself
That's the Agent OS. A five-layer substrate, Slack as the command surface, four memory types, tasks-events-decisions as the coordination plane, four phases of build. The orchestrator's job becomes setting the spec; the OS does everything between the spec and the shipped tool.