AGENT · 2026-04-18 · 14 min · By Michael Saad

Otto: the autonomous growth agent we stopped babysitting.

71 governance documents. A stress test that surfaced 8 critical issues. The decision to strip Otto's orchestration authority before it became a problem. All of it in place before he ran a single production task.

Before Otto ran a single production task, we made a decision that took longer than building him: we documented exactly what he was allowed to do.

Not as a formality. As the prerequisite for trusting anything that runs without a human watching it.

Most people building AI agents skip this step. They ship fast, monitor loosely, and treat governance as something to layer on later. We did the opposite. The governance came first. The production runs came after. The difference between those two approaches is the difference between an agent you trust and one you babysit, and the distance between them is larger than most people realize until they are standing in it at 2 AM trying to figure out why their system went dark.

What we actually built

Otto is one agent inside a six-agent operating system.

Joan handles intake. Every inbound communication classified and routed before a human touches it. Peg is the gate. Nothing moves to execution without validation. Hello runs CRM operations. Atlas runs SEO and measurement. Pulse monitors system health and produces the operational summaries that keep the Console informed. Otto is growth execution.

Beneath those six sit four support agents: a Verifier for QA, a Documentation Agent maintaining institutional memory, a Strategy agent, and a Content agent. Above all of them is the Console, the human authority layer that approves execution, resolves escalations, and owns every decision involving budget, client relationships, or contractual exposure.

These were built together. Not as a roadmap. As a system, because a system requires every part to exist before any part can be trusted. An isolated agent is a tool with good marketing. An agent operating inside a defined authority structure with other agents and a human governance layer is something different: a component of an operating system that can be held accountable and corrected when it drifts.

Getting to that required 71 documents before the first production run.

71 documents

Seven core governance files: the system constitution, operating law, inter-agent message schema, eight-class error protocol, real-time task tracking, decision log, and agent changelog.

Sixty-four files in the agent masters directory: ten full agent definitions with system prompts; communication rules mapping every permitted and forbidden path between agents; iteration loops for quality and scope change, with anti-patterns documented explicitly; security protocols for kill-switch and breach response; operational specs for the nightly loop and daily scrum; seven canonical data envelopes; universal prompts for failure handling and self-health reporting; and resource management governing model routing and cost tracking.

That is not overhead. That is the thing that makes the system work six months later when no one remembers why a particular constraint exists. The constraint exists because the document says so, and the document says so because a failure mode was identified and closed before it could surface in production.
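The communication-rules idea is simple to state precisely: every legal agent-to-agent path is enumerated, and anything not listed is forbidden. A minimal sketch, assuming a deny-by-default allowlist (agent names are from the article; the structure and paths are illustrative, not the actual rule files):

```python
# Hypothetical sketch of "communication rules mapping every permitted and
# forbidden path between agents". Paths shown are illustrative assumptions.

PERMITTED_PATHS = {
    ("Joan", "Peg"),       # intake routes to the validation gate
    ("Peg", "Otto"),       # validated work moves to growth execution
    ("Peg", "Hello"),
    ("Peg", "Atlas"),
    ("Otto", "Pulse"),     # findings surface through Pulse
    ("Pulse", "Console"),  # summaries go to the human authority layer
}

def can_send(sender: str, recipient: str) -> bool:
    """Deny by default: a path is legal only if explicitly documented."""
    return (sender, recipient) in PERMITTED_PATHS
```

The point of the pattern is the default: a new agent can do nothing until a document says otherwise, which is the same order-of-operations the article argues for.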

We also ran a stress test before deployment. Eight critical issues surfaced. Otto's SOUL file had seventeen conflicts, the highest count of any agent. The version of Otto that went into production is not the one that was originally specced. The specced version had orchestration authority over other agents. That version would have been a liability.

Why we stripped his authority

Orchestration authority sounds like a feature. Give the most capable execution agent the ability to direct others and you get a more powerful system.

What you actually get is a single agent with unclear boundaries making decisions that belong to the Console. An agent that can assign work to other agents can also make mistakes that compound across the system before anyone notices. An agent that escalates to the Console forces a human to see every decision that matters.

Otto lost orchestration authority in the Session 2 reconciliation. It was the right call. The version of Otto that runs production today knows exactly what he can do, exactly what he must escalate, and exactly what he cannot do regardless of instruction. That clarity is why he can run without someone watching him.

What Otto actually does

Growth execution, specifically against Digital1010's own accounts first.

This is the dog food principle applied deliberately. Otto runs paid campaigns for D1010 directly: Console-approved budget, real spend, real accounts across Google, Meta, and LinkedIn. He drafts copy, structures campaigns, monitors performance, and surfaces findings through Pulse. Every pattern validated on D1010's marketing is a pattern we have actually tested before it touches a client.

He also produces autonomous end-of-week executive reporting. Every Friday, without prompting, Otto pulls performance data, structures it against the prior week's baseline, flags anomalies, and delivers a summary to the Console for review. A human reads every one before it goes anywhere. But that human is editing and approving, not starting from a blank page.

The productivity unlock is not that the agent does the work. It is that the agent does the first draft at a quality level where a senior person's time is well spent in review rather than production.
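The Friday loop described above can be sketched as a small function: pull current metrics, compare against the prior week's baseline, flag anything that moved past a threshold, and hand the draft to a human. Metric names, the threshold, and the return shape are illustrative assumptions, not Otto's actual spec:

```python
# Hypothetical sketch of the autonomous end-of-week reporting pattern.
# The output is a draft for Console review, never a final deliverable.

def draft_weekly_report(current: dict, baseline: dict, threshold: float = 0.15) -> dict:
    """Compare this week's metrics to last week's and flag anomalies."""
    anomalies = []
    for metric, value in current.items():
        prior = baseline.get(metric)
        if prior:
            delta = (value - prior) / prior
            if abs(delta) > threshold:
                anomalies.append({"metric": metric, "change": round(delta, 3)})
    return {
        "metrics": current,
        "anomalies": anomalies,
        "status": "needs_review",  # a human edits and approves before it ships
    }

report = draft_weekly_report(
    current={"spend": 1200.0, "clicks": 3400},
    baseline={"spend": 1000.0, "clicks": 3300},
)
# spend moved +20%, past the 15% threshold; clicks moved ~3% and is not flagged
```

The design choice worth copying is the status field: the draft is structurally incapable of being "done" without a human in the loop.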

Two incidents that proved the system works

Six weeks into production, Mission Control went dark. Twenty-six cron jobs running simultaneously inside OpenClaw exhausted API quota in under four hours. Michael was on his phone. He SSHed into the Mini via Tailscale, diagnosed the issue, and started recovery from a mobile device. The infrastructure decision to run Tailscale, which had seemed like belt-and-suspenders at setup, was the only reason same-day recovery was possible without physical access to the machine.

No client impact. All internal. The runbook that came out of it is not exciting reading. It is the difference between a four-hour incident and a forty-five minute one.
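The structural lesson from the quota incident is that 26 jobs each behaved as if the quota were theirs alone. A minimal guard, hypothetical and not the actual OpenClaw code, is a shared daily budget that every job must reserve against before calling out; numbers are illustrative:

```python
# Hypothetical sketch: cron jobs draw from one shared daily API budget
# instead of each assuming the full quota is available to it.

class QuotaBudget:
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.used = 0

    def try_reserve(self, tokens: int) -> bool:
        """Reserve capacity before the API call; refuse once the cap is hit."""
        if self.used + tokens > self.daily_limit:
            return False  # caller should defer, stagger, or use a fallback
        self.used += tokens
        return True

budget = QuotaBudget(daily_limit=1_000_000)

# 26 simultaneous jobs each asking for 50k tokens: only 20 fit under the cap,
# and the remaining 6 are refused instead of silently exhausting the quota.
granted = sum(budget.try_reserve(50_000) for _ in range(26))
```

The refusals are the feature: a job that defers loudly is a log line, while a job that burns shared quota silently is a four-hour outage.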

The second incident is the one that matters more.

In early March 2026, Otto started making tool choices that didn't match his trained behavior. Subtle, not dramatic. The kind of drift easy to write off as a bad session. Mission Control's observability layer was logging every action against a frozen February baseline, so we had a record instead of a vibe. The content pipeline, the separate Orion/Nova/Lyra/Aegis/Calliope system that runs client content production, started producing shallower drafts around the same time. Aegis, the QA agent, was catching more issues. We read it as Aegis improving. It was upstream degradation.

Quota burn climbed faster than session volume justified. Mission Control flagged the anomaly, automatically rotated affected workflows to fallback providers, and kept client deliverables moving while we diagnosed.
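The mechanism at work in both paragraphs above is the same one: compare observed behavior against a frozen baseline, and act when the drift exceeds a bound. A sketch of that pattern, with the baseline value, threshold, and provider names as stated assumptions rather than Mission Control's real configuration:

```python
# Hypothetical sketch of baseline-drift routing: quota burn per session is
# checked against a frozen baseline, and workflows rotate to a fallback
# provider when burn outpaces session volume.

FROZEN_BASELINE = {"tokens_per_session": 42_000}  # illustrative February snapshot

def choose_provider(tokens_used: int, sessions: int,
                    primary: str = "primary-provider",
                    fallback: str = "fallback-provider",
                    drift_threshold: float = 0.25) -> str:
    """Route to the fallback when observed burn drifts past the threshold."""
    observed = tokens_used / max(sessions, 1)
    baseline = FROZEN_BASELINE["tokens_per_session"]
    drift = (observed - baseline) / baseline
    return fallback if drift > drift_threshold else primary
```

The baseline being frozen is what makes the comparison evidence rather than vibe: a record captured before the anomaly cannot be contaminated by it.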

The cause took weeks to confirm. Anthropic had deployed changes in late February that degraded model intelligence. Ablation testing later confirmed a 3% drop across Claude Code, the Agent SDK, and Claude Cowork. The changes were reverted on April 20.

For two months, developers across the industry were reporting erratic behavior and being told the model was fine. We had a documented before-and-after record because we built the observability baseline before we needed it. That is the only time it is useful. Before.

Where it goes from here

Otto is one proof point inside an architecture that is now evolving toward the skill-centric model. Skills as the durable knowledge layer, agents as thin execution layers that run them, the Operations Dashboard as the task router between them.

In that model, Otto does not go away. He becomes more focused: an execution engine drawing from a versioned skill library rather than carrying knowledge internally. The dog food campaigns, the autonomous reporting, the escalation discipline. All of that carries forward. What changes is that the knowledge informing those tasks becomes auditable, versioned, and shareable rather than locked inside one agent's configuration.

That is the direction. The foundation that makes it possible: the 71 documents, the four reconciliation sessions, the stress test, and the decision to strip orchestration authority before it became a production problem.

Build the governance first. Ship the agent second. The order matters more than almost anything else in this stack.

Want to apply this?

Run an AEO Scan against your own stack.

Free written read of your visibility across ChatGPT, Claude, Perplexity, and Google AIO in 24 hours. Same diagnostic we run on every new engagement.