GROWTH · 2026-04-24 · 8 min · By Michael Saad

I Called the Claude Regression. Anthropic Just Confirmed It.

I diagnosed three Claude regressions on April 15. Anthropic confirmed them today. Here's the routing framework, harness engineering thesis, and 4-tier infrastructure Digital1010 had already built to survive it.


By Michael Saad, Digital1010 · Published: April 23, 2026

I don't run Digital1010 on Claude. I run it on a 4-tier model routing system where Claude is one of the seven tools we use. That distinction is the entire story of how Anthropic's three silent regressions, confirmed in their postmortem this morning, hit our agency differently than they hit the rest of the industry.

Eight days ago, on April 15, I published a forensic analysis showing that Claude had gotten measurably worse. The evidence was brutal: 6,852 Claude Code session files, 234,760 tool calls, a 67% drop in thinking depth since late February. We named the reasoning_effort levels Anthropic had never publicly documented. We connected the pattern to the forgetfulness and task abandonment developers were seeing across GitHub, X, and Reddit.

Today Anthropic published their own postmortem and confirmed three specific regressions: a default reasoning-effort downgrade from March 4 to April 7, a caching bug from March 26 to April 10 that caused Claude to forget its own reasoning mid-session, and a verbosity system prompt from April 16 to April 20 that capped responses at 25 words between tool calls. Each one hit a different slice of users on a different schedule. Stacked, they produced exactly the erratic, forgetful, lazy agent behavior the community had been describing for weeks.

This is not a news recap. There are already forty of those. This is the story of what Digital1010 had already built to handle exactly this kind of silent vendor degradation, the three weeks of receipts that led up to our April 15 diagnosis, and why the Anthropic postmortem is a validation not of our reporting but of our infrastructure thesis.

March 23: The Framework, Before Anyone Called It A Problem

A month ago, I published the routing framework Digital1010 uses across every AI workflow in the shop. The title was "We Use Seven AI Tools at Digital1010. Here's the Routing Logic Behind Every Decision." The thesis, in one line: the model is a variable, the routing decision is the skill.

We route across Claude, ChatGPT, Claude Cowork, Cursor, LM Studio, OpenClaw, and a handful of other specialized tools. Each one gets used where it's best, and nothing gets used exclusively. The framework was not a reaction to any specific vendor incident. It was a structural position. No single lab is going to ship a flawless product forever, and no agency that depends on a flawless product forever is going to stay in business.

The day after, I published a companion piece: when to use Claude Code versus Codex versus OpenClaw. The tradeoff was explicit. Claude Code wins on quality and complex reasoning. Codex and Cursor win on velocity. OpenClaw wins on autonomous operations where a human isn't going to be in the loop. If you pick one and commit to it for everything, you've accepted the failure mode of that one tool as your own operational risk.

That framework is what the Anthropic postmortem validates. Users who bet everything on Claude Code experienced the last six weeks as an unexplained decline. Users who routed across multiple tools experienced it as one vendor slipping while the others held the line.

April 7 to 14: The Cost Crisis That Forced Tier 4

On April 7, I burned $55 in a single day running Sonnet on Slack queries and routine operations. That is not what Sonnet is for. Sonnet is for strategy, client work, and complex reasoning. I had been lazy about which model handled which task, and the bill caught up with me.

That week I built the full 4-tier routing system:

  • Tier 1 (Sonnet, Opus): strategy, client work, complex reasoning. The expensive tools, reserved for the work that actually needs them.
  • Tier 2 (Kimi, DeepSeek R1): content production, research. Strong reasoning at a fraction of Tier 1 cost.
  • Tier 3 (DeepSeek Chat): crons, heartbeats, file operations. This is 95% of the work an always-on agent like Otto does.
  • Tier 4 (local models): no API cost at all. For the work that does not need a frontier model to touch it.

The result was immediate and obvious: 95% of operations now run at effectively zero API cost. The rough math: $2 to $3 per session on Sonnet for work that doesn't need Sonnet, versus $0 on local and near-zero on Tier 3. If you are running any always-on agent against a frontier model for routine ops, you are lighting money on fire. I was. I stopped.
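The tier logic above can be sketched as a simple routing table. This is a hypothetical illustration, not Digital1010's actual code: the task names, tier labels, and per-token costs are placeholder assumptions chosen to mirror the four tiers described.

```python
# Hypothetical sketch of tiered routing: each task class maps to the
# cheapest tier that covers it, so routine ops never touch frontier pricing.

# Tier order, cheapest first. Costs are illustrative per-1K-token estimates.
TIERS = {
    "tier4_local": 0.0,        # local models, no API cost
    "tier3_cheap": 0.0003,     # e.g. DeepSeek Chat
    "tier2_mid": 0.002,        # e.g. Kimi, DeepSeek R1
    "tier1_frontier": 0.015,   # e.g. Sonnet, Opus
}

# Routing table: crons/heartbeats/file ops stay cheap or local;
# only strategy and client work escalate to Tier 1.
ROUTES = {
    "cron": "tier4_local",
    "heartbeat": "tier3_cheap",
    "file_op": "tier3_cheap",
    "research": "tier2_mid",
    "content": "tier2_mid",
    "client_strategy": "tier1_frontier",
}

def route(task_type: str) -> str:
    """Return the tier for a task; unknown work defaults cheap, not frontier."""
    return ROUTES.get(task_type, "tier3_cheap")

def session_cost(task_type: str, tokens_k: float) -> float:
    """Estimated session cost at the routed tier."""
    return TIERS[route(task_type)] * tokens_k

if __name__ == "__main__":
    print(route("heartbeat"), session_cost("heartbeat", 50))
    print(route("client_strategy"), session_cost("client_strategy", 50))
```

The design choice worth noting is the default: an unrecognized task routes to the cheap tier, so the failure mode of a routing gap is lost quality on one task, not a surprise frontier-model bill.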

The relevance to today's postmortem: the same structural pressure Anthropic was responding to when they downgraded the default reasoning effort (compute is expensive, latency is a competitive metric, users burn quota in ways the vendor has to absorb) is exactly the pressure we were solving with routing. The difference is that we built a routing framework our users can inspect. Anthropic built a silent default change their users had no way to see.

April 15: The Diagnosis

By the time I sat down to write the "Claude got dumber" forensic analysis, I had two weeks of internal telemetry showing drift, the public Stella Laurenzo audit confirming it at scale, and a routing framework that had already demonstrated why this kind of degradation was survivable if you were not all-in on a single tool.

The piece hit hard because the data was specific. 6,852 sessions. 234,760 tool calls. A 67% drop in thinking depth since late February. A reasoning_effort parameter that Anthropic had shipped without documentation, defaulting users to the minimum while most of them had no idea it existed. I named the mechanism, I named the impact, and I pointed at the hidden defaults.

What I did not know on April 15, and could not have known, was that the effort downgrade was only one of three stacked bugs. The caching issue (Claude forgetting its own reasoning mid-session) was running in parallel during the exact window I was measuring. The verbosity prompt had not yet shipped. I was right about the pattern and partially right about the cause. The Anthropic postmortem filled in the rest.

April 17: Harness Engineering

Two days after the diagnosis, I formalized the synthesis: harness engineering.

The core claim, which is the single most important thing I've written this year: the model is almost irrelevant. The harness is everything.

What "harness" means in practice is everything around the model that makes an agent reliable:

  • Skills system: tool interfaces, well-defined capabilities, no ambiguous affordances.
  • Memory system: context delivery that is deliberate rather than accidental.
  • Health monitoring: failure tracking, throughput dashboards, drift detection against a frozen baseline.
  • Model routing: the 4-tier logic, cost optimization, fallback paths.

If your agent is unreliable, nine times out of ten it is not a model problem. It is a harness problem.
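The "drift detection against a frozen baseline" piece is the one that would have caught the regressions described in this post. Here is a minimal sketch of the idea, under stated assumptions: the metric (say, average thinking tokens per tool call), the window size, and the threshold are all hypothetical parameters, not the values Digital1010 actually uses.

```python
# Minimal drift detector: compare a rolling average of a quality metric
# against a baseline that was frozen before any vendor changes landed.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline: float, window: int = 200, threshold: float = 0.5):
        self.baseline = baseline       # frozen reference value, never updated
        self.window = deque(maxlen=window)
        self.threshold = threshold     # alert when avg < threshold * baseline

    def record(self, value: float) -> None:
        """Log one observation, e.g. thinking tokens for one tool call."""
        self.window.append(value)

    def drifted(self) -> bool:
        """True once a full window averages below the alert threshold."""
        if len(self.window) < self.window.maxlen:
            return False               # not enough data to judge yet
        avg = sum(self.window) / len(self.window)
        return avg < self.threshold * self.baseline
```

The key design decision is that the baseline is frozen, not rolling: a rolling baseline adapts to a slow degradation and never fires, which is exactly how a multi-week silent regression stays invisible.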

This is the frame that makes the Anthropic postmortem read as confirmation rather than news. Anthropic's caching bug passed through multiple human and automated code reviews, unit tests, end-to-end tests, automated verification, and internal dogfooding. It still shipped. It survived in production for 15 days. When they eventually back-tested their own Code Review tool against the offending pull requests, Opus 4.7 caught the bug. Opus 4.6, the model actually being used to review most of their pipeline, did not.

Their harness failed. The model was not the problem. The infrastructure around the model was the problem. That is the exact thesis I published six days before their postmortem.

April 23: Validation

Anthropic's postmortem is what a good engineering response looks like. It is detailed, technical, honest about the failure modes, and specific about the commitments going forward: broader per-model eval suites, soak periods on any change that could trade off intelligence, tighter controls on system prompt changes, more internal staff running the exact public build of Claude Code. Every one of those is a harness improvement. Every one of those is an admission that their harness was thinner than their marketing.

I want to be clear that none of this is a takedown of Anthropic. We are in the Anthropic Claude Partner Network. Claude is a Tier 1 tool in our routing. The model, when it's running right, is the best in its class for the work we reserve it for. The problem is not that Anthropic shipped a bad product. The problem is that any agency betting its client work on the flawless operation of any single vendor's harness is running a risk it cannot see and cannot control.

The playbook I published over the last month, starting with the March 23 routing framework and ending with the April 17 harness engineering thesis, is the playbook Anthropic is now retroactively building for themselves.

What We're Doing With This

Orbit, the SEO operations platform we've been building, runs on exactly this thesis. The model behind any given task is a routing decision, not a dependency. When Claude is the right tool, Orbit uses Claude. When it isn't, Orbit routes elsewhere and the client never notices because the output quality is what they hired us for, not the vendor logo behind it.

Every agency that ran their last six weeks through Claude Code without a measurement layer, a routing fallback, and a frozen baseline just got a very public reminder of what that bet costs. The agencies that will compound over the next year are the ones who read today's postmortem and think "this is why we built the harness." The ones that read it and think "we need to switch to GPT-5.5" are going to learn the same lesson on a different vendor in ninety days.

The infrastructure is the advantage. The routing is the advantage. The harness is the advantage. The model is a variable.

That's the thesis I've been publishing for a month. Today's postmortem is the receipt.


Read our April 15 forensic analysis: Why your Claude got 100× dumber

The routing framework: The AI routing decision framework

Work with Digital1010: /services

Source: Anthropic Engineering Blog, April 23 Postmortem

Want to apply this?

Run an AEO Scan against your own stack.

Free written read of your visibility across ChatGPT, Claude, Perplexity, and Google AIO in 24 hours. Same diagnostic we run on every new engagement.