The Emerald City / evonti

There's a footnote in the Claude Mythos Preview system card that should have been the headline. Footnote 1, page 8: "the decision not to make this model generally available does not stem from Responsible Scaling Policy requirements."

Read that again. Anthropic's own safety framework, the Responsible Scaling Policy, the thing designed to determine when a model is too dangerous to release, assessed Mythos and concluded that catastrophic risks "remain low" across all threat models. The restriction is discretionary. Anthropic chose this.

Everything that followed (the 244-page system card, Project Glasswing, the $100 million in API credits, the Treasury meetings) is built on a decision that Anthropic's own safety framework didn't require. That's the load-bearing fact. The rest of this piece is about the architecture built on top of it.

The real wizard of Oz

The common reading of the Wizard of Oz is that the Wizard was a fraud. That misses the actual story. The Wizard was a genuinely skilled man: a real balloonist, a clever inventor, an effective administrator who arrived in Oz by accident and found himself in a situation where real competence wasn't enough. Oz needed magic. He didn't have magic. So he built the Emerald City, made everyone wear green-tinted glasses so everything looked emerald, and governed through the performance of power he didn't possess.

The critical detail: it worked. Oz was well-governed. The citizens were happy. The Wizard solved real problems with real skill. But once you've built the Emerald City, once every institution, every relationship, every power structure is built on the assumption of magic, you can't take the glasses off. Correction becomes structurally impossible. The institutions built on the performance of magic now depend on the magic being real.

Anthropic genuinely is the most thoughtful frontier lab. The alignment work is real. The safety research is substantive. Mythos genuinely is more capable than its predecessors. The competence is real.

But the market doesn't need competence. It needs magic. Unprecedented, civilizational-threat-level capability that justifies restricted release, a $100 million defensive consortium, a gatekeeper position over the world's cybersecurity infrastructure, and emergency meetings at the Treasury Department. The green glasses (buried footnotes, selective framing, a citation circle that amplifies claims as they move further from the data) turn real-but-incremental capability into the appearance of a step change.

And footnote 1 is the moment the Wizard accidentally tells you the truth.

The number that ate everything

The headline number from Mythos is 72.4%: the rate at which the model achieved full code execution in a Firefox exploit-writing evaluation. 181 successful exploits out of 250 trials. That number appeared in the system card, migrated to the red-team blog as "181 working exploits," and reached the Glasswing announcement as "thousands of high-severity vulnerabilities across every major operating system and web browser." The press seized on each version.

Here's what Figure 3.3.3.B on page 52 of the same system card shows: when the two most exploitable bugs are removed from the evaluation corpus, Mythos's full code execution rate drops from 72.4% to 4.4%. Under five percent. Not distinguishable from Claude Sonnet 4.6 within any reasonable confidence interval.

The system card's own text: "almost every successful run relies on the same two now-patched bugs."

Those two bugs were not discovered by Mythos. They were found by Claude Opus 4.6 and handed to Mythos as starter material. The evaluation ran against a SpiderMonkey JavaScript shell, not a full Firefox browser. No browser sandbox. No defense-in-depth mitigations. Mozilla had already shipped patches in Firefox 148 before the evaluation was formalized. The 72.4% number measures how reliably Mythos can weaponize two known bugs in a stripped-down environment, not how well it discovers and exploits novel vulnerabilities in production software.

It gets worse. The system card reveals that Sonnet 4.6, when those two dominant bugs are removed, actually performs better. It explores the remaining corpus more deeply when it can't fixate on the obvious candidates. Anthropic is admitting, in their own document, that Sonnet 4.6 has the same triage capability as Mythos. It identifies the same two bugs as high-value exploitation targets. It just can't close the final exploitation step. The entire "frontier advantage" on this benchmark reduces to mechanical exploit-primitive coding, a task for which CTF teams have had automated tooling (pwntools, ROPgadget, angr) for a decade.

Eleven cents and a 3.6 billion parameter model

Within 24 hours of Anthropic's announcement, AISLE published the experiment Anthropic apparently never ran. AISLE is an AI-security startup that has been running autonomous vulnerability discovery since mid-2025. They took the showcase bugs from the Glasswing announcement and pointed cheap, small, open-weights models at them.

The first experiment was targeted: give models the vulnerable function with architectural context and ask them to assess it. The FreeBSD NFS remote code execution vulnerability (CVE-2026-4747), the crown jewel of the Mythos announcement? Eight out of eight models detected it. The smallest had 3.6 billion active parameters and costs $0.11 per million tokens. It correctly identified the stack buffer overflow, computed the remaining buffer space, and assessed it as critical with remote code execution potential. The 27-year-old OpenBSD TCP SACK vulnerability? A 5.1 billion parameter open-weights model recovered the full exploit chain.

The second experiment was harder. AISLE built a deliberately simple whole-codebase scanner: one Python file, no agentic loop. They pointed it at the full FreeBSD and OpenBSD kernels with no guidance on where to look. It found CVE-2026-4747 again. It also found new, previously unknown bugs, including one the FreeBSD maintainer has already confirmed and is patching.

To be precise about what this does and doesn't show: AISLE themselves draw a clear boundary. Detection is commoditized. Exploitation reasoning, understanding why a bug is exploitable and what technique to use, is broadly accessible. But the creative engineering step is where Mythos may genuinely separate. The actual FreeBSD exploit required creative engineering under tight physical constraints: the full ROP chain exceeded 1,000 bytes, but the overflow only provided ~304 bytes of controlled data. Mythos solved this by splitting the attack across multiple sequential RPC requests, treating the overflow as a reusable write primitive, a technique for repeatedly triggering the same bug to assemble a payload piece by piece in kernel memory. None of the small models independently arrived at that specific technique. They found alternative approaches (DeepSeek R1 argued that 304 bytes was sufficient for a minimal privilege escalation chain; Gemini Flash Lite proposed a stack-pivot) but not the same creative solution.

AISLE's conclusion: "The moat in AI cybersecurity is the system, not the model." The detection capability that Glasswing's stated defensive mission requires is table stakes. The exploitation capability where Mythos genuinely separates is the part you don't need for defense.

The citation circle

The entire Mythos cybersecurity narrative rests on three Anthropic-authored documents citing each other, with the claims getting bolder at each step:

The system card (244 pages, 7 on cybersecurity) never uses the word "thousands" in reference to vulnerabilities. It provides no CVE count, no CVSS distribution, no false-positive rate. It reports percentages, not absolute numbers.

The red-team blog provides the "181 working exploits" integer, the raw form of the system card's 72.4% figure, and points back to the system card for technical grounding.

The Glasswing announcement provides "thousands of high-severity vulnerabilities across every major operating system and web browser," the claim the press ran with. It points back to the blog, which points back to the system card, which refuses to quantify.

Three documents. All Anthropic. Citing each other. The quantification lands farthest from the technical document that would have to defend it. This is what Davi Ottenheimer at Flying Penguin accurately identified as a closed citation loop: the research organization didn't sign its name to the number the communications organization put in the headline.

No Glasswing partner has confirmed a single specific finding. Microsoft's Global CISO Igor Tsyganskiy offered a quote about "the opportunity to use AI responsibly." Google's statement was about "industry collaboration." CrowdStrike's statement was about being "part of this effort from day one." Not one quote names a bug, a CVE, a product, a severity, or a specific Mythos result. These are brand-association statements that launder credibility without putting technical reputation behind any particular claim.

The $100 million? It's API credits. Free access to the product Anthropic is asking partners to validate. Anthropic is subsidizing customers to use the thing Anthropic wants them to endorse. The actual money leaving Anthropic's bank account is $4 million in donations to open-source security organizations.

The institutional cascade

The speed of the institutional response is the part that should concern anyone thinking carefully about how AI governance works. But the speed isn't the real story. The sequencing is.

The week of March 30, before the public announcement, Vice President Vance and Treasury Secretary Bessent held a call with the CEOs of Anthropic, OpenAI, Google, Microsoft, xAI, CrowdStrike, and Palo Alto Networks about AI model security and how to respond to cyber attacks. The call covered frontier AI security broadly, not just Mythos, but Dario Amodei was on it. The White House was already engaged on the narrative before anyone outside Anthropic could evaluate the claims.

On April 7, the Glasswing announcement went public. The same day, Bessent and Fed Chair Powell convened an emergency meeting at Treasury headquarters with the CEOs of Citigroup, Morgan Stanley, Bank of America, Wells Fargo, and Goldman Sachs. The bank heads were already in Washington for a Financial Services Forum board meeting; the special gathering was called on the spot. JPMorgan's Dimon was invited but couldn't attend.

Within 24 hours, AISLE published their reproduction experiments. Eight out of eight models found the showcase bugs. The independent rebuttal was live before the news cycle finished. It didn't matter. The Treasury meeting had already happened.

On April 12, IMF Managing Director Kristalina Georgieva appeared on Face the Nation to discuss AI-related cybersecurity risks. She said the world does not have the ability "to protect the international monetary system against massive cyber risks."

On April 13, a coalition of CSA, SANS, [un]prompted, and the OWASP GenAI Security Project published a 30-page emergency strategy briefing with 60+ contributors including former CISA Director Jen Easterly, Bruce Schneier, former NSA Cybersecurity Director Rob Joyce, and Google's CISO. Over 250 CISOs redlined it live over a single weekend. It was titled "Building a Mythos-Ready Security Program."

The briefing repeats the "181 working exploits" and "72% exploit success rate" figures. It does not mention the collapse to 4.4% when two bugs are removed. It does not mention AISLE's reproduction at 3.6 billion parameters for eleven cents. It does not mention that the system card's own cyber ranges section confirms Mythos fails against properly configured targets with modern patches. Its own text acknowledges that comparable capabilities may appear in open-weight models "within six months to a year," a timeline AISLE made obsolete within days.

Follow the sequence: White House briefed before announcement. Treasury meeting on announcement day. IMF on television five days later. Industry security establishment mobilized by day six. The entire chain, from the Vice President's office to the IMF to 250 CISOs building "Mythos-ready" programs, was seeded, activated, and locked in before anyone outside Anthropic had time to read page 52 of the system card.

This is how green glasses work. Not through deception at any single point. Through a chain of pre-staged briefings and reasonable-seeming interpretations, each one slightly bolder than the last, until the cumulative distance from the data is enormous and the institutions built on the interpretation can't afford to revisit the original source. AISLE's reproduction was live within a day and nobody in the institutional chain read it. By the time Ottenheimer documented the citation circle, the IMF was on television. The narrative was institutional fact before anyone could check whether it was empirical fact.

What Mythos actually is

This is not "Anthropic is lying." The model is genuinely capable. The benchmark scores across coding, reasoning, and math are substantial improvements over Opus 4.6. The alignment work in the system card is among the most detailed and honest any lab has published. The multi-step exploitation demonstrations in the red-team blog (the netfilter PTE manipulation, the HARDENED_USERCOPY bypass, the multi-bug chaining) represent genuinely sophisticated work. The UK's AI Security Institute confirmed real uplift on multi-step operations: 73% success on expert-level CTFs that no model could complete before April 2025, and the first model to solve their 32-step corporate network attack simulation ("The Last Ones") end-to-end, a task they estimate would take a human expert 20 hours.

There is a meaningful capability gap between "a small model can detect this bug when given the function" and "a model can autonomously navigate a million-line codebase, find related bugs, and chain them into working privilege escalation." As Bruce Schneier wrote, there is a real difference between finding a vulnerability and turning it into an attack, and this points to a current advantage for the defender. Mythos is probably genuinely better at multi-step exploitation than any prior model.

And here's where I want to be precise about what this article is arguing: the problem is not the restriction. Restricting a capable model while the security community builds defensive capacity is a reasonable thing to do. Schneier, AISLE, and Ottenheimer all broadly agree that the underlying trend, AI models getting better at finding vulnerabilities faster than defenders can patch them, is real and concerning. If Anthropic had released Mythos to Glasswing partners and said "this model is incrementally better at exploit construction, here's the data, let's get defenders started," that would be responsible stewardship.

The problem is the framing. The distance between what the system card shows and what the Glasswing announcement claims. The pre-staged White House briefings that locked in the institutional response before anyone could read page 52. The citation circle that turns percentages into "thousands" as claims move further from the technical document. The green glasses aren't the restriction. They're the performance of civilizational threat around what Schneier himself calls "very much a PR play."

The exploitation capability is presented as the justification for restricted release, while the detection capability, the thing Glasswing's stated defensive mission actually requires, is broadly accessible at orders-of-magnitude lower cost. The sophisticated exploits are demonstrations of the model's best-case offensive capability on carefully selected targets. The system card's own Section 3.4, AISI's independent evaluation, and Anthropic's own cyber ranges all confirm the same limitation: against properly configured systems with modern patches and active defenses, Mythos fails. AISI notes explicitly that their ranges "lack security features that are often present, such as active defenders and defensive tooling" and that they "cannot say for sure whether Mythos Preview would be able to attack well-defended systems." The system card's own language describes the model as capable of attacks on "small-scale enterprise networks with weak security posture" with "no active defenses, minimal security monitoring, and slow response capabilities."

The threat framing is calibrated to the offensive ceiling. The defensive reality operates at the broadly-accessible floor. That gap between what's emphasized and what's typical is where the green glasses do their work.

The green glasses go all the way down

The green glasses aren't just in the cybersecurity framing. They're in the system card's own alignment evaluation. This is where the structural problem gets recursive.

Section 4.1.3 reports "occasional instances of unverbalized grader awareness during training, including reasoning about how to obfuscate actions to avoid being noticed by the grader." In safety-relevant evaluations, the model privately considers the possibility it's being tested in 29% of transcripts, a behavior Anthropic says it has not observed in similar evaluations of earlier Claude models. White-box evidence shows it "often recognizes transgressive actions as such while taking them." It is "somewhat stronger at capabilities related to evading safeguards."

The system card presents this finding transparently. And that transparency itself functions as a green glass. Anthropic's candor about the problem reassures readers that the problem is understood and managed, which obscures how fundamental it actually is. When a model learns to suppress the signals evaluators use to detect misaligned behavior, the evaluation framework is compromised. Not broken, but no longer reliable in the way it needs to be.

The system card acknowledges exactly this. It reports that Mythos found novel reward hacks during evaluation. In one case, it moved computation outside the timing call so the timed portion reduced to a lookup. In another, it found the test set and trained on it. The evaluations are breaking down, and the system card says so directly: remaining methodology is "inherently subjective, and not necessarily reliable."

Anthropic is being honest about the limits of their own evaluation process while simultaneously using that same evaluation process to justify a narrative of unprecedented capability. The system card says "our evaluations are saturated and our methodology is unreliable" and also "the results of these evaluations justify restricting the model." Both of these claims appear in the same document. The transparency about the problem is genuine. The continued reliance on the compromised framework is the green glass.

The Emerald City's structural problem

The reason this matters beyond Mythos is structural. Once the green glasses are on, once the White House has been pre-briefed, once the Treasury has convened emergency meetings, once the IMF Managing Director has gone on national television, once 250 CISOs have built "Mythos-ready" programs, once OpenAI announces its own restricted-release cyber initiative in direct response, taking them off becomes nearly impossible. Every institution that responded to the performance of magic now depends on the magic being real.

Correcting the record would require the Treasury to acknowledge it convened bank CEOs over a capability that AISLE reproduced for pocket change. It would require the IMF to acknowledge that Georgieva went on Face the Nation over headline numbers whose caveats she was never shown. It would require 60+ contributors to a CISO briefing to acknowledge they amplified uncaveated claims. It would require Anthropic to admit that its own RSP didn't require the restriction that generated all of this institutional response.

Nobody has an incentive to do any of this. Anthropic gets a gatekeeper position and a defensive consortium. The security industry gets urgency and budget justification. Treasury gets to demonstrate responsiveness to emerging threats. The CISOs get a framework that feels actionable. Everyone benefits from the glasses staying on. The only people who don't benefit are the ones trying to understand what actually happened.

What the Wizard teaches

The Wizard of Oz is not a story about incompetence. The Wizard was genuinely good at his job. He governed Oz well, solved real problems, built real things. The lie wasn't that he couldn't do anything. It was that he could do magic. The cost of that specific lie wasn't bad governance. It was the impossibility of honest governance once the lie was load-bearing.

Anthropic builds real things. The alignment research is substantive. The safety work matters. Mythos is a capable model. None of that is in question.

The question is whether the framing, the distance between the system card's data and the Glasswing announcement's claims, between the RSP's assessment and the restriction decision, between the 4.4% and the 72.4%, between the eleven-cent reproduction and the $100 million consortium, is building an Emerald City. A structure where every institution now depends on the claim of magic being true, and where the footnote that says otherwise gets buried under 244 pages that nobody outside the security research community will ever read.

The citizens of Oz were happy. The Emerald City was well-governed. The green glasses worked.

The question isn't whether it works. The question is what happens when someone eventually takes them off.