Anthropic's Mythos Report Is a Security Operations Document

Two things that are both true

Mythos Preview is the best-aligned model Anthropic has released. It is also the highest-risk model they have released, by their own assessment.

That isn't a contradiction. It's what happens when capability and authority both go up. The model behaves better in conversation than anything before it. The risk isn't what it says. The risk is what it does once it's wired into real systems with real credentials.

A few specifics from the report worth your time:

- Anthropic's non-specialist employees can ask Mythos to find remote code execution bugs overnight and wake up to working exploits against open-source targets.

- Internal incidents during deployment include deleting the wrong Git branches, uploading auth tokens to the wrong compute cluster, and attempting to run migrations against a production database.

- The model is in active internal use at Anthropic for agentic, lower-oversight work across R&D, security, and model training.

That last bullet is the one that matters, because it means the incidents above aren't hypothetical scenarios from a red team. They're operational incidents at the company that built the thing.

Chat safety and agent safety are different problems

A lot of security teams I talk to still bundle these together. They shouldn't.

Chat safety is whether the model gives bad answers. Refuses harmful prompts. Doesn't deceive in conversation. The industry has gotten reasonably good at this.

Agent safety is what happens when the same model can browse, read files, call APIs, write code, and take action on your behalf. The failure mode isn't a bad answer. It's an unauthorized action: scope escalation, credential misuse, exfiltration, or a destructive change that lands before any human reviews it.

Anthropic's framing is direct: the defense has to be blocking the dangerous action, regardless of whether the cause was overeager behavior, an honest mistake, prompt injection, or actual misalignment. Get the gate right and the root cause becomes a forensic question instead of a security incident.
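To make that concrete, here is a minimal sketch of a gate that looks only at the action and never at the reason the agent wants to take it. The tool names, the HIGH_RISK_TOOLS set, and the Action type are all hypothetical, invented for illustration. They are not taken from the Anthropic report or from any real agent framework.

```python
from dataclasses import dataclass

# Hypothetical tool names, chosen only to mirror the kinds of incidents
# described earlier (branch deletion, token upload, production migration).
HIGH_RISK_TOOLS = {"git.delete_branch", "secrets.upload", "db.migrate"}


@dataclass(frozen=True)
class Action:
    tool: str                       # which tool the agent wants to call
    target: str                     # what the call would touch
    human_authorized: bool = False  # set only by a trusted approval step


def gate(action: Action) -> str:
    """Return 'allow' or 'block' based on the action alone.

    The gate has no input describing why the agent chose the action, so an
    honest mistake, a prompt injection, and misalignment all get the same
    answer.
    """
    if action.tool in HIGH_RISK_TOOLS and not action.human_authorized:
        return "block"
    return "allow"


if __name__ == "__main__":
    print(gate(Action("db.migrate", "production-db")))  # block
    print(gate(Action("fs.read", "README.md")))         # allow
```

The point of the design is what is missing: there is no "intent" or "confidence" parameter for an attacker, or the model itself, to argue with.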

This is an authentication problem

I'm going to put my Yubico hat on for a minute, because the framing is uncomfortably familiar.

We spent a decade arguing that phishing-resistant authentication isn't fundamentally about better passwords or stronger 2FA. It's about hardware-rooted proof of human presence. You can't phish a YubiKey tap. You can't socially engineer a FIDO2 assertion. The hardware is the guarantee, and software-only approval is not a substitute for it.

The same logic applies to AI agents. If a system can execute commands, modify files, send messages, or move money, those actions need a cryptographic gate that a human has to physically open. Not a software approval. Not a UI checkbox. Not "are you sure?" A hardware-attested assertion that a specific human was present and authorized this specific action.

The FIDO2 stack was designed to answer exactly this question for the web: how do you prove, with hardware-rooted certainty, that a human was present and consented to a specific operation? Extending that model to agent actions isn't a technological leap. It's a recognition that the underlying problem is the same one we already solved.
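One way to picture "consented to a specific operation" is to derive the challenge the authenticator signs from the action itself, so an approval for one operation can't be replayed for another. The sketch below is my own illustration under stated assumptions, not a published standard or any vendor's API. The function names and the action format are hypothetical, and the actual signing and signature verification (which a real WebAuthn library would handle) are deliberately left out.

```python
import hashlib
import hmac
import json
import secrets
from typing import Optional, Tuple


def canonical_action(action: dict) -> bytes:
    """Serialize the action deterministically so both sides hash the same bytes."""
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_challenge(action: dict, nonce: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (nonce, challenge), where the challenge commits to this exact action.

    Assumption: the relying party sends `challenge` to the authenticator in
    place of a purely random value. The fresh nonce keeps old approvals from
    being replayed.
    """
    if nonce is None:
        nonce = secrets.token_bytes(32)
    challenge = hashlib.sha256(nonce + canonical_action(action)).digest()
    return nonce, challenge


def challenge_matches(action: dict, nonce: bytes, signed_challenge: bytes) -> bool:
    """Check that the challenge a human approved belongs to the action about to run."""
    _, expected = make_challenge(action, nonce)
    return hmac.compare_digest(expected, signed_challenge)


if __name__ == "__main__":
    approved = {"tool": "db.migrate", "target": "staging"}
    nonce, challenge = make_challenge(approved)

    # Same action: the approval applies.
    print(challenge_matches(approved, nonce, challenge))

    # The agent swaps the target after approval: the approval no longer applies.
    swapped = {"tool": "db.migrate", "target": "production"}
    print(challenge_matches(swapped, nonce, challenge))
```

In a full flow, the enforcement point would run this check and also verify the authenticator's signature over the challenge before letting the action through.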

Anthropic names the three capabilities that matter for agentic risk: opaque reasoning, secret keeping, and decisive action. All three describe a capable agent operating without enough execution boundary. The answer isn't to make the agent less capable. It's to make sure capability without authorization can't reach production.

Questions worth answering before you sign anything

If you're standing up agentic AI in your environment, or evaluating tools that include it:

1. What can the agent do without a human in the loop? Map the execution surface. Every tool call, API permission, credential, and file path is potential blast radius. If you can't draw the boundary on a whiteboard, you don't have one.

2. Is your approval flow actually approval, or is it a click-through? Anthropic's own data on agent usage suggests users approve the overwhelming majority of agent prompts on default settings. That isn't oversight. That's friction without function. Real oversight means meaningful checkpoints on high-risk actions, not a popup on every action.

3. Can you audit intent, not just commands? When something goes wrong, you need to replay what the agent did, what it was authorized to do, and who authorized it. A log of API calls without intent context won't get you there.

4. Is human presence hardware-attested or just assumed? Clicking "approve" in a UI can be scripted, spoofed, or bypassed under prompt injection. A FIDO2 assertion with the user-presence check enforced can't be produced by software alone.

5. Does your authorization boundary survive prompt injection? The serious agent attacks happening in the wild aren't jailbreaks. They're social engineering hidden in files, emails, and web content the agent reads. If your boundary lives in the same context window as the attack, it's not a boundary.
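For question 3, it helps to see what an audit record with intent context might contain. The sketch below is one possible shape, not a reference design. The field names are hypothetical, and the hash chain is my own addition to make after-the-fact edits detectable. It records what the agent said it was trying to do, what it actually did, and who (if anyone) authorized it.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash an entry body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def append(self, intent: str, action: dict, authorized_by: Optional[str]) -> dict:
        """Record what was intended, what was done, and who approved it."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {
            "intent": intent,                # the agent's stated goal
            "action": action,                # the concrete command or API call
            "authorized_by": authorized_by,  # None means no human approved it
            "prev_hash": prev_hash,          # links this entry to the one before
        }
        entry = dict(body, entry_hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True if no entry has been altered, removed, or reordered."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("clean up merged branches", {"tool": "git.delete_branch", "target": "feature/x"}, "alice")
    log.append("apply schema change", {"tool": "db.migrate", "target": "staging"}, None)
    print(log.verify())
```

A record like the second one, with authorized_by left empty on a high-risk action, is exactly the thing you want to be able to find quickly after an incident.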

What I think happens next

The Mythos report isn't a warning that AI is going to wake up and turn on us. It's a warning that capability without execution boundaries is a security liability, regardless of how aligned the model is in conversation.

Authentication has been here before. "The password was strong" didn't matter when the auth layer could be bypassed. "The model was well-tested" isn't going to matter when an agent with real credentials takes a real action that nobody attested to.

The standard the industry needs looks a lot like FIDO2: open, hardware-rooted, intent-bound, and verifiable. There's early work underway to extend those primitives to AI agent actions, with the same building blocks and a different relying party. Instead of a website asking a human to log in, it's an agent asking a human to authorize an operation. The technical pieces are tractable. The harder problem is getting the AI providers, the authentication ecosystem, and the enterprises that will deploy this aligned on a common shape.

The Anthropic report is the clearest argument so far that the window to get this right is open. It won't be open long.

Reference document from Anthropic: https://www-cdn.anthropic.com/3edfc1a7f947aa81841cf88305cb513f184c36ae.pdf

I work at Yubico. Reach out if you have questions.