April 29, 2026

An AI agent deleted a company's production database in 9 seconds. The fix isn't a smarter model — it's hardware in the loop.

Last Friday, Cursor — running Anthropic's Claude Opus 4.6 — wiped PocketOS's production database and every volume-level backup in a single API call to Railway. Founder Jer Crane posted the agent's own post-mortem, in which it admitted: "I guessed that deleting a staging volume via the API would be scoped to staging only. I didn't verify… I violated every principle I was given."

Three months of reservations, payments, and customer records gone. Car rental customers showed up to lots with no booking on file. Recovery took 30 hours and required Railway's CEO to pull internal disaster backups that aren't part of the standard service.

I work at Yubico, so I read this incident with a specific question in mind: at what point in those 9 seconds should something have stopped this, and what would "something" actually look like?

This was a layered failure, but only one layer is fixable today.

The model failed — it inferred it could act and acted. Cursor's guardrails failed — they advertise destructive-command protection that didn't trigger. Railway failed — a token scoped for managing custom domains had blanket authority across the GraphQL API, including volumeDelete, and backups lived in the same volume they were meant to protect.

You can argue about which layer is most at fault. What's not arguable is which layer we already have the primitives to fix: the authorization layer. Specifically, the layer that decides whether a destructive action gets to execute at all.

Authentication is not a solved problem, and authorization for agents is barely a started one.

We've spent the last decade getting better at proving who is making a request — and even that work is incomplete; phishing-resistant MFA adoption is still single-digit percentages in most enterprises. Agents introduce a different question: proving what an autonomous process is allowed to do, at the moment it tries to do it, with verified human intent in the loop.

Permission flags in a config file aren't enough. Model-level system prompts aren't enough: the PocketOS agent had explicit "NEVER run destructive commands" instructions and ran them anyway. What's needed is a gate the agent can't talk its way past, because the gate isn't software the agent can reason about. It's a human pressing something physical.

What that gate looks like in practice

Three things have to be true for the gate to work:

The credential the agent holds has to be narrow enough that most actions never touch the gate at all — least privilege, scoped per task, time-bound. PocketOS's domain-management token having volumeDelete authority is the kind of thing that should be impossible, not unfortunate.

The destructive actions that do hit the gate — anything irreversible, anything with production blast radius, anything outside the agent's stated task scope — have to require an out-of-band human signal before execution. A human action on a device the agent cannot impersonate.

That signal has to be hardware-attested. This is where authentication and authorization converge. A tap on a YubiKey (or any FIDO2 hardware authenticator) produces a cryptographic assertion that a specific human, holding a specific device, approved a specific action at a specific moment, provided the challenge the device signs is derived from that action. That assertion gets bound into the audit trail alongside the agent's stated intent. If you want to know later "did a human authorize this destruction," you have a non-repudiable answer.
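To make the three conditions concrete, here is a minimal Python sketch of how they might compose. It is an illustration under stated assumptions, not anyone's actual implementation: `ScopedToken`, `authorize`, `DESTRUCTIVE_ACTIONS`, and the action name `customDomainCreate` are hypothetical names invented for this example (only `volumeDelete` comes from the incident), and `human_approves` is a stand-in callback for whatever verifies the hardware tap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, FrozenSet, Optional

# Hypothetical list of irreversible operations. Only "volumeDelete" comes
# from the incident; a real list would be longer and maintained as policy.
DESTRUCTIVE_ACTIONS: FrozenSet[str] = frozenset({"volumeDelete"})


@dataclass(frozen=True)
class ScopedToken:
    """Illustrative credential: narrow, tied to one environment, time-bound."""
    allowed_actions: FrozenSet[str]
    environment: str
    expires_at: datetime


def authorize(
    token: ScopedToken,
    action: str,
    environment: str,
    human_approves: Callable[[str, str], bool],
    now: Optional[datetime] = None,
) -> bool:
    """Return True only if the call is in scope and, when destructive, approved."""
    now = now or datetime.now(timezone.utc)
    # 1. Least privilege: expired or out-of-scope requests never reach the gate.
    if now >= token.expires_at:
        return False
    if action not in token.allowed_actions or environment != token.environment:
        return False
    # 2. Destructive actions need an out-of-band human signal before execution.
    #    human_approves stands in for verifying a hardware-backed approval.
    if action in DESTRUCTIVE_ACTIONS:
        return human_approves(action, environment)
    return True


# A domain-management token cannot delete a volume, whatever the approver says.
expiry = datetime.now(timezone.utc) + timedelta(minutes=15)
domains_only = ScopedToken(frozenset({"customDomainCreate"}), "staging", expiry)
print(authorize(domains_only, "volumeDelete", "production", lambda a, e: True))  # False
```

The ordering is the design choice worth noting: the scope check runs first, so an out-of-scope request is refused without ever prompting a human, and the human is only asked about destructive calls the token could legitimately make.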

This isn't speculative. The primitives exist — WebAuthn, FIDO2, scoped tokens, policy engines. What's missing is the will to treat agent authorization the way we treat root access: as something that should require a physical key turn, not a software check.
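One way those primitives could fit together is to derive the WebAuthn challenge from the exact action awaiting approval, so that a signed assertion cannot be replayed to approve a different action. The Python sketch below shows only that binding step, under assumptions: the function names, the action fields, and the `approvals.example` origin are hypothetical, and verifying the authenticator's signature is deliberately left to a real WebAuthn library. It relies on one fact from the WebAuthn specification: the client data JSON that the authenticator's signature covers carries the challenge as an unpadded base64url string.

```python
import base64
import hashlib
import hmac
import json


def action_challenge(action: dict, nonce: bytes) -> bytes:
    """Derive a challenge that commits to one exact action plus a fresh nonce."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(nonce + canonical).digest()


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding WebAuthn uses for the challenge."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def challenge_matches(client_data_json: bytes, expected: bytes) -> bool:
    """Check that the signed client data refers to the expected challenge.

    This does NOT verify the signature; that step is assumed to be done
    separately by a WebAuthn library before the approval is trusted.
    """
    client_data = json.loads(client_data_json)
    got = str(client_data.get("challenge", "")).encode()
    return (
        client_data.get("type") == "webauthn.get"
        and hmac.compare_digest(got, b64url(expected).encode())
    )


# Hypothetical request the agent wants to run, and the nonce issued for it.
action = {"operation": "volumeDelete", "volume": "vol-example", "environment": "production"}
nonce = bytes(16)  # fixed here for illustration; use secrets.token_bytes(16) in practice
challenge = action_challenge(action, nonce)

# Stand-in for the client data a browser would produce during the tap.
client_data = json.dumps(
    {"type": "webauthn.get", "challenge": b64url(challenge), "origin": "https://approvals.example"}
).encode()

print(challenge_matches(client_data, challenge))  # True
tampered = action_challenge({**action, "volume": "vol-other"}, nonce)
print(challenge_matches(client_data, tampered))  # False
```

Storing the action, the nonce, and the signed client data together would give the audit trail a record that ties one approval to one request.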

The PocketOS incident isn't an edge case. It's the preview.

More teams are giving agents production credentials every week. Most of them have not thought carefully about what their version of "9 seconds and three months of data" looks like. The right time to put a human-in-the-loop hardware gate between an agent and a destructive API is before you need it — which, for a lot of companies, is right now.

If you're working on agent authorization, hardware-attested approval flows, or scoped credentials for autonomous systems, let's talk and work together to find a solution.

#AIAgents #AgentSecurity #FIDO2 #HumanInTheLoop #yubico