April 8, 2026

Giving AI Agents a Kill Switch (The Hardware Kind)

A few months ago, we started asking a simple question: what would it actually take to put a human meaningfully in the loop when an AI agent does something risky?

We didn't have a product roadmap. We didn't have a spec. We had a question, some time, and a YubiKey. What followed was one of the more interesting exploratory builds we've done — and we learned a lot along the way.

The Problem We Kept Coming Back To

AI agents are having a moment. They can write code, manage files, send messages, run database migrations — and they're getting better at all of it, fast. That's genuinely exciting. It's also a little terrifying if you've ever watched an agent confidently do exactly what you asked it to do, just not what you meant.

The current answer to this is usually software guardrails — allow lists, confirmation prompts, logging. These help. But they all share the same weakness: they can be misconfigured, bypassed, or quietly ignored when the system thinks it knows best.

We kept asking: what if the guardrail wasn't software at all?

What We Built: RDT

RDT — Role Delegation Tokens — is the system that came out of that question. The core idea is simple: before an AI agent can execute a risky action, it has to ask for permission. Not from another piece of software. From you — verified with a physical YubiKey tap.

Here's how it plays out in practice:

  1. You ask an AI agent to run a production database migration.

  2. The agent recognizes this as a high-risk action.

  3. It sends an approval request and pauses.

  4. You get a link. You click it, tap your YubiKey, and approve (or deny).

  5. The agent proceeds — and the whole thing is cryptographically signed and logged.

No tap, no action. It's that simple.
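In code, the gate described above might look something like this minimal sketch. The function names, in-memory request store, and approval URL are illustrative stand-ins, not RDT's actual API:

```python
import uuid

# Illustrative sketch of the approval gate above. Function names, the
# in-memory store, and the approval URL are hypothetical, not RDT's API.

pending = {}  # request_id -> "approved" | "denied" | None

def request_approval(action: str) -> str:
    """Step 3: the agent files a request and pauses."""
    request_id = str(uuid.uuid4())
    pending[request_id] = None
    print(f"Approval link: https://example.invalid/approve/{request_id}")
    return request_id

def record_decision(request_id: str, approved: bool) -> None:
    """Step 4: called only after a verified YubiKey tap."""
    pending[request_id] = "approved" if approved else "denied"

def execute(action: str, request_id: str) -> str:
    """Step 5: no tap, no action."""
    if pending.get(request_id) != "approved":
        raise PermissionError(f"{action!r} was not approved")
    return f"ran {action}"
```

The key property is that `execute` checks the decision itself rather than trusting the caller — the agent cannot skip the gate by simply not asking.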

Why a YubiKey?

This was one of our early debates. Why not just a button click? Why not a second password?

Because software can lie. A confirmation dialog can be automated around. A checkbox can be pre-checked. A prompt can be dismissed by the agent itself if something is misconfigured.

A physical key tap cannot be faked by software. When you touch that gold circle, you've made a decision. That's the point. We're not just logging that something happened — we're proving that a human was present and intentional about it. Every approved action produces a cryptographic signature tied to exactly what was approved, written into an audit log you can review later.

That realization — that hardware presence is categorically different from software confirmation — was one of the biggest things this project taught us.
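One way to tie a signature to exactly what was approved is to derive the signing challenge from a canonical encoding of the action. Here's a sketch under that assumption — the scheme and names are hypothetical, and in the real system the actual signing happens on the YubiKey itself:

```python
import hashlib
import json

def action_challenge(action: str, params: dict, request_id: str) -> bytes:
    """Derive a challenge bound to exactly what is being approved.

    Hypothetical scheme: the YubiKey signs this digest, so the resulting
    signature cannot be replayed to authorize a different action.
    """
    canonical = json.dumps(
        {"action": action, "params": params, "request_id": request_id},
        sort_keys=True,
        separators=(",", ":"),
    ).encode()
    return hashlib.sha256(canonical).digest()
```

Because any change to the action, its parameters, or the request id changes the digest, a signature over it approves one specific action and nothing else.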

Not Everything Is High Risk

Another thing we learned early: a binary "approve everything or approve nothing" system is useless in practice. It either creates too much friction or gets bypassed entirely.

So we built a tiered policy system:

  • Low risk (reading files, basic lookups) → auto-approved, no tap needed

  • Medium risk (file writes, logged actions) → auto-approved but flagged for review

  • High risk (shell commands, database ops, outbound messaging, scheduling) → YubiKey tap required

  • Hard blocks (things like rm -rf /) → denied always, no exceptions

There's also a "gray zone" for ambiguous actions — the system scores the request on factors like how reversible it is and how broad the potential impact could be, then makes a judgment call. Those decisions land in a queue for later human review. Getting this tiering right took a lot of iteration, and honestly, we're still refining it.
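As a rough illustration of the tiering and gray-zone scoring, here's a toy classifier. The action names, thresholds, and scoring weights are invented for this sketch and are not the ones RDT actually uses:

```python
# Illustrative policy tiers; names and thresholds are invented.
HIGH_RISK = {"shell", "db_migration", "send_message", "schedule"}
HARD_BLOCKS = {"rm -rf /"}

def classify(action: str, reversible: bool, blast_radius: int) -> str:
    """Toy version of the tiered policy.

    blast_radius is a hypothetical 0-10 impact score.
    """
    if action in HARD_BLOCKS:
        return "deny"                  # denied always, no exceptions
    if action in HIGH_RISK:
        return "require_tap"           # YubiKey tap required
    # Gray zone: score reversibility and potential impact.
    score = (0 if reversible else 5) + blast_radius
    if score >= 7:
        return "require_tap"
    if score >= 3:
        return "auto_approve_flagged"  # logged and queued for review
    return "auto_approve"              # low risk
```

The useful structure here is the ordering: hard blocks and known high-risk categories short-circuit before any scoring, so the fuzzy gray-zone logic can never downgrade an action the policy has already decided about.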

A Chrome Extension, Because UX Matters

Early on, approvals required copying a URL from a terminal. That was fine for testing, but obviously not how real people work. So we built a Chrome extension that brings the approval flow into the browser natively — a popup appears, you tap your key, done. Approvals are pushed in real time over a WebSocket connection, so there's no lag.

This was a good reminder that security UX is security. A flow that people find annoying will get bypassed. One that feels seamless will actually get used.

What We Proved (And What We Didn't)

We proved the concept end-to-end: an AI agent running in the cloud, with real tool access, gated by a piece of hardware sitting on a desk. The approval is cryptographically verifiable. The audit trail is tamper-evident. The agent genuinely cannot proceed without a human saying yes.

That's not nothing. That's a real safety primitive, and seeing it work was genuinely satisfying.
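One common way to make an audit trail tamper-evident is a hash chain, where each entry's hash covers the previous entry's. A minimal sketch of the idea — not necessarily how RDT stores its log:

```python
import hashlib
import json

def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash,
    so editing any earlier entry breaks every hash after it."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"record": record, "prev": prev, "hash": entry_hash})

def verify(log: list) -> bool:
    """Recompute the chain; any tampering shows up as a mismatch."""
    prev = "0" * 64
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Rewriting any past record invalidates its own hash and every hash downstream, so an auditor re-running `verify` catches the edit without trusting the log's storage.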

What we didn't solve: scale, multi-agent workflows, mobile approval flows, and about a dozen other things that a production-grade version of this would need. This was an exploratory project, not a finished product. We were intentional about that — the goal was to learn what's possible and surface the right questions, not to ship something complete.

We think we did that.

Where This Goes

We're sharing this work openly under Yubico because we think the underlying idea matters beyond our specific implementation. Hardware-rooted human oversight for AI agents is a problem the whole industry needs to solve, and we'd rather contribute to that conversation than sit on what we learned.

If you're building with AI agents and thinking seriously about how to keep humans meaningfully in the loop — not just nominally, not just for compliance, but actually — we'd love to talk.