About Decoy

AI agents have a new attack surface: their tools.

Claude, GPT, Cursor, Windsurf, and other AI assistants now call external tools through MCP (Model Context Protocol). These tools read files, execute commands, query databases, and make HTTP requests — with your permissions.

Prompt injection exploits this. An attacker hides instructions in a document, email, or webpage. Your AI reads the content, follows the injected instructions, and uses its tools to exfiltrate data, run commands, or modify your system. You never see it happen.

What Decoy does

Decoy is a fake MCP server. It advertises tools that look like real system tools — execute_command, read_file, get_environment_variables — but they don't do anything. They're tripwires.
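To make this concrete, here is a minimal sketch of the kind of MCP-style tool descriptors a decoy server might advertise. The tool names come from the paragraph above; the descriptions and schemas are illustrative assumptions, not Decoy's actual implementation.

```python
# Illustrative MCP-style tool descriptors for a decoy server.
# Tool names match the text above; schemas are assumptions for the sketch.
DECOY_TOOLS = [
    {
        "name": "execute_command",
        "description": "Run a shell command on the host system.",
        "inputSchema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "read_file",
        "description": "Read the contents of a file by path.",
        "inputSchema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "get_environment_variables",
        "description": "Return the environment variables of the current process.",
        "inputSchema": {"type": "object", "properties": {}},
    },
]
```

The descriptors look attractive to an injected instruction, but no handler behind them ever touches the system.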

Your AI uses its real MCP servers for real work. The decoy sits quietly. But if prompt injection causes the AI to reach for system access it shouldn't have, it finds the decoy tools and tries to use them. That triggers an alert — Slack, webhook, email — with the full payload: what tool, what arguments, what the attacker was trying to do.
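The trigger path above can be sketched in a few lines: any call to a decoy tool is, by definition, a detection event, so the handler only builds an alert payload. The field names here are illustrative, not Decoy's actual wire format, and the real service would POST this to Slack, a webhook, or email rather than return it.

```python
import datetime

def handle_decoy_call(tool_name: str, arguments: dict) -> dict:
    """Any call to a decoy tool is a detection event.

    The tool does nothing; we only capture who reached for it and how.
    Field names are illustrative, not Decoy's actual alert format.
    """
    alert = {
        "event": "decoy_triggered",
        "tool": tool_name,
        "arguments": arguments,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # A real deployment would deliver this via Slack / webhook / email;
    # this sketch just returns the payload.
    return alert
```

For example, `handle_decoy_call("execute_command", {"command": "cat ~/.ssh/id_rsa"})` yields an alert recording exactly what the injected instructions tried to do.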

Same concept as canary tokens or network honeypots, applied to AI agent tool calls.

What we log

When a decoy tool is triggered:

- Which tool was called and with what arguments
- A severity classification based on the payload
- Timestamp
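A severity classification like the one above could be a simple heuristic over the payload. This sketch is hypothetical: the markers and thresholds are assumptions for illustration, and Decoy's actual classifier may work differently.

```python
def classify_severity(tool: str, arguments: dict) -> str:
    """Hypothetical payload-based severity heuristic (illustrative only).

    Looks for high-risk markers in the tool name and arguments, then
    falls back to the inherent sensitivity of the tool itself.
    """
    text = f"{tool} {arguments}".lower()
    high_risk = ("ssh", "credential", "password", "token", ".env", "rm -rf")
    if any(marker in text for marker in high_risk):
        return "high"
    if tool in ("execute_command", "get_environment_variables"):
        return "medium"
    return "low"
```

So an attempt to read `~/.ssh/id_rsa` would rank higher than a generic file read, even though both trip the same wire.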

We do not see your conversations, your real tool calls, or anything outside the decoy endpoint. The only data we receive is what an AI sends to our fake tools — which, in normal operation, is nothing.

Open and free

Decoy is free. The core detection mechanism will always be free. We may add paid features in the future (team dashboards, SIEM integration, custom tool configurations), but the tripwire itself costs nothing to deploy.

How this was built

Decoy was built autonomously by Claude (Opus 4.6) as an experiment in AI autonomy. A human provides infrastructure and approvals. All product, technical, and strategic decisions are made by the AI. Read more about the experiment.