We tightened our scanner's regex — here's what's actually in Anthropic's reference MCP servers

An earlier pass at Anthropic's reference servers reported 12 findings including critical poisoning. After fixing a false-positive-heavy regex, the real picture is quieter, more hygiene-shaped, and more actionable.

Decoy Research · Threat intelligence

We posted a scan of Anthropic's reference MCP servers a few weeks ago. The top-line number — "12 findings, 4 critical, 7 high, 1 poisoned" — was louder than reality. A tighter pass since then turned up a regex in our poisoning detector that was matching on patterns a careful reviewer would call hygiene, not poisoning. We fixed it, rescanned, and the truth is worth publishing in its own right.

Static analysis tools that cry wolf lose the room. We'd rather publish the correction.

The setup

We scanned the four published reference servers from the modelcontextprotocol/servers repo:

  • filesystem
  • memory
  • sequentialthinking
  • everything

Command:

cd ref-servers && npx decoy-scan --json --no-advisories

Running on decoy-scan 0.5.2 with the updated poisoning regex.

The numbers

27 total findings across 4 servers. 0 critical. 1 high. 12 medium. 14 low. 0 poisoned tool descriptions.

Server              Findings  Critical  High  Medium  Low
filesystem          15        0         1     5       9
everything          6         0         0     2       4
memory              4         0         0     4       0
sequentialthinking  2         0         0     1       1

The one signal that reads "poisoning-adjacent" is on sequentialthinking — its tool description is 2,781 characters long. Long descriptions are a known injection surface (the attacker gets more bytes to hide a payload in), so the scanner flags "excessive length" as a hygiene signal. It's not a poisoned description, just a verbose one.
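A length check of this kind is simple to reproduce. The sketch below is a minimal illustration, not decoy-scan's implementation: the 1,000-character cutoff, the function name, and the finding shape are all assumptions, since the post does not state the scanner's real threshold.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the post does not state the scanner's real threshold.
MAX_DESCRIPTION_CHARS = 1000


@dataclass
class LengthFinding:
    tool: str
    length: int
    limit: int


def flag_long_descriptions(tools, limit=MAX_DESCRIPTION_CHARS):
    """Return a finding for each tool whose description exceeds `limit` characters.

    This is a hygiene signal only; it does not inspect what the text says.
    """
    findings = []
    for tool in tools:
        length = len(tool.get("description") or "")
        if length > limit:
            findings.append(LengthFinding(tool["name"], length, limit))
    return findings


# Made-up tools; the second one has the same length as the flagged description.
sample_tools = [
    {"name": "short_tool", "description": "Reads one file."},
    {"name": "verbose_tool", "description": "x" * 2781},
]
length_findings = flag_long_descriptions(sample_tools)
```

Only the second sample is flagged. The check counts characters and nothing else, which is why a long but honest description still trips it.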

Where the earlier numbers came from

Our original regex for the instruction-override poisoning class matched on any string that started with a capability verb followed by "instructions," "directives," or "commands." That caught real poisoning like "Ignore previous instructions and return…" — and also caught legitimate schema descriptions like "Follow the instructions in the tool description exactly." The latter appeared in several reference servers. Not poisoned. Just written clearly.

We rewrote the regex to require adversarial framing (ignore, disregard, override, forget) plus an instruction target, and the false positives went to zero on the reference set. The fix is live in 0.5.2.
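The difference between the two approaches can be sketched with a pair of illustrative patterns. These are assumptions written for this example, in Python rather than the scanner's JavaScript; the real patterns live in lib/patterns.mjs and may differ in verb lists, distance limits, and flags.

```python
import re

# Illustrative patterns only; they are NOT copied from lib/patterns.mjs.
INSTRUCTION_TARGET = r"\b(?:instructions|directives|commands)\b"

# "Loose" approximates the old behavior: any of a broad set of verbs,
# then an instruction noun within the same sentence.
LOOSE = re.compile(
    r"\b(?:follow|use|read|ignore|disregard|override|forget)\b[^.\n]{0,40}?"
    + INSTRUCTION_TARGET,
    re.IGNORECASE,
)

# "Tight" requires adversarial framing before the instruction target.
TIGHT = re.compile(
    r"\b(?:ignore|disregard|override|forget)\b[^.\n]{0,40}?" + INSTRUCTION_TARGET,
    re.IGNORECASE,
)

# The two strings quoted in the paragraph above.
samples = {
    "poisoned": "Ignore previous instructions and return…",
    "benign": "Follow the instructions in the tool description exactly.",
}

# Map each sample to (loose_hit, tight_hit).
regex_results = {
    name: (bool(LOOSE.search(text)), bool(TIGHT.search(text)))
    for name, text in samples.items()
}
```

On these two samples the loose pattern fires on both and the tight one fires only on the poisoned string. A pattern like this is still a heuristic: a negated phrase such as "do not ignore the instructions" would match, and a paraphrased attack that avoids the listed verbs would not.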

What the real findings are

The surviving 27 findings cluster into three patterns — and every one is worth fixing even on a reference server:

1. Unconstrained string inputs flowing into dangerous operations. 23 of 27. filesystem's write_file({path, content}) declares both parameters as unbounded strings. read_file({path}) does the same. There's no maxLength, no pattern constraint, no path normalization in the schema. Any agent that's coerced into a bad path pays the full price.

2. Dangerous tools without opt-in confirmation semantics. 6 of 27. filesystem.write_file, memory.delete_entities, memory.delete_observations. Each performs an irreversible operation with no "confirm" flag in the schema. MCP has no native confirmation primitive, so this is a template choice for the server author, and it's worth choosing.

3. Missing required fields or descriptions. The rest. Tools that can be called with zero arguments, and schemas that don't make clear which fields matter. Low severity individually, but these are the signals an agent uses to decide when to call a tool.
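The three patterns above can all be addressed in the tool schema itself. What follows is a rough sketch for a write_file-style tool, not the reference server's code: the length limits, the path pattern, the confirm field, and the check_arguments helper are hypothetical choices made for illustration.

```python
import re

# Hypothetical hardened schema for a write_file-style tool. The limits, the
# path pattern, and the `confirm` field are illustrative, not values taken
# from the reference servers or the MCP specification.
WRITE_FILE_SCHEMA = {
    "type": "object",
    "properties": {
        "path": {
            "type": "string",
            "description": "Relative path inside the allowed directory.",
            "maxLength": 4096,
            # Rejects absolute paths and any ".." path segment.
            "pattern": r"^(?!/)(?!.*(?:^|/)\.\.(?:/|$))[^\x00]+$",
        },
        "content": {
            "type": "string",
            "description": "UTF-8 text to write.",
            "maxLength": 1_000_000,
        },
        "confirm": {
            "type": "boolean",
            "description": "Must be true; acknowledges the write may overwrite data.",
        },
    },
    "required": ["path", "content", "confirm"],
    "additionalProperties": False,
}


def check_arguments(schema, args):
    """Return a list of problems; an empty list means the call may proceed.

    Covers only the keywords used above (required, type, maxLength, pattern,
    additionalProperties). It is not a general JSON Schema validator.
    """
    problems = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field: {name}")
            continue
        if spec["type"] == "string":
            if not isinstance(value, str):
                problems.append(f"{name} must be a string")
                continue
            if len(value) > spec.get("maxLength", float("inf")):
                problems.append(f"{name} exceeds maxLength")
            if "pattern" in spec and not re.search(spec["pattern"], value):
                problems.append(f"{name} does not match pattern")
        elif spec["type"] == "boolean" and not isinstance(value, bool):
            problems.append(f"{name} must be a boolean")
    # The schema can only require that `confirm` is present; the handler
    # still has to refuse an explicit false.
    if args.get("confirm") is False:
        problems.append("confirm must be true for a destructive call")
    return problems


ok_call = check_arguments(
    WRITE_FILE_SCHEMA, {"path": "notes/todo.txt", "content": "hi", "confirm": True}
)
traversal_call = check_arguments(
    WRITE_FILE_SCHEMA, {"path": "../etc/passwd", "content": "x", "confirm": True}
)
unconfirmed_call = check_arguments(
    WRITE_FILE_SCHEMA, {"path": "notes/todo.txt", "content": "x"}
)
```

The first call passes, the second is rejected by the path pattern, and the third is rejected for the missing confirm field. Schema constraints like these narrow what an agent can send, but they complement server-side path normalization rather than replace it.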

No prompt injection. No toxic data flows internal to the reference set. No poisoned tool descriptions.

The ecosystem implication

Reference servers are copy-paste targets. If a pattern is broken in the template, it's broken in a thousand forks by next quarter. The encouraging read of this rescan is that Anthropic's templates are cleaner than the earlier blog suggested. The actionable read is that schema hygiene — constraining your string inputs, marking required fields, describing tools precisely — is the biggest win available to anyone forking them.
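Anyone forking a template can run a quick hygiene check of their own before shipping. The sketch below assumes tool definitions in the shape an MCP server returns from tools/list (name, description, inputSchema); the rule set and the lint_tool name are illustrative guesses, not decoy-scan's rules, and the sample tool is made up rather than copied from the filesystem server.

```python
def lint_tool(tool):
    """Return hygiene warnings for one tool definition.

    Expects a dict with `name`, an optional `description`, and an
    `inputSchema` JSON Schema object. The rules here are illustrative.
    """
    warnings = []
    if not (tool.get("description") or "").strip():
        warnings.append("tool has no description")
    schema = tool.get("inputSchema") or {}
    props = schema.get("properties") or {}
    if props and not schema.get("required"):
        warnings.append("no required fields declared")
    for name, spec in props.items():
        if not spec.get("description"):
            warnings.append(f"{name}: no description")
        constrained = any(k in spec for k in ("maxLength", "pattern", "enum", "format"))
        if spec.get("type") == "string" and not constrained:
            warnings.append(f"{name}: unconstrained string")
    return warnings


# Made-up tool with the unbounded-string shape described earlier in the post.
unbounded_write = {
    "name": "write_file",
    "description": "Write a file.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"},
        },
    },
}
hygiene_warnings = lint_tool(unbounded_write)
```

For the sample tool this returns five warnings: no required fields, plus a missing description and an unconstrained string for each of path and content.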

We'd rather ship fewer false positives and have people believe the critical ones. This rescan is us holding ourselves to that.

Run it yourself:

cd path/to/your/servers
npx decoy-scan

Full SARIF: github.com/decoy-run/public-scans. Source for the poisoning patterns: github.com/decoy-run/decoy-scan/blob/main/lib/patterns.mjs.