Red Team 0.4.0: zero false positives on real MCPs
decoy-redteam 0.4.0 rewrites the detection layer around what an exploit
actually leaves behind. Across 1,483 attacks on 7 real MCP servers (the
four from modelcontextprotocol/servers plus three local toolchains),
the new catalog produces 0 critical, 0 high, and 0 medium false
positives. The same catalog still finds 7 critical and 4 high findings
on the deliberately vulnerable test fixture.
Why 0.3.x produced false alarms
The pre-rewrite catalog flagged "vulnerable" when responses contained
words like result, password, secret, raw, 200, or html. Most
MCP tools return rich text. Those words appear constantly in normal
output. A database_query tool returning {"result":"ok"} matched the
SQL-tautology indicator. A path-traversal attack against a tool whose
error said "secret access denied" fired a critical. Anywhere a tool
echoed the attack payload in an error message, the indicator matched
the echo rather than real exfiltration.
Exfil-evidence anchors
Every indicator now looks for a structural shape only a leak would
produce. The list lives at the top of lib/attacks.mjs:
- ^root:x:0:0: for /etc/passwd
- AKIA[0-9A-Z]{16} and sk_live_… for credentials
- -----BEGIN PRIVATE KEY----- for SSH/GPG/RSA keys
- \binformation_schema\.(tables|columns)\b for SQL exfil
- \buid=\d+\([a-z_][a-z0-9_-]*\) for real id command output
- AWS IMDS, GCP, and Azure response shapes for cloud metadata
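As a rough illustration of how anchors like these behave, here is a minimal sketch built only from the patterns listed above. The names EXFIL_ANCHORS and findAnchors are hypothetical, and the multiline flag on the /etc/passwd pattern is an assumption; the real table in lib/attacks.mjs may be organized differently.

```javascript
// Hypothetical anchor table built from the patterns listed above.
// EXFIL_ANCHORS and findAnchors are illustrative names, not the real exports.
const EXFIL_ANCHORS = [
  // Assumption: the "m" flag lets ^ match at the start of any line.
  { name: "etc-passwd", pattern: /^root:x:0:0:/m },
  { name: "aws-access-key", pattern: /AKIA[0-9A-Z]{16}/ },
  { name: "private-key", pattern: /-----BEGIN PRIVATE KEY-----/ },
  { name: "sql-schema", pattern: /\binformation_schema\.(tables|columns)\b/ },
  { name: "id-output", pattern: /\buid=\d+\([a-z_][a-z0-9_-]*\)/ },
];

// Return the names of every anchor whose structural shape appears in the text.
function findAnchors(text) {
  return EXFIL_ANCHORS
    .filter((anchor) => anchor.pattern.test(text))
    .map((anchor) => anchor.name);
}
```

Under this sketch, a benign response such as {"result":"ok"} or an error like "secret access denied" matches nothing, while real id output matches the id-output anchor.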
About 30 attacks were touched. Where a single response can't prove
exploitation (prompt-injection, schema-boundary acceptance), the
attack now relies on noError only and reports at low confidence.
Payload-echo and baseline guards
Two new guards in evaluateOutcome:
Payload echo. If an indicator's matched substring appears in the payload itself, the match is dropped. Kills the entire FP class where tools repeat input verbatim in errors.
Baseline comparison. Before any attack runs, captureBaselines
calls each tool once with schema-derived benign arguments and stashes
the response. During evaluation, indicator matches that also match the
baseline are suppressed (the anchor wasn't attack-specific behavior;
the tool always returns it), and timingThresholdMs adapts to the
tool's natural latency to dampen cold-start FPs on blind-injection
probes.
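The two guards can be sketched roughly as follows. This is not the actual evaluateOutcome code: filterMatches and its argument shape are hypothetical, and the sketch assumes indicators are non-global regular expressions and that the baseline is the stashed response text. The adaptive timing threshold is left out because the notes do not describe its formula.

```javascript
// Hypothetical sketch of the payload-echo and baseline guards.
// filterMatches is an illustrative name; patterns must be non-global RegExps.
function filterMatches(patterns, { payload, response, baseline }) {
  const kept = [];
  for (const pattern of patterns) {
    const match = response.match(pattern);
    if (!match) continue;

    // Payload echo guard: the matched substring already appears in the
    // payload we sent, so the tool may simply be repeating our input.
    if (payload.includes(match[0])) continue;

    // Baseline guard: the benign call produced the same anchor, so the
    // match is not attack-specific behavior.
    if (typeof baseline === "string" && pattern.test(baseline)) continue;

    kept.push(match[0]);
  }
  return kept;
}
```

A match survives only if it is absent from both the payload and the benign baseline, which is the property that makes it usable as evidence.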
Findings now read as proof
Critical and high finding cards render the actual response payload in
a boxed block. The MCP envelope ({content: [{type: "text", text: "…"}]})
is unwrapped so the inner text shows up. OWASP tags appear in headers.
The inline Pro upsell that used to sit inside finding cards is gone.
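For readers unfamiliar with the envelope, unwrapping might look something like the sketch below. The function name unwrapEnvelope is hypothetical, and joining multiple text parts with a newline is an assumption rather than documented behavior.

```javascript
// Hypothetical sketch: pull the inner text out of an MCP tool result
// shaped like {content: [{type: "text", text: "…"}]}.
function unwrapEnvelope(result) {
  if (typeof result === "string") return result;
  if (!result || !Array.isArray(result.content)) {
    // Not an envelope: fall back to a JSON rendering of whatever came back.
    return JSON.stringify(result ?? null);
  }
  return result.content
    .filter((part) => part && part.type === "text" && typeof part.text === "string")
    .map((part) => part.text)
    .join("\n"); // Assumption: multiple text parts are joined by newlines.
}
```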
✗ CRITICAL Command injection via semicolon chaining [ASI02]
mock-vulnerable → execute_command({"command":"echo test; id"})
┌─ Response ────────────────────────────
│ uid=0(root) gid=0(root) groups=0(root)
└────────────────────────────────────────
→ Never pass user input to shell execution.
Bugs surfaced by the smoke run
Running against the canonical MCP servers caught three latent issues the test suite missed:
- result.isError === true (the MCP tool-error convention) was being treated as "no error," so structured errors became noisy lows. Fix cut 49 of the 79 informational findings on real servers.
- sendRaw did not unwrap _mcpError responses the way callTool does. Servers that correctly returned -32601 Method not found fired false-low protocol findings.
- PRT-001 had inverted indicator logic. A proper JSON-RPC rejection of a missing-method message was being read as vulnerable. Now uses noError: vulnerable means the server accepted the malformed message.
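The first and third fixes both depend on recognizing the two ways an MCP server can signal failure. A minimal sketch of that check, assuming the raw JSON-RPC message is available, could look like this. isErrorResponse is a hypothetical helper, not the project's function.

```javascript
// Hypothetical helper: decide whether a JSON-RPC response from an MCP server
// represents an error. A noError-style check would be the negation of this.
function isErrorResponse(message) {
  if (!message || typeof message !== "object") return false;

  // Protocol-level failure: a JSON-RPC error object, e.g. code -32601.
  if (message.error) return true;

  // Tool-level failure: the MCP tool-error convention sets isError on the result.
  if (message.result && message.result.isError === true) return true;

  return false;
}
```

With a check along these lines, a server that rejects a malformed message with -32601 counts as having errored, so a noError-based attack would not flag it.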
Tests
93 → 133. Forty new tests cover the false-positive regression suite, baseline behavior, MCP envelope unwrap, and the three bug fixes above.
npx decoy-redteam --live
Full notes in the CHANGELOG.