I added a command-injection check to mcp-audit — 7 HIGH findings, all in one repo


Four days ago I published a one-check audit of 15 popular Python MCP server repos and found 10 of them still pinned a vulnerable Starlette. Today I shipped mcp-audit v0.3 with two more checks and re-ran the same 15 repos.

Headline: the new command_injection check found 7 HIGH findings, every one of them in a single file: kubectl_mcp_tool/tools/cluster.py inside rohitg00/kubectl-mcp-server. Every other repo in the sample scored zero on this check, including ones I was watching for it (awslabs/mcp at 778 total findings, serena, mcp-alchemy).

The Starlette CVE numbers, by contrast, barely moved in the 4-day window — 68 → 66 across the same 15 repos, and the 2-finding drop is scanner refinement rather than maintainer action.

This post is the honest follow-up: what the new checks surface, what they don’t, and how to read the inflated total (1,957 findings sounds dramatic; 1,884 of those are LOW-severity hygiene flags).
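The totals quoted above reconcile with simple arithmetic. The sketch below uses only counts stated in this post; the leftover attributed to fastmcp_wrapper_layer is inferred by subtraction and is an assumption, not a figure reported anywhere in the scan output.

```python
# Counts as quoted in this post.
total_findings = 1957
by_check = {
    "tool_input_validation": 1884,  # LOW
    "starlette_badhost": 66,        # HIGH / MEDIUM
    "command_injection": 7,         # HIGH
}

# Whatever is left must belong to the fourth check (inferred, not quoted).
remainder = total_findings - sum(by_check.values())
non_low = total_findings - by_check["tool_input_validation"]
low_share = by_check["tool_input_validation"] / total_findings

print(f"non-LOW findings: {non_low}")
print(f"LOW share of total: {low_share:.1%}")
print(f"left over for fastmcp_wrapper_layer: {remainder}")
```

Read this way, the number that matters for triage is the 73 non-LOW findings, not the 1,957 total.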

What v0.3 adds

The repo is at github.com/Alienbushman/mcpdone-samples/tree/master/mcp-audit. v0.1 shipped one check; v0.3 ships four:

Check | Severity | What it flags
starlette_badhost | HIGH / MEDIUM | Starlette < 1.0.1 in pyproject.toml / requirements*.txt / lockfiles (BadHost CVE-2026-48710)
fastmcp_wrapper_layer | HIGH | Sync @mcp.tool() functions calling asyncio.run() inside their body (the bug that ships green tests)
tool_input_validation | LOW | @mcp.tool() parameters typed as bare str / bytes / Any / list[Any] / dict[..., Any] with no Field(...) / Length / Pattern constraints (added v0.2)
command_injection | HIGH | @mcp.tool() functions where a tool parameter (or a local tainted via assignment / .format() / string concat) flows into os.system, os.popen, or subprocess.* with shell=True or a tainted-interpolated command string (added v0.3)
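To make the fastmcp_wrapper_layer row concrete, here is a minimal sketch of why a sync function that calls asyncio.run() can pass a direct unit test and still fail in a server. It assumes the server invokes sync tools on the thread that is already running the event loop; fetch_status and sync_tool are hypothetical names, and no MCP framework is imported.

```python
import asyncio


async def fetch_status() -> str:
    # Stand-in for any async helper a tool might wrap (hypothetical).
    return "ok"


def sync_tool() -> str:
    # Hypothetical sync tool body that bridges to async code with asyncio.run().
    coro = fetch_status()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"failed: {exc}"


# Called directly, as a plain unit test would: no loop is running, so it works.
direct = sync_tool()


async def call_from_event_loop() -> str:
    # Called while a loop is already running, as an async server might do.
    return sync_tool()


in_loop = asyncio.run(call_from_event_loop())

print("direct call:", direct)
print("inside running loop:", in_loop)
```

The direct call returns normally, while the in-loop call hits the RuntimeError that asyncio.run() raises when a loop is already running.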

command_injection is a single-function AST taint analyzer. It builds the set of tainted variable names from the function’s parameters, propagates taint through Assign / AnnAssign / AugAssign whose RHS contains a tainted expression (Name / JoinedStr / BinOp / format-call / collection), then checks each subprocess sink. The safe subprocess.run(["git", "checkout", branch]) list-of-args pattern is correctly not flagged. Cross-function flows (def handler(arg): _run_shell(arg) where _run_shell calls subprocess) are explicitly out of scope for v0.3.
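The approach described above can be sketched with the standard-library ast module. This is an illustrative reduction, not mcp-audit's actual code: it is flow-insensitive, only covers os.system, os.popen, and subprocess.* with a literal shell=True, and every name in it (scan, mentions, the SAMPLE tools) is hypothetical.

```python
import ast

ALWAYS_SHELL = {"os.system", "os.popen"}


def dotted(node: ast.AST) -> str:
    """Return 'a.b.c' for Name/Attribute chains, '' for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def is_tool(func: ast.AST) -> bool:
    # Matches @x.tool and @x.tool(); the receiver name is not checked.
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if dotted(target).endswith(".tool"):
            return True
    return False


def mentions(node: ast.AST, tainted: set) -> bool:
    """True if any Name nested inside `node` is in the tainted set."""
    return any(isinstance(n, ast.Name) and n.id in tainted for n in ast.walk(node))


def scan(source: str) -> list:
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)) or not is_tool(func):
            continue
        args = func.args
        tainted = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        # Propagate taint through assignments until nothing changes.
        changed = True
        while changed:
            changed = False
            for node in ast.walk(func):
                if not isinstance(node, (ast.Assign, ast.AnnAssign, ast.AugAssign)):
                    continue
                if node.value is None or not mentions(node.value, tainted):
                    continue
                targets = node.targets if isinstance(node, ast.Assign) else [node.target]
                for target in targets:
                    if isinstance(target, ast.Name) and target.id not in tainted:
                        tainted.add(target.id)
                        changed = True
        # Check each shell sink whose first argument carries taint.
        for call in ast.walk(func):
            if not isinstance(call, ast.Call) or not call.args:
                continue
            name = dotted(call.func)
            shell_true = any(
                kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True
                for kw in call.keywords
            )
            is_sink = name in ALWAYS_SHELL or (name.startswith("subprocess.") and shell_true)
            if is_sink and mentions(call.args[0], tainted):
                findings.append((func.name, call.lineno, name))
    return findings


# Illustrative input only; not taken from any audited repo.
SAMPLE = '''
import os, subprocess

@mcp.tool()
def list_pods(namespace: str) -> str:
    cmd = f"kubectl get pods -n {namespace}"
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

@mcp.tool()
def checkout(branch: str) -> None:
    subprocess.run(["git", "checkout", branch])

def _run_shell(arg):
    os.system(arg)

@mcp.tool()
def handler(arg: str) -> None:
    _run_shell(arg)
'''

findings = scan(SAMPLE)
for finding in findings:
    print(finding)
```

Only list_pods is reported. The list-of-args call in checkout is left alone, and handler slips through because the sink lives in a helper, which is the cross-function gap noted above.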

The headline: 7 shell-exec sinks in kubectl-mcp-server

All seven command_injection HIGHs land in kubectl_mcp_tool/tools/cluster.py. That repo wraps kubectl and exposes the commands as MCP tools, so the location confirms what you’d expect: a kubectl wrapper is exactly where you’d predict shell-exec patterns to live. The interesting datum is the count: 7, not 1 or 2.

What’s surprising is the opposite: the repos I was watching for the same pattern came back with zero hits on this check, namely awslabs/mcp, serena, and mcp-alchemy.

Two honest reads of that distribution: (a) most popular MCP authors route through SDKs and don’t shell out, or (b) v0.3’s single-function taint analyzer misses cross-function flows. Probably both, and the kubectl-mcp-server findings are a lower bound — multi-hop user-input → helper → subprocess flows would not be caught.

If you maintain kubectl-mcp-server, the seven paths are in cluster.py; happy to file a PR.

The Starlette delta: barely moved in 4 days

Metric | v0.1 (2026-06-24) | v0.3 (2026-06-28) | Delta
Total starlette_badhost findings | 68 | 66 | −2
Repos with at least one finding | 10 | 9 | −1
awslabs/mcp count | 58 | 58 | 0
modelcontextprotocol/python-sdk | 3 | 1 | −2
blazickjp/arxiv-mcp-server | 2 | 2 | 0

No repo in the sample visibly bumped to a fixed Starlette pin in the window. The 2-finding drop on the python-sdk is mcp-audit scanner refinement (a tighter Annotated[..., Field(...)] recognizer in v0.2 narrowed two false-positive MEDIUMs in the python-sdk’s pyproject.toml), not a maintainer fix.

awslabs/mcp is essentially flat at 58 findings — vulnerable 0.50.0 / 1.0.0 pins still strewn across src/* and samples/* lockfiles. blazickjp/arxiv-mcp-server still ships uv.lock with starlette==0.52.1 plus an unbounded >=0.27.0 floor in pyproject.toml.
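For readers who want to triage their own pins, here is a rough sketch of the comparison involved, using the 1.0.1 threshold from the checks table. It is not mcp-audit's parser: it only understands plain == and >= specifiers with numeric versions, and classify is a hypothetical helper.

```python
import re

FIXED = (1, 0, 1)  # first non-flagged Starlette release, per the checks table


def parse(version: str) -> tuple:
    # "0.52.1" -> (0, 52, 1); pre-release suffixes are not handled here.
    return tuple(int(part) for part in version.split("."))


def classify(spec: str) -> str:
    match = re.fullmatch(r"\s*(==|>=)\s*(\d+(?:\.\d+)*)\s*", spec)
    if match is None:
        return "unparsed"
    operator, version = match.group(1), parse(match.group(2))
    if version >= FIXED:
        return "ok"
    if operator == "==":
        return "vulnerable pin"
    # A bare floor below the fix leaves vulnerable releases installable.
    return "floor admits vulnerable versions"


specs = ["==0.50.0", "==1.0.0", "==0.52.1", ">=0.27.0", "==1.0.1"]
results = {spec: classify(spec) for spec in specs}
for spec, verdict in results.items():
    print(f"starlette{spec}: {verdict}")
```

Whether a low floor matters in practice depends on what the lockfile actually resolves to, which is why the exact pin and the floor are reported as separate cases.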

For honesty: we patched our own internal mcp-twitter (1.0.0 → 1.3.1) between v0.1 and v0.3, but that’s outside this 15-repo sample. Across the public sample, maintainer response on the CVE in the 4-day window was zero.

A four-day window is also too short to expect meaningful action. Most maintainers will hear about the issue through search or a Dependabot bump rather than through this audit, so the second pass is more useful as a “did anything move” signal than a “shame the laggards” one. The honest answer is: nothing visibly moved.

tool_input_validation: a hygiene snapshot, not a vuln list

1,884 LOW findings across the sample. The headline number is misleading on its own — ~88% comes from three repos: awslabs/mcp (720), kubectl-mcp-server (628), and jlowin/fastmcp’s examples/ directory (320). That’s more a function of repo size and tutorial-example density than of relative hygiene.

The pattern flagged is @mcp.tool() parameters annotated as bare str / bytes / Any / list[Any] / dict[..., Any] with no Field(min_length=...), Length, Pattern, or Literal[...] constraint. Not exploitable on their own — they’re the missing guardrail that makes downstream sinks (SQL, shell, file IO) easier to abuse, and the substrate prompt-injection-via-tool-description attacks rely on.
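A rough sketch of how such a pattern could be detected with the standard-library ast module. It is an assumption-laden reduction rather than mcp-audit's implementation: it treats any Annotated[...] or Literal[...] annotation as constrained without inspecting its contents, and the tool functions in SAMPLE are made up for illustration.

```python
import ast

BARE_NAMES = {"str", "bytes", "Any"}


def is_tool(func: ast.AST) -> bool:
    # Matches @x.tool and @x.tool(); the receiver name is not checked.
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Attribute) and target.attr == "tool":
            return True
    return False


def is_bare(annotation) -> bool:
    if isinstance(annotation, ast.Name):
        return annotation.id in BARE_NAMES
    if isinstance(annotation, ast.Subscript):
        base = annotation.value
        # list[Any] / dict[..., Any]: a container whose element type is Any.
        if isinstance(base, ast.Name) and base.id in {"list", "dict"}:
            return any(
                isinstance(n, ast.Name) and n.id == "Any"
                for n in ast.walk(annotation.slice)
            )
    # Annotated[...], Literal[...], int, and unannotated params land here.
    return False


def unconstrained_params(source: str) -> list:
    hits = []
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)) and is_tool(func):
            for arg in func.args.args + func.args.kwonlyargs:
                if is_bare(arg.annotation):
                    hits.append((func.name, arg.arg))
    return hits


# Illustrative input only; the source is parsed, never executed.
SAMPLE = '''
@mcp.tool()
def search(query: str, limit: int = 10) -> list:
    ...

@mcp.tool()
def search_bounded(query: Annotated[str, Field(max_length=256)]) -> list:
    ...

@mcp.tool()
def ingest(payload: dict[str, Any], mode: Literal["fast", "safe"]) -> None:
    ...
'''

hits = unconstrained_params(SAMPLE)
for hit in hits:
    print(hit)
```

The bounded variant of search is the shape the check wants to see: same parameter, with the constraint carried in the annotation.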

Worth flagging upstream: modelcontextprotocol/python-sdk (128) and jlowin/fastmcp (320) together account for 448 findings, almost all in docs_src tutorial examples. Tutorial code that gets copy-pasted into production servers is a real attack-surface multiplier. An SDK that ships def search(query: str) examples without Annotated[str, Field(max_length=256)] is shaping the downstream ecosystem.
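The concentration figures in this section can be rechecked from the per-repo counts quoted above. The sketch uses only numbers stated in the post; the "elsewhere" figure is derived by subtraction rather than reported.

```python
low_total = 1884
top_three = {"awslabs/mcp": 720, "kubectl-mcp-server": 628, "jlowin/fastmcp": 320}
sdk_repos = {"modelcontextprotocol/python-sdk": 128, "jlowin/fastmcp": 320}

top_three_share = sum(top_three.values()) / low_total
sdk_total = sum(sdk_repos.values())

# Derived: LOW findings outside the four repos named in this section.
named = {**top_three, **sdk_repos}
elsewhere = low_total - sum(named.values())

print(f"top three repos: {sum(top_three.values())} ({top_three_share:.1%} of LOW)")
print(f"SDK and framework repos: {sdk_total}")
print(f"all remaining repos combined: {elsewhere}")
```

Four repos account for 1,796 of the 1,884 LOW findings, which supports reading the raw count as a size signal rather than a ranking.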

The honest framing for this check: it’s a code-smell scanner that gets useful when you correlate it with the HIGH checks, not a standalone severity signal. “1,884 findings” is the technically-correct number; “no popular MCP server passes this check cleanly” is the more honest sentence.

What v0.3 still doesn’t cover

The same caveats from the v0.1 post still apply, plus three new ones from the v0.3 checks:

  1. stdio-vs-HTTP: the starlette_badhost check only matters for repos that actually serve HTTP/SSE. A stdio-only MCP with a vulnerable Starlette pin in uv.lock is a hygiene issue, not a live vuln.
  2. v0.3 = 4 checks total. A “clean” scan means “no hit on these four,” not “secure.” Auth, secret handling, prompt-injection-via-tool-description, transitive deps, and TOCTOU on file tools are all still unchecked.
  3. Single snapshot — this is one scan on one day. A maintainer who bumps tomorrow doesn’t get credit until the next run.
  4. command_injection taint is single-function only. Multi-hop flows (user input → helper → subprocess) are not detected. The 7 hits in cluster.py are a lower bound, not the ceiling.
  5. tool_input_validation is a LOW-severity hygiene check, not a vulnerability. Large counts mostly reflect repo size and tutorial-example density.
  6. Sample is the same 15 popular Python MCP servers as v0.1 — chosen by stars/visibility, not randomly. Don’t generalize to the long tail of less-visible MCPs (which are almost certainly worse on every check).
