What we learned shipping our first 4 MCP servers
mcp · fastmcp · claude-code · lessons-learned · architecture
In the last three months we’ve shipped four MCP servers — three public on mcpdone-samples, one internal. They’re deliberately different: HTTP API integration, local file access, IMAP-based email, OAuth-protected social API. Different shapes, different failure modes, different operational characters.
Most teams ship one MCP server and stop. The marginal lessons from the second, third, and fourth are where the patterns emerge. This post is what surprised us, ranked roughly by how much it surprised us — and what we’d bake into any new MCP server from the first commit.
The four servers, briefly
For context — these come up by name throughout. None of them is novel. Together they cover the practical surface of what teams actually want from MCP integrations.
- `mcp-content-opportunity` — searches HN + Reddit, ranks discussions by 5 explainable heuristics. HTTP API integration. Read-only. Stateless.
- `mcp-sqlite-query` — read-only SQLite query tool with two independent safety layers (SQL-shape check + driver-level `mode=ro`). Wraps a sync library (`sqlite3`).
- `mcp-gmail-reader` — reads + drafts on a single Gmail label, with a separate sender module for actual sends and a `FORBIDDEN_NAMES` env-var guard. Wraps a sync library (`imaplib`).
- `mcp-twitter` — read-only Twitter MCP, OAuth 2.0 client_credentials, fully async via `httpx`.
Total tests across all four: 121, plus the cross-project lint hook (mcp-guardrails) and per-server smoke runs.
Lesson 1: sync vs async at the tool layer is a choice you have to make once, correctly
The hardest-won lesson. We’ve written more than once about the bug class where a sync `@mcp.tool()` calls `asyncio.run()` inside FastMCP’s already-running event loop, so we won’t repeat the diagnosis here. The lesson, applied across four servers:
- Tools that drive async work (HTTP, async DB drivers): always `async def`. No exceptions. The temptation to write a sync tool that wraps async work via `asyncio.run()` is exactly the trap. `mcp-content-opportunity` and `mcp-twitter` are fully async at the tool layer.
- Tools that wrap sync libraries (`sqlite3`, `imaplib`): always `def`. FastMCP runs sync tools in a thread executor; that’s the right shape. Trying to `await asyncio.to_thread(sync_call)` adds ceremony with no gain — your tool isn’t actually doing async work, it’s pretending to. `mcp-sqlite-query` and `mcp-gmail-reader` are fully sync at the tool layer.
The cross-project lint hook doesn’t enforce sync-vs-async — it specifically forbids asyncio.run() inside @mcp.tool() regardless of sync/async. That’s the right surgical target. The choice of “this server is sync” vs “this server is async” should be made once, at scaffold time, and then enforced by project-level convention (e.g., a parametrized inspect.iscoroutinefunction test).
The mistake we made: we scaffolded `mcp-twitter` with sync tool wrappers because the original mental model was “FastMCP wants sync, we’ll just `asyncio.run()` the inner async work.” Cost: a production bug at the first protocol call. Lesson: don’t scaffold sync-with-`asyncio.run()`. Scaffold async-first when the work is async.
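That project-level convention can be pinned with a small test. A minimal sketch below, using `inspect.iscoroutinefunction`; the tool functions here are stand-ins for illustration (in a real project you’d import them from your server module, e.g. `from my_server.tools import ...` — hypothetical names, not from our repos):

```python
import inspect

# Stand-in tools for the sketch; replace with imports from your own server module.
async def search_posts(query: str) -> dict:
    return {"query": query}

async def lookup_user(handle: str) -> dict:
    return {"handle": handle}

TOOLS = [search_posts, lookup_user]

def check_async_first(tools) -> list[str]:
    """Return names of tools that break the async-first convention.

    For a sync-by-design server, invert the check:
    flag tools where inspect.iscoroutinefunction(t) IS true.
    """
    return [t.__name__ for t in tools if not inspect.iscoroutinefunction(t)]
```

Wire `check_async_first` into a pytest assertion over your real tool list and the scaffold-time choice becomes unbreakable by later contributors.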
Lesson 2: read-only-by-design is harder than it looks
Three of our four servers are read-only by design — meaning the tool surface exposes only read operations, and write operations literally don’t exist in the code path. mcp-twitter can’t post tweets because there’s no post_tweet function anywhere in the package. mcp-sqlite-query can’t execute INSERTs because the SQL validator rejects them AND the driver opens the connection in mode=ro. mcp-content-opportunity can’t write to HN or Reddit because… we don’t even have OAuth for those services.
This is the right shape for any MCP server touching sensitive systems. But it’s not free — we made three specific mistakes building toward it:
- Underestimating the temptation to add “just one write.” Halfway through `mcp-gmail-reader`, the user wanted reply-drafting. The natural thing was a `send_reply` tool. We resisted, built `draft_reply` (writes to Drafts folder, not Sent), and added a separate `mcp_gmail_reader.sender` module for actual sends with a `FORBIDDEN_NAMES` guard. Two weeks later, we did add a `send_reply` MCP tool — but it lives behind explicit env-var enable + a hard rate-limit + the `FORBIDDEN_NAMES` filter. The “default off, opt-in by env” pattern is the right shape for tools with real-world consequences.
- Conflating “read-only” with “safe.” A read-only Gmail tool can still leak secrets if the model dumps the wrong email body into a public channel. Read-only means the tool doesn’t write back to the system; it doesn’t mean the output is harmless. We added scope-locking (label-only) on top of read-only to bound what the model can see at all.
- Forgetting that tests prove negative claims poorly. “There’s no write capability in this MCP” is a property of the whole package, not just the tools. A test that asserts the tool surface is read-only ≠ a test that asserts the package contains no write code. We added an AST scan that fails CI if `imaplib`’s `STORE` or `EXPUNGE` commands appear anywhere in `mcp-gmail-reader`’s source — the explicit “this code is structurally incapable of writing” assertion.
The general principle: if your MCP is supposed to be incapable of doing X, write a test that proves the code is structurally incapable. “I didn’t expose a tool for X” is weaker than “this codebase contains no path to X.”
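A structural-incapability check of that kind can be small. A sketch below, simplified from the idea rather than taken from our actual CI (it matches method-style calls like `conn.store(...)`; a production version would also catch `uid('STORE', ...)` string commands):

```python
import ast

# imaplib write operations, lowercased method names (assumed forbidden set).
FORBIDDEN = {"store", "expunge"}

def find_write_calls(source: str) -> list[tuple[int, str]]:
    """Return (lineno, name) for each call to a forbidden imaplib method,
    e.g. conn.store(...) or conn.expunge()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr.lower() in FORBIDDEN:
                hits.append((node.lineno, node.func.attr))
    return hits
```

Run it over every source file in the package in a CI step and fail the build on any hit; the test now proves the negative claim at the package level, not the tool level.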
Lesson 3: the biggest cost is not the API tokens
We track API costs carefully — mcp-twitter is $0.005/post + $0.010/user lookup, mcp-content-opportunity hits free tiers (HN public API, Reddit public JSON), mcp-gmail-reader is free (IMAP), mcp-sqlite-query is free (local). Across all four MCPs, our monthly external-API spend is single-digit dollars.
That’s not where the cost lives.
The biggest costs were:
- Operational time spent on bugs we shipped before the `mcp-guardrails` lint existed. Diagnosis, fix, retest, redeploy: ~3 hours. Multiply across four servers if you also haven’t run that lint on yours.
- The cost of getting the docstring contract wrong. MCP tool docstrings are the primary interface the model sees. A vague or incomplete docstring means the model calls the tool with wrong arguments, the call fails, the user is confused. We learned to treat docstrings as prompts (see Lesson 4) — but only after multiple “why is the model calling this tool with garbage args” sessions.
- The cost of getting cost transparency wrong. Tools that don’t surface their per-call cost in their docstrings get called liberally and rack up bills. Tools that DO surface cost (`"costs $0.005 per post returned + $0.010 for user lookup"` directly in the docstring) get called more thoughtfully — the model self-rate-limits when it sees a cost signal.
Concrete recommendation: every MCP tool docstring should include (a) what it does in 3+ sentences, (b) the cost per call if non-trivial, (c) the typical use case in the model’s voice (“use this when…”). Length-of-docstring is positively correlated with quality-of-tool-use. We’ve never regretted writing more.
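Put together, a docstring following that (a)/(b)/(c) recipe looks roughly like this. The tool and the numbers are illustrative stand-ins, not copied from `mcp-twitter`:

```python
def search_tweets(query: str, limit: int = 20) -> dict:
    """Search recent tweets matching a query.

    Returns up to `limit` tweets posted in the last 7 days, newest first,
    with author handle, text, and engagement counts. Results are read-only;
    this tool cannot post, like, or reply.

    Costs $0.005 per post returned, so prefer a small `limit` and refine
    the query rather than paging through large result sets.

    Use this when the user asks what people are saying about a topic right
    now, e.g. "what's the reaction to the new release on Twitter?"
    """
    ...  # implementation elided; the docstring is the point here
```

Three sentences of behaviour, an explicit cost signal, and a use-case phrased in the model’s voice — each paragraph maps to one item in the recommendation above.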
Lesson 4: schema generation is the model’s interface
Related but distinct from Lesson 3: when FastMCP registers your tool, it generates a JSON-Schema description from your function’s signature + docstring + type hints. That schema is what the model sees. Not your code, not your docstring directly — the JSON-Schema derived from them.
Three implications we got wrong before getting right:
- Type hints aren’t optional. A parameter typed as `Any` produces a schema field with no type constraint, and the model passes whatever it feels like. We had a `since_hours: Any` slip into a draft of `mcp-twitter` that resulted in the model passing string `"24h"` instead of int `24`. Type hints aren’t decoration — they’re documentation the model reads.
- Default values matter for cost control. A `limit: int = 100` parameter means the model defaults to 100 results unless it has a reason not to. A `limit: int = 20` defaults to a fifth of that. We tuned defaults based on the cost-per-result, not on what felt like a “round number.” The model respects defaults; it doesn’t optimise costs unless asked.
- Optional vs required matters for schema strictness. `def tool(query: str, limit: int = 20)` makes `limit` optional; the model can omit it. `def tool(query: str, limit: int)` requires it; the model must provide it. The schema is generated automatically, but you control its strictness via your function signature. Use that.
The model is reading the schema you didn’t realise you were writing. Make it accurate.
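The signature-to-schema mapping can be made concrete with a toy generator. This is not FastMCP’s real code — its generator also reads docstrings and handles far more types — just a sketch of how required-vs-optional falls out of defaults:

```python
import inspect

def sketch_schema(fn) -> dict:
    """Toy illustration of signature -> JSON-Schema: a parameter is
    'required' exactly when it has no default value."""
    props, required = {}, []
    for name, p in inspect.signature(fn).parameters.items():
        props[name] = {"type": "integer" if p.annotation is int else "string"}
        if p.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": props, "required": required}

def tool_loose(query: str, limit: int = 20): ...   # model may omit `limit`
def tool_strict(query: str, limit: int): ...       # model must supply `limit`
```

Same two parameters, two different contracts — the only difference the model ever sees is whether `limit` lands in the `required` list.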
Lesson 5: configuration laziness is the right pattern
Every MCP server needs config: API keys, endpoints, rate limits. The temptation is to load config at module import time, validate, and crash with a clear error if anything’s missing. That’s what most teams do.
We do the opposite: lazy load on first tool call, return a structured error response if config is bad.
```python
_CONFIG_CACHE: Config | None = None

def _cfg() -> Config:
    global _CONFIG_CACHE
    if _CONFIG_CACHE is None:
        _CONFIG_CACHE = load_config()
    return _CONFIG_CACHE

def _error(exc: Exception, hint: str = "") -> dict:
    return {
        "error": type(exc).__name__,
        "message": str(exc),
        "hint": hint,
    }

@mcp.tool()
async def my_tool() -> dict:
    try:
        cfg = _cfg()
        return await _do_work(cfg)
    except ConfigError as e:
        return _error(e, hint="Check products/mcp-twitter/.env credentials.")
    except (ApiError, AuthError) as e:
        return _error(e)
```
Why this is the right pattern for MCP specifically:
- MCP servers run as subprocesses spawned by Claude Code. A server that crashes at import time is a server Claude Code can’t surface a useful error from. The user sees “MCP server failed to start” with no context.
- Lazy + structured errors give the model something to repair. When the tool call returns `{"error": "AuthError", "message": "401 Unauthorized", "hint": "Check products/mcp-twitter/.env credentials."}`, Claude Code shows it to the user, the user updates their `.env`, and no restart is needed.
- It separates “config is missing” from “the API is down.” Both are recoverable; the recovery action is different. Eager validation conflates them (“server crashed”) and forces the user to read logs.
Three of our four servers do this. The one that didn’t (mcp-sqlite-query’s very first revision) crashed on missing DB_PATH env var, and it took us a day of “is this MCP broken?” tickets to add the lazy-load + structured error path. Don’t repeat our mistake.
Lesson 6: the MCP boundary should match the refactor map
This one came from a paid pilot rather than our own internal servers, but it’s the most strategic lesson we’ve learned. An MCP server is a long-lived contract between the model and a system. The system you bind to should be the system you’ll still want to bind to in 12 months.
A common failure mode in client codebases: the team is mid-refactor between an “old layer” and a “new layer.” The old layer has all the data; the new layer is the future. A naive MCP build wires to the old layer (it’s where the data is now) and ships fast. Six months later, the old layer is decommissioned, the MCP breaks, and the team has to rebuild.
The fix is to ask, during intake: “is there a refactor map for this codebase? Old layer being retired, new layer being grown?” If yes, the MCP wires to the new layer — even if it requires the team to add public methods to the new layer first to expose what the MCP needs. The five minutes of “add a public method on the repository” beats the five days of “rebuild the MCP six months later.”
We now bake this into our intake form for the $499 Build tier as a required field: “Refactor map: any old-layer / new-layer splits in the codebase?” Skipping it has cost too much, too often.
Lesson 7: the wrapper-layer is the bug-class blind spot
This is the most operationally important lesson — covered in its own post and the layer-hierarchy post. One-paragraph version:
Your unit tests probably exercise the inner functions of your MCP server (Layer 1 in our taxonomy), with pytest-httpx mocks or similar. They probably don’t exercise the FastMCP wrapper layer (Layer 3) — the part that runs under the protocol. The bugs that hide in Layer 3 are the worst kind: invisible to tests, invisible to type-checkers, visible only at first protocol call. The cheapest mitigation is a parametrized test using inspect.iscoroutinefunction plus an AST scan banning asyncio.run() inside @mcp.tool() bodies. Both are offline, both are instant, both catch the entire bug class. The cross-project lint hook at mcp-guardrails is the AST scan generalised to any FastMCP project.
Add it. Today. Before the next deployment.
What we’d do differently
If we were building all four servers fresh, knowing what we know now:
- Pick sync vs async at scaffold time, by what the work actually is. Add a project-level test asserting the choice from day one. Never mix.
- Treat docstrings as the model’s interface. Three sentences minimum. Cost per call. Typical use case. Re-read after every behaviour change.
- Read-only-by-design enforced structurally. Not “we didn’t expose a write tool” — the codebase contains no path to write at all. Test it.
- Lazy config loading, structured error envelopes. Never import-time crashes.
- Wrapper-layer tests from the first commit. `inspect.iscoroutinefunction` + AST scan. Never optional.
- Live smoke test at the project root. Run before any commit touching `server.py`. Cheap insurance.
- For client engagements: ask about the refactor map. Don’t wire to the old layer.
This list is now the SOP for any MCP server we ship — internal or paid. It’s not glamorous, and most of it isn’t visible to the end user. But it’s the difference between an MCP that survives in production and one that dies the first time someone outside the original author’s setup tries to use it.
The general lesson
MCP servers are a protocol, not a framework. The protocol has surprising semantics (sync tools blocking the loop, schema generation from signatures, structural error semantics) that don’t fall out of “it’s just Python.” Treating an MCP server like any other Python package — same testing, same architecture, same shipping discipline — gets you to a demo. Treating it like a long-lived protocol contract, with the specific test layers and the specific guard patterns, gets you to production.
We learned each of these lessons by shipping a bug. That’s the expensive curriculum. The repo and the lint hook are the cheap version.
All four MCP servers are open-source on the mcpdone-samples repo (the public three; mcp-twitter is in our internal repo but the patterns are documented in our blog). The cross-project lint that enforces these patterns is at mcp-guardrails. MIT-licensed.
If your team wants someone to build production-shape MCP servers with these patterns baked in from day one — not learned by shipping bugs — that’s the $499 Build tier. Money-back if the code doesn’t run in a clean environment.