
What we learned shipping our first 4 MCP servers

mcp · fastmcp · claude-code · lessons-learned · architecture

In the last three months we’ve shipped four MCP servers — three public on mcpdone-samples, one internal. They’re deliberately different: HTTP API integration, local file access, IMAP-based email, OAuth-protected social API. Different shapes, different failure modes, different operational characters.

Most teams ship one MCP server and stop. The marginal lessons from the second, third, and fourth are where the patterns emerge. This post is what surprised us, ranked roughly by how much it surprised us — and what we’d bake into any new MCP server from the first commit.

The four servers, briefly

For context, these come up by name throughout. None of them is novel; together they cover the practical surface of what teams actually want from MCP integrations:

  1. mcp-twitter: OAuth-protected social API (internal).
  2. mcp-sqlite-query: read-only local SQLite access.
  3. mcp-gmail-reader: IMAP-based email.
  4. mcp-content-opportunity: HTTP API integration over the HN and Reddit public APIs.

Total tests across all four: 121, plus the cross-project lint hook (mcp-guardrails) and per-server smoke runs.

Lesson 1: sync vs async at the tool layer is a choice you have to make once, correctly

This is the hardest-won lesson. We’ve written before about the bug class where a sync @mcp.tool() calls asyncio.run() inside FastMCP’s already-running event loop, so we won’t repeat the diagnosis here. The lesson, applied across four servers:

The cross-project lint hook doesn’t enforce sync-vs-async — it specifically forbids asyncio.run() inside @mcp.tool() regardless of sync/async. That’s the right surgical target. The choice of “this server is sync” vs “this server is async” should be made once, at scaffold time, and then enforced by project-level convention (e.g., a parametrized inspect.iscoroutinefunction test).

The mistake we made: we scaffolded mcp-twitter with sync tool wrappers because the original mental model was “FastMCP wants sync; we’ll just asyncio.run() the inner async work.” The cost: a production bug at the first protocol call. The lesson: don’t scaffold sync-with-asyncio.run. Scaffold async-first when the work is async.
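The bug class reproduces without FastMCP at all. A minimal sketch (function names are illustrative, not from our packages): a sync wrapper that calls asyncio.run() works fine in isolation, then blows up the moment something invokes it from inside a running event loop — which is exactly what FastMCP does at the first protocol call.

```python
import asyncio


async def fetch_data() -> str:
    # stand-in for the real async API work
    return "data"


def sync_tool() -> str:
    # the anti-pattern: a sync wrapper that spins up its own loop
    return asyncio.run(fetch_data())


async def main() -> str:
    # FastMCP invokes tools from inside its own running event loop;
    # calling the sync wrapper here reproduces the failure
    try:
        return sync_tool()
    except RuntimeError as e:
        return f"RuntimeError: {e}"


print(asyncio.run(main()))
```

Run standalone, `sync_tool()` returns "data"; run under a loop, it raises RuntimeError. That split is why the bug is invisible to unit tests that call the inner functions directly.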

Lesson 2: read-only-by-design is harder than it looks

Three of our four servers are read-only by design — meaning the tool surface exposes only read operations, and write operations literally don’t exist in the code path. mcp-twitter can’t post tweets because there’s no post_tweet function anywhere in the package. mcp-sqlite-query can’t execute INSERTs because the SQL validator rejects them AND the driver opens the connection in mode=ro. mcp-content-opportunity can’t write to HN or Reddit because… we don’t even have OAuth for those services.

This is the right shape for any MCP server touching sensitive systems. But it’s not free; we made several mistakes building toward it.

The general principle: if your MCP is supposed to be incapable of doing X, write a test that proves the code is structurally incapable. “I didn’t expose a tool for X” is weaker than “this codebase contains no path to X.”
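A test of that structural kind can be tiny. This sketch (scratch paths and table names are illustrative) proves the mode=ro property directly: open a SQLite database read-only via a URI and assert that a write is impossible at the driver level, not merely unexposed.

```python
import os
import sqlite3
import tempfile

# build a scratch database with one table
path = os.path.join(tempfile.mkdtemp(), "demo.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE notes (body TEXT)")
rw.commit()
rw.close()

# open it the way a read-only-by-design server would: mode=ro via URI
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def is_structurally_read_only(conn: sqlite3.Connection) -> bool:
    try:
        conn.execute("INSERT INTO notes VALUES ('x')")
    except sqlite3.OperationalError:
        # SQLite refuses: "attempt to write a readonly database"
        return True
    return False


print(is_structurally_read_only(ro))  # True
```

The same idea generalises: for an API-backed server, grep the package for write endpoints in a test; for an email server, assert the IMAP client is opened without write flags.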

Lesson 3: the biggest cost is not the API tokens

We track API costs carefully — mcp-twitter is $0.005/post + $0.010/user lookup, mcp-content-opportunity hits free tiers (HN public API, Reddit public JSON), mcp-gmail-reader is free (IMAP), mcp-sqlite-query is free (local). Across all four MCPs, our monthly external-API spend is single-digit dollars.

That’s not where the cost lives.

The biggest cost was legibility: the time spent making each tool’s purpose, price, and intended use obvious to the model.

Concrete recommendation: every MCP tool docstring should include (a) what it does, in three or more sentences, (b) the cost per call if non-trivial, and (c) the typical use case in the model’s voice (“use this when…”). Docstring length is positively correlated with quality of tool use. We’ve never regretted writing more.
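Put together, a docstring following that template might look like this. Everything here is a hypothetical shape — the function name, parameters, and pricing are illustrative, not from the real mcp-twitter package:

```python
def get_user_timeline(username: str, limit: int = 20) -> dict:
    """Fetch the most recent public posts for a single user.

    Returns a dict with the user's latest posts (up to `limit`,
    newest first) plus basic profile metadata. Each call performs
    one user lookup against the upstream API, so it has a small
    per-call cost; batch lookups where possible. Use this when you
    need recent context about what a specific account has been
    posting before drafting a reply or a summary.
    """
    raise NotImplementedError  # interface sketch only
```

All three ingredients are there: the what, the cost, and the “use this when” framing in the model’s voice.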

Lesson 4: schema generation is the model’s interface

Related but distinct from Lesson 3: when FastMCP registers your tool, it generates a JSON-Schema description from your function’s signature + docstring + type hints. That schema is what the model sees. Not your code, not your docstring directly — the JSON-Schema derived from them.

We got this wrong more than once before getting it right.

The model is reading the schema you didn’t realise you were writing. Make it accurate.
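To make the derivation concrete, here is a hand-rolled sketch of the signature-to-schema mapping. This is not FastMCP’s actual generator (it uses richer machinery under the hood); it only illustrates the point that defaults, type hints, and docstrings all leak into what the model sees:

```python
import inspect
from typing import get_type_hints


def search_posts(query: str, limit: int = 10) -> dict:
    """Search recent public posts matching a query string."""


# rough Python-type to JSON-Schema-type mapping
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}


def rough_schema(fn) -> dict:
    """Approximate the JSON-Schema a tool registry derives from `fn`."""
    hints = get_type_hints(fn)
    sig = inspect.signature(fn)
    props = {
        name: {"type": TYPE_MAP.get(hints.get(name), "object")}
        for name in sig.parameters
    }
    # parameters without defaults become required fields
    required = [
        name for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "description": inspect.getdoc(fn),
        "properties": props,
        "required": required,
    }


print(rough_schema(search_posts))
```

Note how `limit` drops out of `required` because it has a default, and how an untyped parameter would degrade to a vague "object" — both are exactly the kind of silent schema change a signature edit can cause.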

Lesson 5: configuration laziness is the right pattern

Every MCP server needs config: API keys, endpoints, rate limits. The temptation is to load config at module import time, validate, and crash with a clear error if anything’s missing. That’s what most teams do.

We do the opposite: lazy load on first tool call, return a structured error response if config is bad.

# Excerpt: assumes Config, load_config, ConfigError, ApiError, AuthError,
# _do_work, and mcp = FastMCP(...) are defined elsewhere in the server.
_CONFIG_CACHE: Config | None = None


def _cfg() -> Config:
    global _CONFIG_CACHE
    if _CONFIG_CACHE is None:
        _CONFIG_CACHE = load_config()
    return _CONFIG_CACHE


def _error(exc: Exception, hint: str = "") -> dict:
    return {
        "error": type(exc).__name__,
        "message": str(exc),
        "hint": hint,
    }


@mcp.tool()
async def my_tool() -> dict:
    try:
        cfg = _cfg()
        return await _do_work(cfg)
    except ConfigError as e:
        return _error(e, hint="Check products/mcp-twitter/.env credentials.")
    except (ApiError, AuthError) as e:
        return _error(e)

Why this is the right pattern for MCP specifically: the server process is usually launched by the client (Claude Desktop, an IDE), so an import-time crash surfaces as an opaque “server failed to start” with no useful message. A structured error response, by contrast, travels through the protocol to the model and the user with an actionable hint.

Three of our four servers do this. The one that didn’t (mcp-sqlite-query’s very first revision) crashed at import on a missing DB_PATH env var, and it took a day of “is this MCP broken?” tickets before we added the lazy-load and structured-error path. Don’t repeat our mistake.

Lesson 6: the MCP boundary should match the refactor map

This one came from a paid pilot rather than our own internal servers, but it’s the most strategic lesson we’ve learned. An MCP server is a long-lived contract between the model and a system. The system you bind to should be the system you’ll still want to bind to in 12 months.

A common failure mode in client codebases: the team is mid-refactor between an “old layer” and a “new layer.” The old layer has all the data; the new layer is the future. A naive MCP build wires to the old layer (it’s where the data is now) and ships fast. Six months later, the old layer is decommissioned, the MCP breaks, and the team has to rebuild.

The fix is to ask, during intake: “is there a refactor map for this codebase? Old layer being retired, new layer being grown?” If yes, the MCP wires to the new layer — even if it requires the team to add public methods to the new layer first to expose what the MCP needs. The five minutes of “add a public method on the repository” beats the five days of “rebuild the MCP six months later.”
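In code, the principle is just an extra indirection. A hypothetical sketch (class and method names are ours, not from any client codebase): the tool wrapper depends only on the new-layer repository interface, so when the old layer is retired, only the repository’s internals change and the MCP contract survives.

```python
class LegacyOrderTable:
    """Old layer, scheduled for decommission. The MCP never sees it."""
    rows = [{"id": 1, "total": 9.99}, {"id": 2, "total": 4.50}]


class OrderRepository:
    """New layer: the public surface the MCP binds to. Retiring the
    old layer means swapping this class's backend, nothing more."""

    def __init__(self, backend):
        self._backend = backend

    def recent_orders(self, limit: int = 50) -> list[dict]:
        return self._backend.rows[:limit]


# today the repository happens to read from the old layer...
repo = OrderRepository(LegacyOrderTable())


def recent_orders_tool(limit: int = 50) -> list[dict]:
    # would be an @mcp.tool() wrapper in a real server; it only
    # ever touches the repository, never LegacyOrderTable directly
    return repo.recent_orders(limit)


print(recent_orders_tool(1))
```

The “five minutes of add a public method” from the paragraph above is exactly `recent_orders` here: a small new-layer method added so the MCP never has to reach into the old layer.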

We now bake this into our intake form for the $499 Build tier as a required field: “Refactor map: any old-layer / new-layer splits in the codebase?” Skipping it has cost too much, too often.

Lesson 7: the wrapper layer is the bug-class blind spot

This is the most operationally important lesson — covered in its own post and the layer-hierarchy post. One-paragraph version:

Your unit tests probably exercise the inner functions of your MCP server (Layer 1 in our taxonomy), with pytest-httpx mocks or similar. They probably don’t exercise the FastMCP wrapper layer (Layer 3) — the part that runs under the protocol. The bugs that hide in Layer 3 are the worst kind: invisible to tests, invisible to type-checkers, visible only at first protocol call. The cheapest mitigation is a parametrized test using inspect.iscoroutinefunction plus an AST scan banning asyncio.run() inside @mcp.tool() bodies. Both are offline, both are instant, both catch the entire bug class. The cross-project lint hook at mcp-guardrails is the AST scan generalised to any FastMCP project.
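A minimal version of that AST scan fits in a short function. This is a simplified sketch of the idea, not the mcp-guardrails hook itself (which generalises decorator detection and handles more edge cases); it flags any asyncio.run() call inside a function decorated with @mcp.tool():

```python
import ast


def find_asyncio_run_in_tools(source: str) -> list[int]:
    """Line numbers of asyncio.run() calls inside @mcp.tool() bodies."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # only inspect functions decorated with @mcp.tool()
        # (bare @mcp.tool without the call is not handled in this sketch)
        decorated = any(
            isinstance(d, ast.Call)
            and isinstance(d.func, ast.Attribute)
            and d.func.attr == "tool"
            for d in node.decorator_list
        )
        if not decorated:
            continue
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Attribute)
                    and sub.func.attr == "run"
                    and isinstance(sub.func.value, ast.Name)
                    and sub.func.value.id == "asyncio"):
                hits.append(sub.lineno)
    return hits


BAD = '''\
import asyncio

@mcp.tool()
def fetch() -> dict:
    return {"data": asyncio.run(inner())}
'''

GOOD = '''\
import asyncio

@mcp.tool()
async def fetch() -> dict:
    return {"data": await inner()}
'''

print(find_asyncio_run_in_tools(BAD))   # [5]
print(find_asyncio_run_in_tools(GOOD))  # []
```

Because it parses rather than imports, the scan runs offline in milliseconds and needs no credentials, no network, and no running server — which is what makes it viable as a pre-commit hook.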

Add it. Today. Before the next deployment.

What we’d do differently

If we were building all four servers fresh, knowing what we know now:

  1. Pick sync vs async at scaffold time, by what the work actually is. Add a project-level test asserting the choice from day one. Never mix.
  2. Treat docstrings as the model’s interface. Three sentences minimum. Cost per call. Typical use case. Re-read after every behaviour change.
  3. Read-only-by-design enforced structurally. Not “we didn’t expose a write tool” — the codebase contains no path to write at all. Test it.
  4. Lazy config loading, structured error envelopes. Never import-time crashes.
  5. Wrapper-layer tests from the first commit. inspect.iscoroutinefunction + AST scan. Never optional.
  6. Live smoke test at the project root. Run before any commit touching server.py. Cheap insurance.
  7. For client engagements: ask about the refactor map. Don’t wire to the old layer.

This list is now the SOP for any MCP server we ship — internal or paid. It’s not glamorous, and most of it isn’t visible to the end user. But it’s the difference between an MCP that survives in production and one that dies the first time someone outside the original author’s setup tries to use it.

The general lesson

MCP servers are a protocol, not a framework. The protocol has surprising semantics (sync tools blocking the loop, schema generation from signatures, structural error semantics) that don’t fall out of “it’s just Python.” Treating an MCP server like any other Python package — same testing, same architecture, same shipping discipline — gets you to a demo. Treating it like a long-lived protocol contract, with the specific test layers and the specific guard patterns, gets you to production.

We learned each of these lessons by shipping a bug. That’s the expensive curriculum. The repo and the lint hook are the cheap version.


All four MCP servers are open-source on the mcpdone-samples repo (the public three; mcp-twitter is in our internal repo but the patterns are documented in our blog). The cross-project lint that enforces these patterns is at mcp-guardrails. MIT-licensed.

If your team wants someone to build production-shape MCP servers with these patterns baked in from day one — not learned by shipping bugs — that’s the $499 Build tier. Money-back if the code doesn’t run in a clean environment.

Want something similar for your team? See the Build tier — custom MCP servers, shipped in 5 days, fixed price.