How We Built an MCP Server with 229 Tools (Without Writing a Single Tool Definition)

How Apideck auto-generated a 229-tool MCP server from an OpenAPI spec using Speakeasy, deployed on Vercel with dynamic tool discovery at 1,300 tokens. A walkthrough of the stack, the hosting tradeoffs, and the hard-won lessons from shipping serverless analytics.


TL;DR: We auto-generated a production MCP server from our OpenAPI spec using Speakeasy, added PostHog analytics that survives Vercel's serverless lifecycle, and deployed the whole thing on Vercel with zero infrastructure management. This is a walkthrough of the stack, why we chose each piece, and the hard-won lessons from shipping it.


In our last post, we showed how MCP servers eat context windows (55,000+ tokens before an agent reads a single message) and made the case for CLIs as a lighter alternative.

But CLIs don't solve everything. Every major agent framework connects to tools via MCP: OpenAI Agents SDK, Pydantic AI, LangChain, Google ADK. If you want your API to work across all of them, you need an MCP server. We've written before about whether to ship an API, an MCP server, or both, and for multi-framework compatibility, MCP wins.

The challenge is different for us than for single-vendor API companies. Xero ships an MCP server with ~50 tools for one accounting system. Stripe ships a curated set of tools for one payments system. They hand-build their tool definitions and move on.

Apideck is a Unified API platform. One integration gives developers access to 20+ accounting systems (QuickBooks, Xero, NetSuite, Sage), 20+ HRIS platforms, file storage providers, and more through a single API. That means our MCP server doesn't expose "QuickBooks invoices" or "Xero invoices." It exposes "accounting invoices," and the right connector fires based on which integration the consumer has authorized through Vault.

That architecture has a direct consequence: our tool surface is massive. 229 operations across accounting, file storage, HRIS, Vault, and a proxy API. Hand-building that many tool definitions is a maintenance nightmare. And when we add a new connector or API group, every tool definition would need updating. So we generated the whole thing from the OpenAPI spec.

The Stack

| Component | Choice | Why |
|---|---|---|
| MCP generation | Speakeasy mcp-typescript target | Auto-generates 229 tools from our OpenAPI spec |
| Hosting | Vercel serverless | Auto-scaling, git-push deploys, waitUntil for background work |
| Analytics | PostHog (EU) via /batch API | Server-side event tracking without the browser SDK |
| Runtime | Node.js + @modelcontextprotocol/sdk | Standard MCP protocol implementation |
| Schema validation | Zod v4 | Generated input schemas, JSON Schema conversion |
| Build | Bun | Fast bundling of 229 tool files into a single entry point |

Speakeasy: From OpenAPI to Auto-Generated MCP Tools in One Command

We already used Speakeasy for SDK generation. Our OpenAPI spec has 986 x-speakeasy extensions for cleaner naming and grouping. Generating the MCP server from the same spec was the obvious move:

speakeasy run

That's it. One command generates:

  • 229 tool definitions with names like accounting-invoices-list, hris-employees-create
  • Zod input schemas for every operation
  • MCP annotations (readOnlyHint, destructiveHint, idempotentHint)
  • An SDK client (ApideckMcpCore) that handles auth, headers, and HTTP transport
  • CLI commands for local stdio and HTTP serving

Each generated tool is a thin wrapper around an SDK function:

// Auto-generated: src/mcp-server/tools/accountingInvoicesList.ts
export const tool$accountingInvoicesList = {
  name: "accounting-invoices-list",
  description: "List Invoices",
  scopes: ["read"],
  annotations: {
    title: "List Invoices",
    readOnlyHint: true,
    destructiveHint: false,
  },
  args: { request: z.object({ limit: z.number().optional(), /* ... */ }) },
  tool: async (client, args, ctx) => {
    const [result] = await accountingInvoicesList(
      client, args.request, { fetchOptions: { signal: ctx.signal } }
    ).$inspect();

    if (!result.ok) {
      return { content: [{ type: "text", text: result.error.message }], isError: true };
    }
    return formatResult(result.value);
  },
};

No hand-written tool definitions. When the API spec changes, speakeasy run regenerates everything.

Controlling What Gets Generated

We don't expose all 360 Apideck operations. A Python script generates a Speakeasy overlay that selects which API groups become MCP tools:

python generate-overlay.py accounting,fileStorage,hris,vault,proxy  # 229 tools
python generate-overlay.py accounting                                # 143 tools

The overlay uses JSONPath to disable paths and annotate operations with scopes:

# mcp-overlay.yaml (generated)
overlay: 1.0.0
actions:
  - target: $.paths["/crm/*"].*
    update:
      x-speakeasy-mcp:
        disabled: true    # CRM excluded
  - target: $.paths["/accounting/invoices"].get
    update:
      x-speakeasy-mcp:
        scope: read       # GET = read scope

Post-Generation Fixes

Speakeasy's output needs two patches after each generation. We automate this with post-generate.sh:

  1. Zod transforms break JSON Schema conversion. Speakeasy generates Zod schemas with .transform() calls. When the describe_tool_input meta-tool converts these to JSON Schema, Zod v4 throws. The fix: add unrepresentable: "any" to render transforms as {} instead of throwing.

  2. Dynamic mode should be the default. We want agents to start with 4 meta-tools, not 229. A one-line sed patch sets --mode dynamic as the default.

speakeasy run && ./post-generate.sh  # Full regeneration pipeline
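The first patch is, conceptually, a fallback: instead of letting an unrepresentable Zod transform crash the meta-tool, schema conversion degrades to an empty schema. A minimal sketch of that behavior, with a hypothetical `toJsonSchemaSafe` helper standing in for the patched conversion (this is not the actual Speakeasy patch, which simply passes `unrepresentable: "any"`):

```javascript
// Hypothetical sketch: degrade to {} when a schema can't be represented
// as JSON Schema, mirroring the effect of Zod v4's unrepresentable: "any".
function toJsonSchemaSafe(convert, schema) {
  try {
    return convert(schema);
  } catch {
    // Transforms (and other unrepresentable constructs) render as an
    // empty schema instead of crashing describe_tool_input.
    return {};
  }
}

// A stand-in converter that throws on transforms, like a strict
// Zod -> JSON Schema pass would
const strictConvert = (s) => {
  if (s.hasTransform) throw new Error("unrepresentable: transform");
  return { type: s.type };
};

toJsonSchemaSafe(strictConvert, { type: "string" });                      // { type: "string" }
toJsonSchemaSafe(strictConvert, { type: "string", hasTransform: true });  // {}
```

The trade-off is that agents see an unconstrained input for those few fields, which beats a hard failure on every describe_tool_input call.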

Dynamic Tool Discovery: 229 Tools at 1,300 Tokens

Dynamic tool discovery is the design decision that makes the whole thing work. Instead of dumping 229 tool schemas into the agent's context (25-40K tokens), we expose 4 meta-tools:

| Meta-tool | Purpose | Tokens |
|---|---|---|
| list_tools | Search/filter available tools | ~100-500 per call |
| describe_tool_input | Get JSON Schema for a specific tool | ~200-800 per call |
| execute_tool | Run any tool by name | varies |
| list_scopes | List available permission scopes | ~50 |

An agent workflow looks like:

Agent: list_tools(search_terms: ["invoices"])
-> Returns: accounting-invoices-list, accounting-invoices-create, ...

Agent: describe_tool_input(tool_names: ["accounting-invoices-list"])
-> Returns: { limit: number, cursor: string, ... }

Agent: execute_tool(tool_name: "accounting-invoices-list", input: { limit: 10 })
-> Returns: invoice data

Initial cost: ~1,300 tokens. The agent discovers what it needs, when it needs it.
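Under the hood, the meta-tool layer is essentially a registry plus lookup. A minimal sketch of the idea, with illustrative names and toy handlers rather than the generated code:

```javascript
// Hypothetical sketch of the dynamic-discovery layer: a registry of tool
// handlers that backs list_tools and execute_tool.
const registry = new Map([
  ["accounting-invoices-list", { scopes: ["read"], run: async (input) => ({ invoices: [], limit: input.limit }) }],
  ["accounting-invoices-create", { scopes: ["write"], run: async () => ({ created: true }) }],
  ["hris-employees-create", { scopes: ["write"], run: async () => ({ created: true }) }],
]);

// list_tools: cheap substring search over tool names, so only matching
// tools ever enter the agent's context
function listTools(searchTerms) {
  return [...registry.keys()].filter((name) =>
    searchTerms.some((term) => name.includes(term))
  );
}

// execute_tool: run any registered tool by name
async function executeTool(name, input) {
  const tool = registry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.run(input);
}

listTools(["invoices"]); // both invoice tools, not the HRIS one
```

The agent pays for tool schemas only when it asks for them; the registry itself costs the agent nothing.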

Single-vendor MCP servers can get away with static mode. Xero's ~50 tools cost maybe 15K tokens. Stripe's tools? Barely a dent. But a multi-vendor Unified API covering accounting, HRIS, file storage, vault management, and a proxy layer (229 operations today, growing as we add API groups) would burn 25-40K tokens in static mode before the agent reads a single message. Add CRM and ATS, and you're past 300 tools.

Dynamic mode decouples API surface size from token cost. Whether we expose 50 tools or 500, the initial cost stays at ~1,300 tokens. That's what makes a Unified API viable as an MCP server at all.

It also changes how agents interact with multi-vendor APIs. An agent building a financial report doesn't need to know it's talking to QuickBooks vs Xero vs NetSuite. It calls list_tools(search_terms: ["accounting", "invoices"]), discovers the tools, and executes. The Unified API handles connector routing behind the scenes based on the consumer's Vault configuration.

Multi-Agent Mode: One Server, Specialized Agents

A single MCP server with 229 tools works for general-purpose agents using dynamic discovery. But in production, you often want specialized agents — an AP agent that only handles payables, a reconciliation agent that's read-only, an onboarding agent that manages Vault connections.

The server supports this out of the box with three filtering mechanisms that let you spin up purpose-built MCP instances from the same codebase.

Filter by Tool Name

The --tool flag mounts only specific tools:

# AP agent: only invoice and bill operations
node bin/mcp-server.js start \
  --tool accounting-invoices-list \
  --tool accounting-invoices-create \
  --tool accounting-bills-list \
  --tool accounting-bills-create \
  --tool accounting-payments-create

The agent sees exactly 5 tools. No discovery overhead, no risk of it wandering into HRIS or file storage operations.

Filter by Scope

The --scope flag restricts by operation type:

# Read-only reporting agent
node bin/mcp-server.js start --scope read

# Read + write, but can never delete
node bin/mcp-server.js start --scope read --scope write

A --scope read agent gets 95 tools (every GET operation across all APIs) with zero write or delete tools available. This isn't a prompt instruction the agent might ignore — the tools literally don't exist in its MCP session.
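Filtering by scope is just set arithmetic over tool metadata: a tool is mounted only if every scope it requires was granted at startup. A hedged sketch, with a hypothetical `mountTools` helper (not the server's actual internals):

```javascript
// Hypothetical sketch: mount only tools whose required scopes are all
// covered by the scopes granted via --scope flags.
function mountTools(tools, grantedScopes) {
  const granted = new Set(grantedScopes);
  return tools.filter((tool) => tool.scopes.every((s) => granted.has(s)));
}

const tools = [
  { name: "accounting-invoices-list", scopes: ["read"] },
  { name: "accounting-invoices-create", scopes: ["write"] },
  { name: "accounting-invoices-delete", scopes: ["destructive"] },
];

// --scope read: write and delete tools never enter the MCP session
mountTools(tools, ["read"]).map((t) => t.name); // ["accounting-invoices-list"]
```

Because the filter runs at mount time, an excluded tool is not hidden or refused; it simply does not exist in the session.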

Filter by Annotation

The --tool-annotations flag filters on MCP annotations like readOnly, destructive, idempotent, and openWorld:

# Only safe, idempotent operations — good for retry-heavy workflows
node bin/mcp-server.js start --tool-annotations readOnly,idempotent

Combining Filters for Agent Teams

These filters compose. A multi-agent setup might look like:

# Agent 1: Financial analyst — read-only accounting data
analyst_mcp = MCPServerStreamableHttp(
    url="https://mcp.apideck.dev/mcp",
    # Or self-hosted with: --scope read --mode static
)

# Agent 2: AP processor — can create invoices and payments, nothing else
ap_mcp = MCPServerStreamableHttp(
    url="https://internal:3001/mcp",
    # Started with: --tool accounting-invoices-create --tool accounting-payments-create
)

# Agent 3: Admin — manages Vault connections for onboarding
admin_mcp = MCPServerStreamableHttp(
    url="https://internal:3002/mcp",
    # Started with: --scope read --scope write --tool vault-connections-list ...
)

Each agent gets the minimum tool surface it needs. The analyst can't create invoices. The AP processor can't delete anything. The admin can't touch financial data. Structural enforcement, not prompt-based trust.

Real-World Example: AP Automation in Read-Only Mode

Here's what this looks like in practice. Our AP Automation agent processes vendor invoices end-to-end — supplier lookup, PO matching, duplicate checks, GL coding, and bill creation. When running in read-only mode, the agent completes every analysis step but correctly stops at the write boundary:

[Screenshot: AP Automation agent output in read-only mode]

The agent identifies that a supplier (LinkedIn Ireland Unlimited Company) needs to be created and a bill needs to be entered — but instead of failing silently or hallucinating a success, it reports exactly what's blocked and why:

  • Duplicate Check: Passed — no existing bill with this invoice number
  • Supplier: Needs to be created (blocked by read-only mode)
  • GL Coding: Appropriate expense account identified (ID: 2)
  • Bill Creation: Ready to proceed once supplier is created (blocked by read-only mode)

The agent then recommends next steps for when permissions are escalated: create the supplier, create the bill, review and approve for payment by the due date.

This is the multi-agent sweet spot. A read-only agent does the analysis, flags what needs human approval or elevated permissions, and a write-enabled agent (or human) acts on the recommendations. The MCP server's scope system makes this workflow structural — the read-only agent literally cannot create a supplier, no matter what the prompt says.

For the hosted Vercel deployment, you can achieve the same effect programmatically — the createMCPServer factory accepts allowedTools, scopes, and annotationFilter parameters:

const { server } = createMCPServer({
  logger,
  analytics,
  dynamic: true,
  getSDK,
  allowedTools: ["accounting-invoices-list", "accounting-invoices-create"],
  scopes: ["read", "write"],
});

This means you can build a routing layer that creates different MCP server configurations per agent role, all from a single deployment.
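Such a routing layer can be as small as a role-to-config map feeding the factory. A hedged sketch, where the role names and their configs are made up for illustration and only the `allowedTools` / `scopes` / `dynamic` parameters come from the factory described above:

```javascript
// Hypothetical per-role MCP configurations, feeding the allowedTools and
// scopes parameters of the createMCPServer factory described above.
const roleConfigs = {
  analyst: { scopes: ["read"] },
  "ap-processor": {
    allowedTools: ["accounting-invoices-create", "accounting-payments-create"],
    scopes: ["read", "write"],
  },
  admin: { scopes: ["read", "write"] },
};

function configForRole(role) {
  const config = roleConfigs[role];
  if (!config) throw new Error(`Unknown agent role: ${role}`);
  // Every role gets dynamic discovery; the filters narrow the surface.
  return { dynamic: true, ...config };
}

configForRole("analyst"); // { dynamic: true, scopes: ["read"] }
```

The routing layer picks the config per request (for example, from an agent identity header), and each agent role gets a structurally narrowed server from the same deployment.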

Why Vercel (Not Cloudflare, Not Self-Hosted)

Choosing where to host an MCP server matters more than you'd think. MCP over HTTP uses StreamableHTTPServerTransport, essentially Server-Sent Events (SSE) over POST requests. The hosting platform needs to handle streaming responses, reasonable cold starts, and a way to do background work after the response is sent.

We evaluated three options.

Cloudflare Workers: Great Runtime, Wrong Limits

Speakeasy generates a Cloudflare Workers deployment out of the box, complete with Durable Objects for session state. It's an appealing target: edge-deployed, fast cold starts, generous free tier.

The problem: Cloudflare's free plan CPU time limit is too tight for initializing 229 tools and their Zod schemas on cold start. The paid plan works, but we didn't want to force a paid runtime for an open-source project. We keep the wrangler.toml in the repo as an option, but Vercel is the default.

Self-Hosted Express: Simple, But You're the SRE

The generated code includes an Express-based HTTP server with SSE transport:

// node bin/mcp-server.js serve --port 3000
const app = express();
app.post("/mcp", async (req, res) => {
  const transport = new StreamableHTTPServerTransport({});
  const { server } = createMCPServer({ logger, analytics, dynamic: true, getSDK });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});
app.listen(3000);

This works great for local development and self-hosted deploys on Railway or Fly.io. But for a public hosted endpoint, you're managing uptime, scaling, TLS, and monitoring yourself.

Vercel: The Right Abstraction for a Multi-Tenant MCP Server

Vercel turned out to be the best fit for several reasons.

Git-push deploys. Merge to main, production updates automatically. Preview deployments on every PR let us test the MCP server end-to-end before going live. This matters when you're iterating on analytics or debugging serverless behavior.

Per-request isolation. Each Vercel function invocation is stateless. For an MCP server, this is a feature: every request creates a fresh createMCPServer() instance with credentials from the request headers. No shared state, no session leaks between tenants.

Fluid Compute. Vercel reuses function instances across concurrent requests, which means the 229-tool initialization cost is paid once and amortized across subsequent requests hitting the same instance. Cold starts are real but infrequent.

waitUntil for background work. This is what makes our PostHog integration possible. More on this below, but waitUntil from @vercel/functions is the only reason server-side analytics work at all in this setup. Without it, every analytics event would be dropped.

Environment management. vercel env add POSTHOG_API_KEY production gives you separate values per environment (production, preview, development), scoped to git branches if needed. No .env files committed to the repo, no secrets in CI config.

The entire Vercel handler is ~40 lines:

// api/mcp.ts
import { waitUntil } from "@vercel/functions";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const logger = createConsoleLogger("info");

function getAnalytics() {
  return createAnalytics(process.env["POSTHOG_API_KEY"], logger, waitUntil);
}

export default async function handler(req, res) {
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader("Access-Control-Allow-Methods", "GET, POST, DELETE, OPTIONS");
  res.setHeader("Access-Control-Allow-Headers", "Content-Type, *");

  if (req.method === "OPTIONS") {
    res.statusCode = 204;
    return res.end();
  }

  // Extract per-request credentials from the incoming request headers
  const getSDK = () => new ApideckMcpCore({
    security: async () => ({
      apiKey: req.headers["x-apideck-api-key"] || process.env["APIDECK_API_KEY"],
    }),
    consumerId: req.headers["x-apideck-consumer-id"],
    appId: req.headers["x-apideck-app-id"],
  });

  const analytics = getAnalytics();
  const transport = new StreamableHTTPServerTransport({});
  const { server } = createMCPServer({ logger, analytics, dynamic: true, getSDK });

  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
}

Multi-tenancy comes free: each request passes its own Apideck credentials via headers. The same deployment serves every customer. No session management, no connection pools, no tenant routing logic.

This is particularly important for a multi-tenant MCP server built on a Unified API. Each consumer might have different connectors authorized (one has QuickBooks, another has Xero, a third has NetSuite). The consumer-id header tells the Unified API which consumer's connections to use, and the x-apideck-service-id field in tool inputs can target a specific connector:

// Same MCP server, different consumers, different accounting systems
execute_tool({
  tool_name: "accounting-invoices-list",
  input: { request: { limit: 10, xApideckServiceId: "quickbooks" } }
})

PostHog Analytics in Serverless (The Hard Part)

We wanted to track which tools agents actually call: tool name, duration, error rate, static vs dynamic mode. PostHog's /batch API works for server-side tracking without the browser SDK. Simple enough, right?

It took more deploys than we expected to get this working. Three attempts, in order.

Attempt 1: Fire-and-Forget (Broken)

// This doesn't work in serverless
async capture(event) {
  fetch("https://eu.i.posthog.com/batch", {
    method: "POST",
    body: JSON.stringify(payload),
  });
  // Function exits, fetch gets killed
}

Vercel terminates the function after res.end(). Any in-flight fetch calls get killed with no errors and no logs. Just missing events.

Attempt 2: Buffer and Flush (Also Broken)

// flush() never runs
const buffer = [];
const analytics = {
  capture(event) { buffer.push(event); },
  async flush() { await sendBatch(buffer.splice(0)); },
};

// In the handler:
await transport.handleRequest(req, res, req.body);
await analytics.flush();  // Vercel already killed us after res.end()

The MCP StreamableHTTPServerTransport calls res.end() internally when it sends the SSE response. Vercel sees the response as done and freezes the function. flush() never executes.

Attempt 3: waitUntil (Works)

// Keeps the function alive for background work
import { waitUntil } from "@vercel/functions";

export function createAnalytics(apiKey, logger, onBackground) {
  const pending = [];

  function send(event) {
    const payload = {
      api_key: apiKey,
      batch: [{
        event: event.event,
        distinct_id: event.distinctId,
        properties: { ...event.properties, $lib: "apideck-mcp" },
        timestamp: new Date().toISOString(),
      }],
    };

    return fetch("https://eu.i.posthog.com/batch", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
  }

  return {
    async capture(event) {
      const p = send(event);
      if (onBackground) onBackground(p);  // Register with waitUntil
      pending.push(p);
    },
    async flush() {
      await Promise.all(pending.splice(0));
    },
  };
}

// Wire it up:
const analytics = createAnalytics(key, logger, waitUntil);

waitUntil from @vercel/functions tells Vercel: "the response is sent, but keep this function alive until these promises resolve." Each capture() registers its PostHog fetch with waitUntil, and Vercel waits for delivery before freezing the instance.

The key insight: you can't schedule background work after handleRequest, because by then the function is dead. You need to register promises as they're created inside the tool handler. The onBackground callback pattern lets the analytics module stay platform-agnostic while the Vercel handler passes waitUntil as the implementation.

What We Track

await analytics?.capture({
  distinctId: "mcp-server",
  event: "mcp_tool_called",
  properties: {
    tool_name: "accounting-invoices-list",
    is_error: false,
    duration_ms: 245,
    mode: "dynamic",
  },
});

Every execute_tool call captures tool name, execution time, error status, and mode. This tells us which tools agents actually use, which ones fail, and how dynamic vs static mode affects usage patterns.

The analytics interface degrades gracefully: if POSTHOG_API_KEY isn't set, createAnalytics returns a no-op stub. No conditional checks scattered through the codebase. The tool handlers always call analytics?.capture(), and the implementation decides whether to actually send.
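The graceful-degradation pattern looks roughly like this (a simplified sketch; the real createAnalytics also registers each send with waitUntil, as shown earlier):

```javascript
// Sketch of graceful degradation: with no API key, callers get a no-op
// stub with the same interface, so tool handlers never need to branch.
function createAnalytics(apiKey, send) {
  if (!apiKey) {
    return { capture: async () => {}, flush: async () => {} }; // no-op stub
  }
  const pending = [];
  return {
    async capture(event) { pending.push(send(event)); },
    async flush() { await Promise.all(pending.splice(0)); },
  };
}

// send stands in for the PostHog /batch fetch shown earlier
const sent = [];
const real = createAnalytics("phc_example_key", async (e) => sent.push(e));
const stub = createAnalytics(undefined, async (e) => sent.push(e));

real.capture({ event: "mcp_tool_called" });
stub.capture({ event: "mcp_tool_called" }); // silently dropped
// sent now holds exactly one event
```

Both implementations satisfy the same interface, so the decision to send or drop lives in one place instead of at every call site.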

Scopes: Structural Permissions Without Prompts

Every tool is annotated with a scope based on its HTTP method:

| Scope | HTTP methods | What it means |
|---|---|---|
| read | GET, HEAD | Safe, no side effects |
| write | POST, PUT, PATCH | Creates or modifies data |
| destructive | DELETE | Irreversible |

Agents can be restricted at startup:

# Read-only agent - can't create or delete anything
node bin/mcp-server.js start --scope read

# Read + write, no deletes
node bin/mcp-server.js start --scope read --scope write

This is enforced at the tool registration level, not via prompts. A read-only agent literally doesn't have delete tools available. There's nothing to misuse.
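Because the scope is derived mechanically from the HTTP method, the mapping in the table above can be sketched as a single function (a hypothetical helper name, not the generator's code):

```javascript
// Hypothetical sketch of the HTTP-method -> scope mapping behind the
// generated scope annotations.
function scopeForMethod(method) {
  switch (method.toUpperCase()) {
    case "GET":
    case "HEAD":
      return "read"; // safe, no side effects
    case "POST":
    case "PUT":
    case "PATCH":
      return "write"; // creates or modifies data
    case "DELETE":
      return "destructive"; // irreversible
    default:
      throw new Error(`Unmapped HTTP method: ${method}`);
  }
}

scopeForMethod("get");    // "read"
scopeForMethod("DELETE"); // "destructive"
```

Deriving scopes from the spec rather than hand-assigning them means a new endpoint gets the right permission class automatically on the next regeneration.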

We covered this philosophy in our previous post on MCP context windows: telling an agent "never delete production data" in a system prompt is like putting a sticky note on the nuclear launch button. Structural safety beats prompt-based safety. For a deeper look at the security landscape of MCP, we published a separate analysis covering auth, injection, and trust boundaries.

Works Everywhere

The same createMCPServer() factory powers every deployment target. The relationship between MCP and REST APIs is complementary, and the protocol's transport flexibility means a single server implementation covers stdio, HTTP, and SSE.

Claude Desktop / Cursor (stdio):

{
  "mcpServers": {
    "apideck": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/mcpb", "--from", "https://mcp.apideck.dev/mcp-server.mcpb"],
      "env": {
        "APIDECK_API_KEY": "your-key",
        "APIDECK_CONSUMER_ID": "your-consumer",
        "APIDECK_APP_ID": "your-app"
      }
    }
  }
}

OpenAI Agents SDK:

from agents import Agent
from agents.mcp import MCPServerStreamableHttp

async with MCPServerStreamableHttp(
    url="https://mcp.apideck.dev/mcp",
    headers={
        "x-apideck-api-key": "your-key",
        "x-apideck-consumer-id": "your-consumer",
        "x-apideck-app-id": "your-app",
    },
) as server:
    agent = Agent(name="Accounting Agent", mcp_servers=[server])

Self-hosted HTTP:

node bin/mcp-server.js serve --port 3000 --mode dynamic

One codebase, one createMCPServer() call, every framework.

What We'd Do Differently

Start with waitUntil from day one. We wasted hours debugging PostHog events disappearing in serverless. If you're doing any background I/O on Vercel (analytics, logging, webhooks), reach for @vercel/functions immediately. The serverless lifecycle will surprise you.

Test on the actual deployment target. Our analytics worked perfectly in local stdio mode and broke on Vercel with no error output. The StreamableHTTPServerTransport calls res.end() internally, which changes the function lifecycle in ways you won't catch locally. Preview deployments on every PR commit made this debuggable.

Watch for trailing newlines in env vars. Our production PostHog key had a trailing \n from a copy-paste. The PostHog API rejected it with no error, no events. We spent an hour debugging code when the fix was re-pasting the env var. Always vercel env ls and verify.

Dynamic mode should have been the only mode for hosted. Static mode makes sense for local agents with small tool sets. For a hosted multi-tenant server with 229 tools, dynamic discovery is the only sane option. We should have made that assumption earlier.

How This Compares

| Company | MCP server | Tools | Approach |
|---|---|---|---|
| Xero | XeroAPI/xero-mcp-server | ~50 | Hand-built, single vendor |
| Stripe | stripe/ai | ~19 | Hand-built, curated |
| Intuit/QuickBooks | intuit/quickbooks-online-mcp-server | ~12 | Hand-built, single vendor |
| Plaid | Dashboard + Sandbox MCP | ~8 | Hand-built, operations only |
| Apideck | apideck-libraries/mcp | 229 | Auto-generated, multi-vendor, dynamic discovery |

The difference isn't tool count. It's what happens when the API surface grows. Xero adding 10 endpoints means hand-updating 10 tool definitions. Apideck adding an entire API group (say, ecommerce) means running speakeasy run and updating the overlay. The tool definitions, schemas, and annotations generate themselves.

The Numbers

| Metric | Value |
|---|---|
| Tools generated | 229 |
| Lines of hand-written tool code | 0 |
| Token cost (dynamic mode) | ~1,300 |
| Token cost (static mode) | ~25-40K |
| Deployment targets | 3 (Vercel, Cloudflare Workers, local) |
| Time to regenerate from spec | speakeasy run && ./post-generate.sh (~60s) |
| Frameworks supported | All (via MCP protocol) |

Try It

The server is live at https://mcp.apideck.dev/mcp. The code is open source at github.com/apideck-libraries/mcp.

# Quick test with curl
curl -X POST https://mcp.apideck.dev/api/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize",
       "params":{"protocolVersion":"2025-03-26","capabilities":{},
       "clientInfo":{"name":"test","version":"1.0"}}}'

Or connect it to any MCP-compatible agent framework and let dynamic discovery do its thing.

Next up: we're looking at adding ecommerce and CRM API groups to the MCP server, which would push the tool count past 300. Dynamic mode means that won't cost a single extra token at initialization. We're also exploring how agent usage data from PostHog can inform which tool groups to prioritize and which operations to optimize for lower latency. And as AI reshapes the accounting stack, the intersection of agentic workflows and unified accounting APIs is where we see the most traction. If you're building auto-generated MCP tools from an OpenAPI spec, open an issue on the repo and let us know what you're running into.
