Equinox MCP Generator

Project Overview

  • Name: Equinox MCP Generator
  • Type: Meta-tool / code generator (Python CLI + reusable runtime library)
  • Context: Built for Infosys Equinox (closed-source, IP-protected)
  • Purpose: Compile any OpenAPI 3.x specification into a production-ready Model Context Protocol (MCP) server, so existing REST APIs become directly usable by AI agents and MCP-compatible clients with no hand-written glue code.

The generator turns a single OpenAPI spec (file or URL) into a complete, deployable MCP server: tool definitions, auth handling, request-context scaffolding, tests, a Dockerfile, and Kubernetes manifests. It is built on top of FastMCP, and the generated servers support the STDIO, SSE, and streamable HTTP transports, so they work with desktop AI clients, agent frameworks, and HTTP-based orchestration alike.

My Role

I designed and built this end to end as the sole author and founder, with no co-authors. Every layer is mine: the CLI and config model, the OpenAPI parsing/filtering/sanitization pipeline, the Jinja-based code generation, the entire reusable runtime package (auth, middleware, token verification, schema shaping), the gate-pattern design, and the deployment artifacts (Docker, Kubernetes, autoscaling, monitoring). I also owned the v2 re-architecture that moved server behavior out of a giant template and into a tested runtime library.

Key Capabilities

OpenAPI to MCP compilation

  • Automatic tool generation: Every OpenAPI operation becomes an MCP tool with proper typing derived from the spec schemas.
  • Spec loading and sanitization: Accepts JSON or YAML, local file or remote URL, and normalizes spec quirks that strict MCP parsers reject.
  • Filtering and customization: Include/exclude operations by operationId or path pattern (glob-style), include/exclude by OpenAPI tags, and configurable tool-name sanitization for MCP naming compatibility.
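The filtering and name sanitization described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the generator's actual code: the function names `select_operations` and `sanitize_tool_name`, the operation dict shape, the allowed character set, and the 64-character cap are all hypothetical.

```python
import re
from fnmatch import fnmatchcase


def select_operations(operations, include=("*",), exclude=()):
    """Keep operations that match an include glob and no exclude glob.

    A pattern is tested against both the operationId and the path.
    """
    def matches(op, patterns):
        return any(
            fnmatchcase(op["operationId"], pattern) or fnmatchcase(op["path"], pattern)
            for pattern in patterns
        )

    return [
        op for op in operations
        if matches(op, include) and not matches(op, exclude)
    ]


def sanitize_tool_name(operation_id, max_length=64):
    """Collapse characters outside [A-Za-z0-9_-] into underscores.

    The character set and length cap are assumptions for this sketch.
    """
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", operation_id).strip("_")
    return name[:max_length] or "tool"


operations = [
    {"operationId": "orders.create", "path": "/orders"},
    {"operationId": "orders.list", "path": "/orders"},
    {"operationId": "admin.purge", "path": "/internal/purge"},
]

selected = select_operations(operations, exclude=["/internal/*"])
tool_names = [sanitize_tool_name(op["operationId"]) for op in selected]
```

Here the `/internal/*` glob drops the purge operation by path, and the remaining operationIds are rewritten into names without dots.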

Authentication (two distinct layers)

  • Upstream API auth: Resolves OpenAPI security schemes (Bearer, Basic, API key in header/query/cookie, OAuth2 client-credentials, OIDC) from environment variables and bakes them into the outbound HTTP client. The LLM never sees credentials. Per-request auth is carried in a ContextVar (from Python's contextvars module, which asyncio scopes to each task), so concurrent requests in the same process never cross-contaminate headers.
  • MCP-transport auth: Pluggable token verifiers supporting opaque-token introspection (RFC 7662), opaque-to-JWT token exchange (RFC 8693), and a passthrough mode for deployments behind an authenticating proxy. The verifier includes in-flight request de-duplication, a positive cache with expiry safety margins, and a short negative cache to protect a slow auth backend from retry storms during the MCP initialize handshake.
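The verifier caching behavior described above (in-flight de-duplication, a positive cache with a safety margin, and a short negative cache) might look roughly like the sketch below. It is a minimal illustration, not the runtime's implementation: `CachingVerifier`, the default margins, and the `fake_introspect` backend are hypothetical, and the backend is assumed to return an RFC 7662 style response with `active` and `exp` fields.

```python
import asyncio
import time


class CachingVerifier:
    def __init__(self, introspect, safety_margin=30.0, negative_ttl=5.0):
        self._introspect = introspect        # async callable: token -> response dict
        self._safety_margin = safety_margin  # stop trusting a token this long before exp
        self._negative_ttl = negative_ttl    # how long to remember a rejected token
        self._cache = {}                     # token -> (claims or None, expires_at)
        self._in_flight = {}                 # token -> task shared by concurrent callers

    async def verify(self, token):
        cached = self._cache.get(token)
        if cached is not None and cached[1] > time.time():
            return cached[0]
        task = self._in_flight.get(token)
        if task is None:
            # First caller starts the backend request; later callers await the same task.
            task = asyncio.ensure_future(self._fetch(token))
            self._in_flight[token] = task
        return await asyncio.shield(task)

    async def _fetch(self, token):
        try:
            response = await self._introspect(token)
            if response.get("active"):
                claims = response
                expires_at = response["exp"] - self._safety_margin
            else:
                claims = None
                expires_at = time.time() + self._negative_ttl
            self._cache[token] = (claims, expires_at)
            return claims
        finally:
            self._in_flight.pop(token, None)


async def demo():
    calls = 0

    async def fake_introspect(token):  # stand-in for a slow auth backend
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        if token == "good":
            return {"active": True, "sub": "agent-1", "exp": time.time() + 300}
        return {"active": False}

    verifier = CachingVerifier(fake_introspect)
    results = await asyncio.gather(*(verifier.verify("good") for _ in range(5)))
    await verifier.verify("good")           # served from the positive cache
    first_bad = await verifier.verify("bad")
    second_bad = await verifier.verify("bad")  # served from the negative cache
    return calls, results, first_bad, second_bad


calls, results, first_bad, second_bad = asyncio.run(demo())
```

Five concurrent verifications of the same token plus a repeat cost one backend call, and two rejections of a bad token cost one more.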

Gate pattern

The gate pattern is a design I created so that LLMs construct correct request bodies for write operations. Tools that take a request body require a context-loading call first; a companion context tool returns the available scenarios and the expected parameter structure. The gate is enforced both in the tool schema and server-side in middleware, so it holds even if a client strips the gating parameter.
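A rough sketch of the two enforcement layers follows. The text does not say how the server records that context was loaded, so this example assumes an in-memory set keyed by session id. The names `Gate`, `GateError`, and the `context_loaded` parameter are hypothetical as well.

```python
GATE_PARAM = "context_loaded"  # hypothetical name for the gating parameter


class GateError(Exception):
    """Raised when a gated tool is called before its context tool."""


class Gate:
    def __init__(self, scenarios):
        self._scenarios = scenarios  # tool name -> example request bodies
        self._loaded = set()         # (session_id, tool_name) pairs that loaded context

    def gated_schema(self, body_schema):
        """Schema layer: require a flag that signals context was loaded first."""
        properties = dict(body_schema.get("properties", {}))
        properties[GATE_PARAM] = {
            "type": "boolean",
            "const": True,
            "description": "Call the context tool first, then set this to true.",
        }
        required = [*body_schema.get("required", []), GATE_PARAM]
        return {**body_schema, "properties": properties, "required": required}

    def load_context(self, session_id, tool_name):
        """Context tool: return scenarios and record that this session asked."""
        self._loaded.add((session_id, tool_name))
        return {"tool": tool_name, "scenarios": self._scenarios.get(tool_name, [])}

    def enforce(self, session_id, tool_name):
        """Middleware layer: check the server-side record, not the client's flag."""
        if tool_name in self._scenarios and (session_id, tool_name) not in self._loaded:
            raise GateError(f"Load context for {tool_name!r} before calling it.")


gate = Gate({"create_order": [{"name": "single item", "body": {"sku": "A1", "qty": 1}}]})
schema = gate.gated_schema(
    {"type": "object", "properties": {"sku": {"type": "string"}}, "required": ["sku"]}
)

try:
    gate.enforce("session-1", "create_order")
    blocked = False
except GateError:
    blocked = True

context = gate.load_context("session-1", "create_order")
gate.enforce("session-1", "create_order")  # passes now that context was loaded
```

Because `enforce` consults its own record, a client that drops or forges the schema flag is still rejected until the context tool has been called.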

Production hardening

  • Tuned HTTP connection pooling and keepalive expiry to survive idle-socket teardown across NAT/load-balancer/service-mesh hops.
  • Connect-level transport retries that recover from transient TCP/DNS/TLS failures without masking genuine upstream errors.
  • Trailing-slash route aliasing to prevent redirect-induced loss of the Authorization header in MCP clients.
  • Stateless HTTP by default for horizontal scaling without sticky sessions.
  • Sensitive-value masking throughout logging.
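The connect-level retry idea from the list above can be illustrated without any HTTP library: retry only when the connection could not be established, and let every other error through untouched. This is a sketch under assumptions. `call_with_connect_retries` and `UpstreamError` are hypothetical names, and `ConnectionError` stands in for whatever connect-phase exceptions the real HTTP client raises.

```python
import asyncio


class UpstreamError(Exception):
    """Stand-in for a genuine upstream failure, such as an HTTP 4xx/5xx response."""


async def call_with_connect_retries(send, attempts=3, base_delay=0.05):
    """Retry `send` only on connection failures, with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await send()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the connection failure
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
        # Any other exception propagates immediately and is never retried.


async def demo():
    attempts_seen = {"flaky": 0, "broken": 0}

    async def flaky():
        attempts_seen["flaky"] += 1
        if attempts_seen["flaky"] < 3:
            raise ConnectionRefusedError("connect failed")
        return "ok"

    async def broken():
        attempts_seen["broken"] += 1
        raise UpstreamError("422 from upstream")

    result = await call_with_connect_retries(flaky, base_delay=0.001)
    try:
        await call_with_connect_retries(broken, base_delay=0.001)
        upstream_error_seen = False
    except UpstreamError:
        upstream_error_seen = True
    return result, attempts_seen, upstream_error_seen


result, attempts_seen, upstream_error_seen = asyncio.run(demo())
```

The flaky call succeeds on its third attempt, while the upstream error is raised after a single attempt, so a real failure is never hidden behind retries.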

Generated output

Each run emits a self-contained server: the spec, scenario data, an .env from the service config, a pyproject.toml, a generated README, and an optional smoke-test suite. The server itself is intentionally thin and delegates to the shared runtime library.

How It Works (Pipeline)

  1. Config or flags - A service is described either by CLI flags or by a JSON service entry (input spec, base URL, output dir, filters, transport, and an env block written verbatim into the generated server).
  2. Load and validate - The OpenAPI spec is fetched/read, validated, and sanitized.
  3. Filter and sanitize - Operations are narrowed by include/exclude patterns and tags; operationIds are sanitized into MCP-safe tool names, with rename maps propagated so scenario data stays addressable.
  4. Scenario assembly - Optional request-body scenarios are grouped per operation to power the gate-pattern context tool.
  5. Render - Jinja templates render the server, env file, project metadata, README, and tests into the output directory.
  6. Run - At runtime the generated server parses the embedded spec, builds a tuned upstream HTTP client, runs FastMCP over the spec, attaches the middleware stack (session id, upstream auth injection, gate enforcement, response shaping, request logging), wires the chosen token verifier, registers health/discovery routes, and serves the selected transport.
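Steps 3 and 4 interact: once operationIds are renamed, scenario data keyed by the original ids has to follow. A minimal sketch of that propagation is shown below. The function names, the scenario record shape, and the numeric-suffix collision rule are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_rename_map(operation_ids):
    """Map each original operationId to a unique sanitized tool name."""
    rename, used = {}, set()
    for op_id in operation_ids:
        base = re.sub(r"[^A-Za-z0-9_]+", "_", op_id).strip("_") or "tool"
        name, suffix = base, 2
        while name in used:  # assumed collision rule: append _2, _3, ...
            name = f"{base}_{suffix}"
            suffix += 1
        used.add(name)
        rename[op_id] = name
    return rename


def group_scenarios(scenarios, rename):
    """Group scenarios under the renamed tool, dropping filtered-out operations."""
    grouped = defaultdict(list)
    for scenario in scenarios:
        tool = rename.get(scenario["operationId"])
        if tool is not None:
            grouped[tool].append({"name": scenario["name"], "body": scenario["body"]})
    return dict(grouped)


rename = build_rename_map(["orders.create", "orders/create", "orders.cancel"])
scenarios = [
    {"operationId": "orders.create", "name": "single item", "body": {"sku": "A1"}},
    {"operationId": "orders.create", "name": "bulk", "body": {"skus": ["A1", "B2"]}},
    {"operationId": "orders.cancel", "name": "by id", "body": {"id": 7}},
    {"operationId": "admin.purge", "name": "all", "body": {}},  # operation was filtered out
]
grouped = group_scenarios(scenarios, rename)
```

The grouped result is keyed by tool name, which is the form a context tool would need in order to answer "what scenarios exist for this tool?".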

Technical Stack

  • Language: Python 3.11+
  • MCP framework: FastMCP
  • OpenAPI: Prance (parsing/validation), PyYAML, jsonschema
  • HTTP: httpx (async client, custom transport/limits/timeouts)
  • Auth: PyJWT, Authlib, cryptography; OAuth2 / OIDC / token introspection / token exchange
  • Config and validation: Pydantic v2
  • Templating: Jinja2 (StrictUndefined)
  • CLI and output: Click, Rich
  • Cloud / data: boto3 (for optional scenario-data retrieval)
  • Packaging and deployment: setuptools, Docker (multi-stage), Kubernetes (Deployment, Service, HPA autoscaling, monitoring/ServiceMonitor)
  • Quality: pytest (incl. async), ruff, mypy; security-pinned dependencies with CVE-aware version floors

Engineering Highlights

  • A clean separation between a one-time generator and a reusable, independently tested runtime library, so generated servers stay tiny and every behavior is unit-testable in isolation.
  • A two-layer auth model that keeps upstream credentials invisible to the LLM while still validating MCP-transport tokens, including opaque-to-JWT exchange for downstream microservices.
  • The gate pattern: a practical solution to the real problem of LLMs guessing malformed request bodies, enforced defensively at both the schema and middleware layers.
  • Hardening informed by real deployment behavior - connection-pool staleness, redirect header loss, and auth-backend overload were each diagnosed and designed around.

Architecture Diagram
