Architecture
Three services and one Postgres. Each piece has one job. The seams are deliberate.
The whole stack
Control plane (TS / Hono)
The only stateful piece. Owns Postgres, owns the master key, owns all auth. Every public-facing concern lives here.
- Auth: reads the relay_live_… bearer, looks up the tenant via SHA-256 of the full key.
- BYOK: fetches the tenant's encrypted provider credential, decrypts with AES-256-GCM, passes plaintext to the runtime per request.
- Persistence: creates the runs row, tees every SSE event from the runtime into run_events, marks complete/failed on done/error.
- Memory pipeline: embeds the input, fetches top-K from memories, injects into the system prompt before forwarding to the runtime. Stores the result post-done.
- Custom tools broker: in-memory map of pending tool results. The runtime long-polls; the SDK posts; the broker matches and unblocks.
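The Auth and BYOK steps above can be sketched with Node's built-in crypto module. This is a minimal illustration under stated assumptions, not the control plane's actual code: the apiKeys map stands in for the api_keys table, and hashKey, lookupTenant, EncryptedCredential, encryptCredential, and decryptCredential are hypothetical names. The 12-byte IV and the separately stored authentication tag are also assumptions about the storage layout.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical in-memory stand-in for the api_keys table: SHA-256 hex digest -> tenant id.
const apiKeys = new Map<string, string>();

// Hash the full bearer key with SHA-256; only the digest is stored or compared.
function hashKey(fullKey: string): string {
  return createHash("sha256").update(fullKey, "utf8").digest("hex");
}

// Resolve a tenant id from an Authorization header value, or undefined if unknown.
function lookupTenant(authorization: string): string | undefined {
  const token = /^Bearer (\S+)$/.exec(authorization)?.[1];
  return token ? apiKeys.get(hashKey(token)) : undefined;
}

// Assumed at-rest layout for one provider credential.
interface EncryptedCredential {
  iv: Buffer;         // 12-byte nonce, fresh for every encryption
  ciphertext: Buffer; // encrypted provider key
  tag: Buffer;        // GCM authentication tag
}

// Encrypt a provider key with a 32-byte master key (AES-256-GCM).
function encryptCredential(masterKey: Buffer, plaintext: string): EncryptedCredential {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt per request; final() throws if the ciphertext or tag was tampered with.
function decryptCredential(masterKey: Buffer, cred: EncryptedCredential): string {
  const decipher = createDecipheriv("aes-256-gcm", masterKey, cred.iv);
  decipher.setAuthTag(cred.tag);
  return Buffer.concat([decipher.update(cred.ciphertext), decipher.final()]).toString("utf8");
}
```

Because only the hash of the bearer key is stored, a database leak does not expose usable keys, and because GCM is authenticated, a corrupted credential fails loudly at decryption instead of producing a garbage provider key.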
Runtime (Go)
Stateless. No database. No persistent provider keys. Receives one request per run, executes the agent loop, streams events back.
- Provider abstraction: one normalized Message/ContentPart/StreamEvent shape; each provider translates to/from its wire format.
- Router: picks Anthropic vs OpenAI by model prefix.
- Built-in tools: small registry executed in-process (e.g. calculator).
- Custom tools callback: when the LLM fires a function tool, the runtime long-polls the control plane and blocks until the SDK posts a result.
- Max iterations: 8 per run, to prevent infinite tool loops.
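The router and the iteration cap are small enough to sketch. The runtime itself is written in Go; the TypeScript below only illustrates the logic. The claude- and gpt- prefixes, the Turn and Step types, and the choice to throw once the cap is reached are assumptions for illustration, not details confirmed here.

```typescript
type Provider = "anthropic" | "openai";

// Assumed prefix table; the real router's prefixes are not listed in this document.
const PREFIXES: Array<[string, Provider]> = [
  ["claude-", "anthropic"],
  ["gpt-", "openai"],
];

// Pick a provider from the model name's prefix.
function routeModel(model: string): Provider {
  for (const [prefix, provider] of PREFIXES) {
    if (model.startsWith(prefix)) return provider;
  }
  throw new Error(`no provider for model: ${model}`);
}

// Cap from the list above: at most 8 iterations per run.
const MAX_ITERATIONS = 8;

// One model turn: either a final answer or a request to call a tool.
type Turn =
  | { kind: "final"; text: string }
  | { kind: "tool_call"; name: string; input: unknown };

// Hypothetical step function standing in for "call the LLM for one turn".
type Step = (iteration: number) => Turn;

// Run turns until the model produces a final answer or the cap is hit.
function runAgentLoop(step: Step): { text: string; iterations: number } {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    const turn = step(i);
    if (turn.kind === "final") return { text: turn.text, iterations: i };
    // A real loop would execute the tool here and feed its result into the next turn.
  }
  throw new Error(`exceeded ${MAX_ITERATIONS} iterations`);
}
```

A model that keeps requesting tools never returns a final turn, so the bounded for loop is what guarantees the run terminates.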
Postgres + pgvector
The single source of truth. pgvector is the only extension (used for memory). Schema:
tenants                     who owns what
└─ api_keys                 relay_live_… (sha-256 hashed)
└─ provider_credentials     per-provider LLM keys (AES-256-GCM at rest)
└─ runs                     each execution, scoped to a tenant
└─ run_events               ordered event log per run
└─ memories                 pgvector(1536), namespaced
Why this shape
Why is the runtime stateless?
So that the heavy work (LLM streaming, tool dispatch) can scale horizontally without database contention. The runtime is a pure worker — kill any instance, spin up another, no migration needed. A future managed cloud puts a fleet behind a load balancer; nothing in the agent loop has to change.
Why is the broker in the control plane?
Custom tools need a rendezvous point. The SDK and the runtime can't talk directly (the SDK is on the public internet, the runtime is internal). The control plane is already on the public path and already authenticates the SDK — adding a tiny broker is the smallest possible change.
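A rendezvous broker of this kind can be sketched as a map of pending promises. This is a hedged illustration, not the actual broker: the ToolBroker class, its wait, post, and pendingCount methods, the callId key, and the timeout behavior are all hypothetical. The runtime's long-poll handler would await wait(), and the SDK's HTTP handler would call post().

```typescript
// Hypothetical in-memory broker: matches posted tool results to waiting runs.
class ToolBroker {
  // Runtime requests currently blocked, keyed by tool call id.
  private waiters = new Map<string, (result: unknown) => void>();
  // Results that arrived before the runtime started waiting.
  private ready = new Map<string, unknown>();

  // Runtime side: resolves when the SDK posts a result, rejects on timeout.
  wait(callId: string, timeoutMs: number): Promise<unknown> {
    if (this.ready.has(callId)) {
      const result = this.ready.get(callId);
      this.ready.delete(callId);
      return Promise.resolve(result);
    }
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiters.delete(callId);
        reject(new Error(`tool call ${callId} timed out`));
      }, timeoutMs);
      this.waiters.set(callId, (result) => {
        clearTimeout(timer);
        resolve(result);
      });
    });
  }

  // SDK side: deliver a result. Returns true if a blocked waiter was released.
  post(callId: string, result: unknown): boolean {
    const waiter = this.waiters.get(callId);
    if (waiter) {
      this.waiters.delete(callId);
      waiter(result);
      return true;
    }
    this.ready.set(callId, result);
    return false;
  }

  // Number of runtime requests currently blocked.
  pendingCount(): number {
    return this.waiters.size;
  }
}
```

Keeping this state in memory means pending tool calls are lost if the control plane process restarts; the sketch accepts that trade-off for simplicity and says nothing about how the real broker handles it.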
Why BYOK?
Zero billing risk for users (their tokens go straight to providers), zero cash-flow risk for us, no margin negotiation with LLM vendors, immediate trust signal. See Providers for the credential lifecycle.