Architecture

Three services and one Postgres. Each piece has one job. The seams are deliberate.

The whole stack

[Diagram: three services + one Postgres]

Control plane (TS / Hono)

The only stateful piece. Owns Postgres, owns the master key, owns all auth. Every public-facing concern lives here.

Auth: reads the relay_live_… bearer token and looks up the tenant by the SHA-256 hash of the full key.
BYOK: fetches the tenant's encrypted provider credential, decrypts with AES-256-GCM, passes plaintext to the runtime per request.
Persistence: creates the runs row, tees every SSE event from the runtime into run_events, marks complete/failed on done/error.
Memory pipeline: embeds the input, fetches the top-K matches from memories, and injects them into the system prompt before forwarding to the runtime. Stores the result after the done event.
Custom tools broker: shared rendezvous for pending tool results. The runtime long-polls; the SDK posts; the broker matches and unblocks. Backed by an in-memory map for single-instance deploys, or by NATS JetStream KV when NATS_URL is set — that's what lets the control plane scale horizontally.
Rate limiting: per-tenant token-bucket middleware on /v1/*. Memory backend by default; Redis backend (atomic via Lua) when REDIS_URL is set.
Run linking: sub-agent and graph workflows share one workflow_id; GET /v1/workflows/:id returns the full tree with aggregated cost. See Workflows.
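
The per-tenant token bucket behind the rate limiter can be sketched as follows. This is a minimal illustration of the memory backend only, not the actual middleware: the class name, the capacity and refillPerSec parameters, and the injectable clock are all assumptions made for the example.

```typescript
// Hypothetical sketch of a per-tenant token bucket (memory backend only).
// capacity and refillPerSec are assumed parameters, not values from this page.
class TokenBucketLimiter {
  private buckets = new Map<string, { tokens: number; last: number }>();
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Returns true if the request may proceed. nowMs is injectable for testing.
  allow(tenantId: string, nowMs: number = Date.now()): boolean {
    const b = this.buckets.get(tenantId) ?? { tokens: this.capacity, last: nowMs };
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - b.last) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.last = nowMs;
    const ok = b.tokens >= 1;
    if (ok) b.tokens -= 1; // spend one token per admitted request
    this.buckets.set(tenantId, b);
    return ok;
  }
}
```

A single process can do the read-refill-spend sequence without locking because the method runs synchronously. Across several replicas the same sequence has to be atomic in a shared store, which is the role the Lua script plays in the Redis backend.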

Runtime (Go)

Stateless. No database. No persistent provider keys. Receives one request per run, executes the agent loop, streams events back.

Provider abstraction: one normalized Message / ContentPart / StreamEvent shape; each provider translates to/from its wire format.
Router: picks Anthropic vs OpenAI by model prefix.
Built-in tools: small registry executed in-process (e.g. calculator).
Custom tools callback: when the LLM fires a function tool, the runtime long-polls the control plane and blocks until the SDK posts a result.
Max iterations: 8 per run, to prevent infinite tool loops.
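
The router and the iteration cap can be sketched together. The real runtime is written in Go; the sketch below uses TypeScript only to stay consistent with the other examples on this page. The model prefixes ("claude-", "gpt-"), the Step shape, and the function names are assumptions, and only the cap of 8 comes from the list above.

```typescript
// Illustrative only: the actual runtime is Go. Prefixes and names are assumptions.
type ProviderName = "anthropic" | "openai";

// Assumed prefixes: "claude-" routes to Anthropic, "gpt-" routes to OpenAI.
function pickProvider(model: string): ProviderName {
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("gpt-")) return "openai";
  throw new Error(`no provider for model ${model}`);
}

const MAX_ITERATIONS = 8; // per-run cap stated in the list above

// One model turn: either a tool call to execute or final text.
type Step = { toolCall?: string; text?: string };

// callModel stands in for one provider round trip; it receives the iteration index.
function runLoop(
  callModel: (iteration: number) => Step,
): { iterations: number; finished: boolean } {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const step = callModel(i);
    // No tool call means the model produced its final answer.
    if (!step.toolCall) return { iterations: i + 1, finished: true };
    // A real loop would execute the tool here and append its result to the conversation.
  }
  // The cap was reached while the model was still requesting tools.
  return { iterations: MAX_ITERATIONS, finished: false };
}
```

A model that keeps requesting tools is cut off after the eighth turn instead of looping forever, and the caller can tell that case apart from a normal finish through the finished flag.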

Postgres + pgvector

The transactional source of truth. Row-Level Security is enabled on every tenant-scoped table; the control plane connects as a non-owner role so that RLS actually applies (in Postgres, a table's owner bypasses its policies unless FORCE ROW LEVEL SECURITY is set). Schema:

tenants                  who owns what
  └─ api_keys            relay_live_… (sha-256 hashed)
  └─ provider_credentials  per-provider LLM keys (AES-256-GCM at rest)
  └─ runs                each execution, scoped to a tenant
        └─ run_events    ordered event log per run (mirrored to ClickHouse when CLICKHOUSE_URL is set)
  └─ memories            pgvector(1536), namespaced
  └─ audit_events        every security-relevant action

See Security for the full RLS + key rotation + audit model.
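
Because api_keys stores only a SHA-256 digest, authenticating a request means hashing the presented key and looking up the digest. A minimal sketch, assuming a Bearer authorization header and using an in-memory Map as a stand-in for the table; the function names and the Map are hypothetical, and the real lookup would be a Postgres query:

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory stand-in for the api_keys table: hex digest -> tenant id.
const apiKeysByDigest = new Map<string, string>();

// SHA-256 of the full key, hex-encoded.
function sha256Hex(key: string): string {
  return createHash("sha256").update(key, "utf8").digest("hex");
}

// Store only the digest; the plaintext key is never persisted.
function registerKey(fullKey: string, tenantId: string): void {
  apiKeysByDigest.set(sha256Hex(fullKey), tenantId);
}

// Resolve a tenant from an Authorization header value, or null if it does not match.
function tenantForBearer(authorization: string | undefined): string | null {
  const prefix = "Bearer ";
  if (!authorization || !authorization.startsWith(prefix)) return null;
  const key = authorization.slice(prefix.length).trim();
  if (!key.startsWith("relay_live_")) return null;
  return apiKeysByDigest.get(sha256Hex(key)) ?? null;
}
```

Looking up by digest means a leaked copy of the table does not reveal usable keys, and the lookup stays a single indexed equality match.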

Scale add-ons (optional)

The base stack (control-plane + runtime + Postgres) is enough for single-instance deploys. To scale horizontally, three optional services attach via env vars:

NATS JetStream (NATS_URL) — shared KV for the custom-tool broker. Required to run more than one control-plane replica.
Redis (REDIS_URL) — atomic token-bucket for rate limiting across the fleet. Optional, but fleet-wide caps only work with it.
ClickHouse (CLICKHOUSE_URL) — append-only columnar store for run_events. Postgres is fine until you hit ~10k events/sec; past that, switch to double-write and eventually flip READ_EVENTS_FROM=clickhouse.
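
The env-var switches above amount to a small backend-selection step at startup. The sketch below is one way that could look. Only the variable names (NATS_URL, REDIS_URL, CLICKHOUSE_URL, READ_EVENTS_FROM) come from this page; the Backends shape, the string labels, and the rule that reads flip only when ClickHouse is also configured are assumptions.

```typescript
// Hypothetical config resolver; only the env var names come from the text above.
type Env = Record<string, string | undefined>;

interface Backends {
  broker: "memory" | "nats-kv";
  rateLimit: "memory" | "redis";
  eventSink: "postgres" | "postgres+clickhouse";
  readEventsFrom: "postgres" | "clickhouse";
}

function resolveBackends(env: Env): Backends {
  const hasClickHouse = Boolean(env.CLICKHOUSE_URL);
  return {
    // Without NATS the broker is an in-process map, so only one replica is safe.
    broker: env.NATS_URL ? "nats-kv" : "memory",
    // Without Redis each replica enforces its own bucket.
    rateLimit: env.REDIS_URL ? "redis" : "memory",
    // With ClickHouse configured, events are double-written.
    eventSink: hasClickHouse ? "postgres+clickhouse" : "postgres",
    // Assumption: reads flip only when ClickHouse is configured and explicitly selected.
    readEventsFrom:
      hasClickHouse && env.READ_EVENTS_FROM === "clickhouse" ? "clickhouse" : "postgres",
  };
}
```

In a Node process this would be called once as resolveBackends(process.env). With no variables set, every backend falls back to the single-instance default.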

Why this shape

Why is the runtime stateless?

So that the heavy work (LLM streaming, tool dispatch) can scale horizontally without database contention. The runtime is a pure worker — kill any instance, spin up another, no migration needed. A future managed cloud puts a fleet behind a load balancer; nothing in the agent loop has to change.

Why is the broker in the control plane?

Custom tools need a rendezvous point. The SDK and the runtime can't talk directly (the SDK is on the public internet, the runtime is internal). The control plane is already on the public path and already authenticates the SDK — adding a tiny broker is the smallest possible change.
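
The rendezvous itself is small. Below is a minimal sketch of the in-memory variant, assuming a tool call id as the matching key. The ToolBroker class, its wait and post methods, and the ToolResult shape are hypothetical names for illustration, not the actual broker API.

```typescript
// Hypothetical names throughout; a single-instance, in-memory sketch only.
type ToolResult = { callId: string; output: string };

class ToolBroker {
  // Runtime long-polls waiting for a result, keyed by tool call id.
  private waiters = new Map<string, (r: ToolResult) => void>();
  // Results that arrived before the runtime started waiting.
  private early = new Map<string, ToolResult>();

  // Runtime side: resolves when the SDK posts a result, or rejects after
  // timeoutMs so the long-poll can be retried.
  wait(callId: string, timeoutMs: number): Promise<ToolResult> {
    const ready = this.early.get(callId);
    if (ready) {
      this.early.delete(callId);
      return Promise.resolve(ready);
    }
    return new Promise<ToolResult>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiters.delete(callId);
        reject(new Error(`timed out waiting for ${callId}`));
      }, timeoutMs);
      this.waiters.set(callId, (r) => {
        clearTimeout(timer);
        resolve(r);
      });
    });
  }

  // SDK side: returns true if a waiting runtime was unblocked,
  // false if the result was parked for a later wait().
  post(result: ToolResult): boolean {
    const waiter = this.waiters.get(result.callId);
    if (!waiter) {
      this.early.set(result.callId, result);
      return false;
    }
    this.waiters.delete(result.callId);
    waiter(result);
    return true;
  }
}
```

Both maps live in one process, which is why this variant only works for a single control-plane instance: a result posted to one replica cannot wake a long-poll held by another. Moving the shared state into NATS JetStream KV removes that restriction.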

Why BYOK?

Zero billing risk for users (token usage is billed directly by their provider), zero cash-flow risk for us, no margin negotiation with LLM vendors, and an immediate trust signal. See Providers for the credential lifecycle.

A run, end to end

[Diagram: one full run, every actor]