Self-host

How to deploy Relay in production: environment variables, security posture, scaling notes, backups, and upgrades. The same code runs in dev and prod.

Environment variables

For dev, pnpm bootstrap generates these. In production, store them in your secret manager and pass them to each service.

Variable	Where	Purpose
DATABASE_URL	control-plane	Postgres with pgvector extension. e.g. `postgres://user:pass@host:5432/relay`.
RELAY_MASTER_KEY	control-plane	32 bytes, hex-encoded (64 hex characters). Encrypts provider credentials. Don't rotate it without re-encrypting the stored credentials.
RELAY_INTERNAL_SECRET	both	Shared secret for the runtime → control-plane callback. Set on both services. If unset, the callback is unauthenticated (dev only).
RUNTIME_URL	control-plane	Where the runtime listens. Default `http://localhost:4100`.
CONTROL_PLANE_URL	runtime	Where the control-plane listens. Default `http://localhost:4000`.
PORT	both	HTTP port. The control plane defaults to 4000; the runtime example below sets it to 4100.
RELAY_TOOL_RESULT_TIMEOUT_MS	control-plane	Custom tool long-poll timeout. Default 30000.
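
RELAY_MASTER_KEY must be 32 bytes of hex, which is 64 hex characters. A minimal sketch for generating one, assuming a POSIX shell with either openssl or /dev/urandom available. gen_hex_32 is a hypothetical helper name, and reusing the same length for RELAY_INTERNAL_SECRET is an assumption, since the table does not specify one:

```shell
# gen_hex_32 prints 32 random bytes as 64 lowercase hex characters.
gen_hex_32() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback when openssl is missing: read 32 bytes from the kernel CSPRNG
    # and strip the spaces and newlines that od adds.
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
  fi
}

# Generate once, then store the values in your secret manager.
RELAY_MASTER_KEY="$(gen_hex_32)"
# Assumption: 32 bytes is also a reasonable size for the shared secret.
RELAY_INTERNAL_SECRET="$(gen_hex_32)"
```

Generate each value once and keep it in the secret manager; regenerating the master key later runs into the rotation caveat in the table.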

Postgres setup

You need PostgreSQL 16+ with pgvector. Managed options that include both:

Supabase (pgvector built in)
Neon (enable the vector extension)
RDS for PostgreSQL with the vector extension
Fly.io Postgres (run CREATE EXTENSION vector manually)
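
Before applying migrations, it can help to confirm the server version and enable pgvector. A sketch, assuming psql is installed and DATABASE_URL points at the target database. pg_version_ok and check_postgres are hypothetical helper names, and some managed providers may require enabling the extension from their console instead:

```shell
# pg_version_ok succeeds when a server_version_num value (e.g. 160003)
# corresponds to PostgreSQL 16 or newer.
pg_version_ok() {
  [ -n "$1" ] && [ "$1" -ge 160000 ]
}

# check_postgres verifies the version, then enables pgvector if it is missing.
check_postgres() {
  num="$(psql "$DATABASE_URL" -Atc 'SHOW server_version_num;')" || return 1
  if ! pg_version_ok "$num"; then
    echo "PostgreSQL 16+ required, server reports $num" >&2
    return 1
  fi
  # IF NOT EXISTS makes this safe to re-run; it needs a role that is
  # allowed to create extensions.
  psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
    -c 'CREATE EXTENSION IF NOT EXISTS vector;'
}
```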

Apply migrations from a checkout of the repo:

DATABASE_URL=postgres://… pnpm --filter @relayhq/db migrate

Running the services

Relay runs as three processes: the runtime, the control plane, and the dashboard. Any orchestrator or process manager (Fly, Render, Railway, Kubernetes, Docker Swarm, plain systemd) works.

Runtime (Go)

# build a static binary
cd runtime
CGO_ENABLED=0 go build -o relay-runtime ./cmd/runtime

# run it
CONTROL_PLANE_URL=https://api.relay.your-domain.com \
  RELAY_INTERNAL_SECRET=… \
  PORT=4100 \
  ./relay-runtime

Control plane (Node)

pnpm --filter @relayhq/control-plane build
DATABASE_URL=… \
  RELAY_MASTER_KEY=… \
  RELAY_INTERNAL_SECRET=… \
  RUNTIME_URL=http://runtime.internal:4100 \
  PORT=4000 \
  node packages/control-plane/dist/server.js

Dashboard (Next.js)

Deploy it like any Next.js app (Vercel, Fly, Render, or just next start). Set RELAY_URL and RELAY_API_KEY in its environment.

pnpm --filter @relayhq/dashboard build
RELAY_URL=https://api.relay.your-domain.com \
  RELAY_API_KEY=relay_live_… \
  pnpm --filter @relayhq/dashboard start

Bootstrapping a tenant in production

The bootstrap script works the same way against a remote Postgres — just point DATABASE_URL at it.

DATABASE_URL=postgres://prod-host… \
  RELAY_MASTER_KEY=$RELAY_MASTER_KEY \
  ANTHROPIC_API_KEY=sk-ant-… \
  OPENAI_API_KEY=sk-… \
  pnpm bootstrap

Security posture

Master key: load it from your secret manager (AWS KMS, GCP Secret Manager, 1Password, …). Never put it in source control or container env files.
Internal secret: required in production; without it the runtime → control-plane callback is unauthenticated. The control plane warns at boot when it's missing.
TLS: terminate at your load balancer (Fly Proxy, ELB, Cloudflare). Nothing in Relay does TLS itself.
Network boundaries: only the control plane needs public ingress. The runtime and Postgres stay private.
Tenant isolation: every read query filters by tenant_id. There's no client-supplied tenant id anywhere in the API.
The runtime never sees a Relay key and has no DB access. A compromised runtime instance leaks one in-flight run, not your tenant catalog.
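
Because the control plane only warns when the internal secret is missing, a preflight check in the start script can turn that into a hard failure. A minimal sketch, assuming a POSIX shell; require_env is a hypothetical helper, not something Relay ships:

```shell
# require_env returns non-zero and names every listed variable that is
# unset or empty, so a start script can abort before the service boots.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name (POSIX-safe).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example for the control plane, using the variables from the table:
#   require_env DATABASE_URL RELAY_MASTER_KEY RELAY_INTERNAL_SECRET RUNTIME_URL || exit 1
```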

Scaling notes

The runtime is stateless, so you can run as many replicas as you want behind a load balancer. The control plane is also horizontally scalable, but its pending-tools broker is in-memory, so either use sticky sessions or pin each run_id to a single instance until the run completes. (A Redis-backed broker is on the roadmap.)
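
Pinning by run_id amounts to a deterministic mapping from run_id to a control-plane instance. The sketch below only illustrates the idea with a CRC from cksum; in practice the mapping would live in your load balancer's hash-based routing, and instance_for_run is a hypothetical name:

```shell
# instance_for_run maps a run_id to an instance index in [0, replicas).
# The same run_id always yields the same index for a fixed replica count.
instance_for_run() {
  run_id="$1"
  replicas="$2"
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  hash="$(printf '%s' "$run_id" | cksum | cut -d ' ' -f 1)"
  echo $(( hash % replicas ))
}
```

Note that a plain modulo remaps most run_ids when the replica count changes, so with this scheme it is safer to resize the control plane only when no runs are in flight.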

Backups

Use standard Postgres backups: a nightly pg_dump, plus point-in-time recovery if your hosting offers it. The run_events table grows fastest (one row per token); rotate or archive it on a schedule if needed.
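
A nightly dump could look like the sketch below, assuming pg_dump is installed on the host that runs it. BACKUP_DIR, the file naming, and the helper names are assumptions for illustration:

```shell
# backup_path prints a dated dump path for a directory and a YYYY-MM-DD day.
backup_path() {
  printf '%s/relay-%s.dump\n' "$1" "$2"
}

# run_backup writes a custom-format dump of the database in DATABASE_URL.
# Schedule it nightly, e.g. from cron: 0 3 * * * /path/to/this/script
run_backup() {
  dir="${BACKUP_DIR:-/var/backups/relay}"   # assumed default location
  out="$(backup_path "$dir" "$(date -u +%F)")"
  mkdir -p "$dir" &&
    pg_dump --format=custom --file="$out" "$DATABASE_URL"
}
```

Custom-format dumps are restored with pg_restore rather than psql, for example pg_restore --dbname="$DATABASE_URL" followed by the dump file.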

Upgrades

New migrations are forward-compatible by convention. Standard flow:

Pull new code on every service.
Run pnpm --filter @relayhq/db migrate against the production database.
Roll runtime, then control plane, then dashboard.
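
The flow above can be scripted so the order is never skipped. This is a sketch that assumes the new code is already checked out; deploy_service is a placeholder for your orchestrator's deploy command, which this page does not prescribe:

```shell
# roll_order prints the services in the order the upgrade flow rolls them.
roll_order() {
  printf '%s\n' runtime control-plane dashboard
}

# deploy_service is a placeholder: replace the echo with your orchestrator's
# deploy command (hypothetical, not part of Relay).
deploy_service() {
  echo "deploying $1"
}

# upgrade applies migrations first, then rolls each service in order,
# stopping at the first failure.
upgrade() {
  pnpm --filter @relayhq/db migrate || return 1
  for svc in $(roll_order); do
    deploy_service "$svc" || return 1
  done
}
```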