Self-host

Production deployment: environment variables, Postgres setup, security posture, scaling, backups, and upgrades. The same code runs in dev and prod.

Environment variables

In dev, pnpm bootstrap generates these. In production, store them in your secret manager and pass them to each service.

| Variable | Where | Purpose |
|---|---|---|
| DATABASE_URL | control-plane | Postgres connection string; the database needs the pgvector extension. e.g. postgres://user:pass@host:5432/relay. |
| RELAY_MASTER_KEY | control-plane | 32 bytes, hex-encoded (64 characters). Encrypts provider credentials. Don't rotate it without re-encrypting the stored credentials. |
| RELAY_INTERNAL_SECRET | both | Shared secret for the runtime → control-plane callback. Set the same value on both services. If unset, the callback is unauthenticated (dev only). |
| RUNTIME_URL | control-plane | Where the runtime listens. Default http://localhost:4100. |
| CONTROL_PLANE_URL | runtime | Where the control plane listens. Default http://localhost:4000. |
| PORT | control-plane | HTTP port. Default 4000. (The runtime example below also sets PORT=4100.) |
| RELAY_TOOL_RESULT_TIMEOUT_MS | control-plane | Custom tool long-poll timeout, in milliseconds. Default 30000 (30 s). |
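
The table only says that RELAY_MASTER_KEY is 32 bytes of hex that encrypts provider credentials; the cipher Relay actually uses isn't documented here. Assuming AES-256-GCM (which takes exactly a 32-byte key), the sketch below shows how such a key can be generated and checked, and why rotation means decrypting every stored credential with the old key and re-encrypting it with the new one. The function names and the iv:tag:ciphertext storage format are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: AES-256-GCM. Relay's real cipher and storage format may differ.

// Parse a 32-byte key given as 64 hex characters; reject anything else.
function parseMasterKey(hex: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("RELAY_MASTER_KEY must be 32 bytes encoded as 64 hex characters");
  }
  return Buffer.from(hex, "hex");
}

// Generate a fresh random key in the same hex format.
function generateMasterKey(): string {
  return randomBytes(32).toString("hex");
}

// Encrypt one credential. Output is "iv:authTag:ciphertext", each part base64 (hypothetical format).
function encryptCredential(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(":");
}

// Decrypt; final() throws if the key is wrong or the data was tampered with.
function decryptCredential(key: Buffer, stored: string): string {
  const [iv, tag, ciphertext] = stored.split(":").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Rotation: each stored value is decrypted with the old key and
// re-encrypted with the new one before the old key is retired.
function reencryptCredential(oldKey: Buffer, newKey: Buffer, stored: string): string {
  return encryptCredential(newKey, decryptCredential(oldKey, stored));
}
```

From a shell, openssl rand -hex 32 prints a key in the same 64-character hex format. Under this assumption, a value encrypted with the old key cannot be read with the new one, which is why swapping the key without a re-encryption pass would strand every stored credential.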

Postgres setup

You need PostgreSQL 16+ with pgvector. Managed options that include both:

  • Supabase (pgvector built in)
  • Neon (enable the vector extension)
  • Amazon RDS for PostgreSQL (enable the vector extension)
  • Fly.io Postgres (run CREATE EXTENSION vector; manually)

Apply migrations from a checkout of the repo:

DATABASE_URL=postgres://… pnpm --filter @relayhq/db migrate

Running the services

Relay runs as three processes: the runtime, the control plane, and the dashboard. Any platform that can run them works: Fly, Render, Railway, Kubernetes, Docker Swarm, or plain systemd.

Runtime (Go)

# build a static binary
cd runtime
CGO_ENABLED=0 go build -o relay-runtime ./cmd/runtime

# run it
CONTROL_PLANE_URL=https://api.relay.your-domain.com \
  RELAY_INTERNAL_SECRET=… \
  PORT=4100 \
  ./relay-runtime

Control plane (Node)

pnpm --filter @relayhq/control-plane build
DATABASE_URL=postgres://… \
  RELAY_MASTER_KEY=… \
  RELAY_INTERNAL_SECRET=… \
  RUNTIME_URL=http://runtime.internal:4100 \
  PORT=4000 \
  node packages/control-plane/dist/server.js
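
Because the control plane depends on several of these variables, a fail-fast check at boot can catch mistakes such as an empty DATABASE_URL before the server starts. This is a hypothetical loader (loadConfig and ControlPlaneConfig are illustrative names, not Relay's boot code). It applies the defaults from the environment-variable table, rejects a malformed RELAY_MASTER_KEY, and collects a warning when RELAY_INTERNAL_SECRET is missing, mirroring the documented boot-time warning.

```typescript
// Hypothetical config loader for the control plane; names are illustrative.
interface ControlPlaneConfig {
  databaseUrl: string;
  masterKeyHex: string;
  internalSecret: string | undefined;
  runtimeUrl: string;
  port: number;
  toolResultTimeoutMs: number;
  warnings: string[];
}

type Env = Record<string, string | undefined>;

// Read a positive integer, falling back to a default when unset or empty.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): ControlPlaneConfig {
  // Required: an empty value is treated the same as a missing one.
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) throw new Error("DATABASE_URL is required");

  const masterKeyHex = env.RELAY_MASTER_KEY ?? "";
  if (!/^[0-9a-fA-F]{64}$/.test(masterKeyHex)) {
    throw new Error("RELAY_MASTER_KEY must be 32 bytes encoded as 64 hex characters");
  }

  // Optional in dev, but production should always set it.
  const warnings: string[] = [];
  const internalSecret = env.RELAY_INTERNAL_SECRET || undefined;
  if (!internalSecret) {
    warnings.push("RELAY_INTERNAL_SECRET is unset: runtime callbacks are unauthenticated (dev only)");
  }

  // Defaults taken from the environment-variable table.
  return {
    databaseUrl,
    masterKeyHex,
    internalSecret,
    runtimeUrl: env.RUNTIME_URL || "http://localhost:4100",
    port: intFromEnv(env, "PORT", 4000),
    toolResultTimeoutMs: intFromEnv(env, "RELAY_TOOL_RESULT_TIMEOUT_MS", 30000),
    warnings,
  };
}
```

In an entry point you would call loadConfig(process.env) and print each entry of warnings with console.warn before starting the HTTP server.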

Dashboard (Next.js)

Deploy it as a normal Next.js app (Vercel, Fly, Render, or plain next start). Set RELAY_URL and RELAY_API_KEY in its environment.

pnpm --filter @relayhq/dashboard build
RELAY_URL=https://api.relay.your-domain.com \
  RELAY_API_KEY=relay_live_… \
  pnpm --filter @relayhq/dashboard start

Bootstrapping a tenant in production

The bootstrap script works the same way against a remote Postgres: point DATABASE_URL at it.

DATABASE_URL=postgres://prod-host… \
  RELAY_MASTER_KEY=$RELAY_MASTER_KEY \
  ANTHROPIC_API_KEY=sk-ant-… \
  OPENAI_API_KEY=sk-… \
  pnpm bootstrap

Security posture

  • Master key: load it from your secret manager (AWS KMS, GCP Secret Manager, 1Password, …). Never put it in source control or container env files.
  • Internal secret: required in production. The control plane warns at boot when it's missing.
  • TLS: terminate at your load balancer (Fly Proxy, ELB, Cloudflare). Nothing in Relay does TLS itself.
  • Network boundaries: only the control plane needs public ingress. The runtime and Postgres stay private.
  • Tenant isolation: every read query filters by tenant_id. There's no client-supplied tenant id anywhere in the API.
  • The runtime never sees a Relay key and has no DB access. A compromised runtime instance leaks one in-flight run, not your tenant catalog.
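
The runtime → control-plane callback is authenticated with the shared RELAY_INTERNAL_SECRET, but how the secret travels isn't specified here. As a sketch, assuming the runtime sends it in a request header (the x-relay-internal-secret name is hypothetical), the control plane can compare it in constant time so the check doesn't leak timing information. Hashing both values first gives equal-length buffers, which timingSafeEqual requires.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical header name; Relay's actual wire format is not documented here.
// Node lowercases incoming header names, so the lookup key is lowercase.
const INTERNAL_SECRET_HEADER = "x-relay-internal-secret";

// Hash to a fixed 32-byte digest so timingSafeEqual always gets equal-length inputs.
function digest(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Returns true when the callback may proceed.
// With no secret configured every callback is accepted, matching the
// documented dev-only behavior; production should always configure one.
function isAuthorizedCallback(
  headers: Record<string, string | string[] | undefined>,
  configuredSecret: string | undefined,
): boolean {
  if (!configuredSecret) return true;
  const presented = headers[INTERNAL_SECRET_HEADER];
  if (typeof presented !== "string") return false;
  return timingSafeEqual(digest(presented), digest(configuredSecret));
}
```

A request handler would call isAuthorizedCallback(req.headers, process.env.RELAY_INTERNAL_SECRET) and answer 401 when it returns false.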

Scaling notes

The runtime is stateless: run as many replicas as you want behind a load balancer. The control plane also scales horizontally, but its pending-tools broker is in memory, so every request for a given run_id must reach the same instance until the run completes. Use sticky sessions, or pin each run_id to a single instance. (A Redis-backed broker is on the roadmap.)
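
One way to pin a run to a single control-plane instance without shared state is rendezvous (highest-random-weight) hashing on run_id: every router that sees the same instance list computes the same owner. This is a swapped-in technique, not something Relay ships; a load balancer's hash-based routing or sticky sessions would do the equivalent job. How the run_id reaches the router (a header or a path segment) is an assumption, and the instance URLs in the usage are placeholders.

```typescript
import { createHash } from "node:crypto";

// Score an (instance, runId) pair. Equal-length lowercase hex strings
// compare in the same order as the numbers they encode.
function score(instance: string, runId: string): string {
  return createHash("sha256").update(`${instance}\n${runId}`).digest("hex");
}

// Rendezvous hashing: the instance with the highest score owns the run.
// The result does not depend on list order, and removing an instance
// only remaps the runs that instance owned.
function ownerFor(runId: string, instances: readonly string[]): string {
  if (instances.length === 0) throw new Error("no control-plane instances");
  let best = instances[0];
  let bestScore = score(best, runId);
  for (const instance of instances.slice(1)) {
    const s = score(instance, runId);
    if (s > bestScore) {
      best = instance;
      bestScore = s;
    }
  }
  return best;
}
```

For example, ownerFor("run_123", ["http://cp-1:4000", "http://cp-2:4000"]) always returns the same instance for that run. If the owning instance goes away mid-run, its in-memory pending tool calls go with it, so routing alone does not make a run survive an instance restart.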

Backups

Use standard Postgres backups: a nightly pg_dump, plus point-in-time recovery if your host offers it. The run_events table grows fastest (one row per token); rotate or archive it on a schedule if needed.

Upgrades

New migrations are forward-compatible by convention. Standard flow:

  1. Pull the new code for every service.
  2. Run pnpm --filter @relayhq/db migrate against the production database.
  3. Roll out the runtime, then the control plane, then the dashboard.