The backend cloud for reliable AI agents.

Memory, retries, tools, and traces, without building orchestration infrastructure yourself. Sign up free or self-host in three commands.

Get a free API key → · View on GitHub · Self-host →

$ npm i @relayhq/sdk
$ pip install relayhq

agent.ts

import { createAgent, builtin, tool } from "@relayhq/sdk";

// Assumes `github` is an authenticated GitHub client defined elsewhere in your app.
const reviewPR = tool({
  name: "review_pr",
  description: "Read a GitHub PR and return a summary",
  inputSchema: {
    type: "object",
    properties: { url: { type: "string" } },
    required: ["url"],
  },
  async handler({ url }) { return github.pulls.get(url); },
});

const agent = createAgent({
  model: "claude-sonnet-4-6",
  // `userId` is assumed to come from your app's auth or session layer.
  memory: { namespace: `user:${userId}` },
  tools: [builtin.calculator, reviewPR],
});

for await (const e of agent.run("Review the last PR")) {
  if (e.type === "token") process.stdout.write(e.text);
}

Built on boring, battle-tested infra

The problem

AI apps fail in production because orchestration is unreliable.

Every team building real agents ends up writing the same plumbing: queues, retries, state machines, tool dispatch, traces, memory, replay. Relay is that plumbing: already built, battle-tested, and out of your way.

The change

Stop building infrastructure.

Same agent. One SDK call vs nine systems wired by hand.

Without Relay

—OpenAI / Anthropic SDK
—Redis
—Queues
—Retries + backoff
—State management
—Tool orchestration
—Tracing + replay
—Worker pool
—Memory + embeddings
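
To make one item on this list concrete, here is a minimal sketch of hand-rolled retries with exponential backoff. `withRetry` and its options are hypothetical names used for illustration; they are not part of the Relay SDK, and a production version would likely add jitter and distinguish retryable errors from fatal ones.

```typescript
// Hypothetical helper, NOT part of @relayhq/sdk: retries an async operation
// with exponential backoff between attempts.
type RetryOptions = {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the second attempt
};

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  op: (attempt: number) => Promise<T>,
  { maxAttempts, baseDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

This covers only one of the nine items; queues, state, and tracing each need similar care.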

With Relay

const agent = createAgent({
  model: "claude-sonnet-4-6",
  tools: [github, slack],
  memory: { namespace: "user:42" },
})

await agent.run("Review the last PR")

One SDK. Every provider. Memory, tools, traces, retries — built in.

How it works

Three steps from `git clone` to a streaming, traced, memory-aware agent.

Self-host with Docker Compose. The whole stack on your laptop in two minutes.

Step 01

Clone and bootstrap

Run git clone, then pnpm bootstrap. The bootstrap is idempotent: it generates keys, brings up Postgres, applies migrations, and mints your API key.

Step 02

Bring your own keys

Upload Anthropic, OpenAI, or OpenAI-compatible credentials once. They are encrypted with AES-256-GCM and resolved per request.
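
For readers unfamiliar with AES-256-GCM, the following is a minimal sketch of sealing and unsealing a provider key with Node's built-in `crypto` module. The `Sealed` shape and the `seal` / `unseal` names are assumptions made for this example; Relay's actual storage format and key management are not described on this page.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical envelope shape, for illustration only.
type Sealed = { iv: Buffer; tag: Buffer; ciphertext: Buffer };

// Encrypt a credential with a 32-byte key using AES-256-GCM.
function seal(plaintext: string, key: Buffer): Sealed {
  const iv = randomBytes(12); // 96-bit nonce; must never repeat for the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypt and authenticate; throws if the ciphertext or tag was tampered with.
function unseal(sealed: Sealed, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}
```

GCM is an authenticated mode, so a modified ciphertext fails at `decipher.final()` rather than decrypting to garbage.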

Step 03

Run and observe

Every token, tool call, and memory hit is persisted as an ordered event log. Replay any run in the dashboard.

quickstart.sh

git clone https://github.com/KevinCorrea5103/relay
cd relay && pnpm install
pnpm bootstrap   # mints keys, brings up Postgres, migrates
pnpm dev         # runtime + control-plane + dashboard + web

What you get

Production-grade primitives, not a demo.

Everything an agent needs in production, exposed through a single SDK.

/ Multi-provider routing

Anthropic, OpenAI, and any OpenAI-compatible endpoint (Ollama, vLLM, Groq, Together, OpenRouter). Switch with one string.

/ Multi-agent orchestration

createOrchestrator() builds a supervisor that routes to specialists. subagent() turns any agent into a tool another agent can call.

/ Declarative workflows

The Graph API composes steps, agents, edges, and conditionals — sequential pipelines, parallel fan-out, or stateful loops.

/ Custom function tools

Tools run in your process, not ours. The SDK ships the schema, the runtime pauses on tool calls, and you fulfill them locally over the same stream.
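
The local fulfillment step can be pictured roughly as below. This is a sketch under stated assumptions: the `ToolCall` and `ToolResult` shapes and the `fulfill` function are hypothetical and only illustrate the idea of dispatching a paused tool call to a handler in your own process; the SDK does this for you and its real wire format may differ.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual event types.
type ToolCall = { id: string; name: string; input: unknown };
type ToolResult = { id: string; ok: boolean; output: unknown };
type Handler = (input: unknown) => Promise<unknown> | unknown;

// Look up the handler by tool name, run it locally, and package the result
// (or the failure) so it can be sent back to the runtime.
async function fulfill(
  call: ToolCall,
  handlers: Record<string, Handler>,
): Promise<ToolResult> {
  const handler = handlers[call.name];
  if (!handler) {
    return { id: call.id, ok: false, output: `unknown tool: ${call.name}` };
  }
  try {
    return { id: call.id, ok: true, output: await handler(call.input) };
  } catch (err) {
    return { id: call.id, ok: false, output: String(err) };
  }
}
```

Returning failures as results, instead of throwing, lets the model see the error and decide what to do next.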

/ Semantic memory

pgvector + automatic indexing. Agents recall past interactions by namespace without you ever touching embeddings.
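
As a rough picture of what namespace-scoped recall means, here is an in-memory sketch that ranks stored memories by cosine similarity to a query embedding. It is an illustration only: the `Memory` type and `recall` function are hypothetical, and in Relay the equivalent ranking is described as happening in Postgres via pgvector, not in application code.

```typescript
// Hypothetical in-memory stand-in for a vector store.
type Memory = { namespace: string; text: string; embedding: number[] };

// Cosine similarity of two vectors; returns 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k memories in `namespace` most similar to the query embedding.
function recall(store: Memory[], namespace: string, query: number[], k: number): Memory[] {
  return store
    .filter((m) => m.namespace === namespace)
    .map((m) => ({ memory: m, score: cosine(m.embedding, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((scored) => scored.memory);
}
```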

/ Voice in & out

Transcribe audio with Whisper and synthesize speech with OpenAI TTS — 11 voices, straight from the SDK. Audio streams, never stored.

/ Persistent execution traces

Every run is an ordered event log: tokens, tool calls, memory retrievals, errors. Full replay in the dashboard. Optional ClickHouse mirror for 10k+ events/sec.
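
One way to think about an ordered event log is sketched below. The `token` event with a `text` field matches the SDK examples on this page; the `seq` field and the other event variants are assumptions added for illustration, not the documented schema.

```typescript
// Illustrative event shapes. Only `type: "token"` with `text` appears in the
// SDK examples; `seq` and the other variants are assumed here.
type RunEvent =
  | { seq: number; type: "token"; text: string }
  | { seq: number; type: "tool_call"; name: string }
  | { seq: number; type: "error"; message: string };

// Rebuild the model's text output from a log, regardless of arrival order.
function replayText(events: RunEvent[]): string {
  return [...events]
    .sort((a, b) => a.seq - b.seq)
    .filter((e): e is Extract<RunEvent, { type: "token" }> => e.type === "token")
    .map((e) => e.text)
    .join("");
}
```

Because each event carries its position, a run can be reconstructed deterministically from storage.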

/ Streaming + tool calling

Native SSE end to end. Tokens stream as they're generated. Tool calls dispatch in parallel without breaking iteration.

/ BYOK encryption

AES-256-GCM per tenant. The runtime never sees a Relay key and has no database access. Costs flow direct to providers.

/ Per-tenant isolation + audit

Postgres row-level security scopes every read. Runs, memories, and credentials are tenant-tagged, and every sensitive action lands in an audit log.

/ Rate limits & key rotation

Per-tenant token-bucket rate limiting (Redis-backed across a fleet), plus zero-downtime API-key and master-key rotation.
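
For reference, a token bucket can be sketched in a few lines. This is a single-process, in-memory illustration with hypothetical names (`TokenBucket`, `allow`); the Redis-backed, fleet-wide version mentioned above would keep the same state in shared storage instead.

```typescript
// Minimal in-memory token bucket. Time is passed in explicitly so the
// behaviour is deterministic and easy to test.
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastMs = nowMs;
  }

  // Refill based on elapsed time, then try to spend one token.
  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per tenant, created lazily on first request.
const buckets = new Map<string, TokenBucket>();

function allow(tenantId: string, nowMs: number): boolean {
  let bucket = buckets.get(tenantId);
  if (!bucket) {
    bucket = new TokenBucket(2, 1, nowMs); // burst of 2, refill 1 token per second
    buckets.set(tenantId, bucket);
  }
  return bucket.tryTake(nowMs);
}
```

The capacity sets the allowed burst and the refill rate sets the sustained throughput.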

/ 100% self-host

Three services and one Postgres via Docker Compose. Scales horizontally with a NATS broker. Run on your laptop or your own cloud — no vendor in the loop.

Compose agents

One agent is a start. Relay is built for teams of them.

Delegate, supervise, or wire explicit graphs — every sub-run is traced under one workflow with cost summed across the team.

subagent(): Wrap any agent as a tool another agent can call and delegate to — with a recursion depth cap so it never runs away.
createOrchestrator(): A supervisor routes each request to the right specialist. The system prompt is auto-generated from your team description.
Graph API: Declarative steps, edges, and conditionals: sequential pipelines, parallel fan-out, or stateful loops with shared state.

Every sub-run shares a workflow_id, so the dashboard renders the full run tree with tokens and cost aggregated across the team.
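
The aggregation this implies can be sketched as a simple fold over run records. Only `workflow_id` is named in the text; the other fields (`run_id`, `parent_run_id`, `tokens`, `cost_usd`) and the `summarize` function are hypothetical and stand in for whatever the dashboard actually stores.

```typescript
// Hypothetical run record; only `workflow_id` comes from the text above.
type RunRecord = {
  run_id: string;
  workflow_id: string;
  parent_run_id: string | null; // null for the top-level run
  tokens: number;
  cost_usd: number;
};

type WorkflowSummary = { runs: number; tokens: number; cost_usd: number };

// Sum tokens and cost over every run that belongs to one workflow.
function summarize(runs: RunRecord[], workflowId: string): WorkflowSummary {
  const summary: WorkflowSummary = { runs: 0, tokens: 0, cost_usd: 0 };
  for (const run of runs) {
    if (run.workflow_id !== workflowId) continue;
    summary.runs += 1;
    summary.tokens += run.tokens;
    summary.cost_usd += run.cost_usd;
  }
  return summary;
}
```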

team.ts

import { createAgent, createOrchestrator } from "@relayhq/sdk";

const researcher = createAgent({
  model: "claude-sonnet-4-6",
  system: "Find and verify facts. Cite sources.",
});

const writer = createAgent({
  model: "gpt-4o",
  system: "Turn notes into crisp, on-brand prose.",
});

// A supervisor that routes to the right teammate — the system
// prompt is auto-generated from the team description.
const team = createOrchestrator({
  model: "claude-sonnet-4-6",
  agents: {
    researcher: { agent: researcher, description: "Finds and verifies facts" },
    writer: { agent: writer, description: "Drafts copy from notes" },
  },
});

for await (const e of team.run("Draft a launch post about our new pricing")) {
  if (e.type === "token") process.stdout.write(e.text);
}

Use it from anywhere

TypeScript today. Python today. Anything else over plain HTTP.

Two official SDKs and a documented HTTP + SSE protocol — wire it into any stack.

npm i @relayhq/sdk

import { createAgent } from "@relayhq/sdk";

const agent = createAgent({
  apiKey: process.env.RELAY_API_KEY!,
  baseUrl: "https://api.relaygh.dev",
  model: "gpt-4o-mini",
});

for await (const event of agent.run("Say hi in three languages.")) {
  if (event.type === "token") process.stdout.write(event.text);
}

No SDK for your language? The protocol is plain HTTP + SSE, about 30 lines in any language. See /docs/api.
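
To give a sense of how little is needed on the client side, here is a sketch of parsing a Server-Sent Events body into events, following the standard SSE framing (blank-line-separated blocks, `event:` and `data:` fields, `:` comments). The `parseSse` name is hypothetical, and the event names and payloads Relay actually emits are not specified here; consult /docs/api for those. A real client would parse incrementally as chunks arrive instead of buffering the whole body.

```typescript
type SseEvent = { event: string; data: string };

// Parse a complete SSE payload into a list of events.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type per the SSE spec
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Strip at most one leading space after the colon.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    // Blocks with no data lines do not dispatch an event.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```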

Observability

Every run is a complete execution trace.

Tokens, tool calls, results, memory retrievals, errors — captured in order. Replay anything.

[Figure: Relay execution trace, showing tool calls and results in order]

Built-in dashboard. See exactly what the model did — including the mistakes it self-corrected.

FAQ

Questions

Is there a hosted version?

Yes, and it is free during beta with no credit card. Sign up with your email and an OpenAI or Anthropic key, get a `relay_live_…` key back, and point the SDK at it. The hosted cloud and self-host run the exact same code; you can switch by changing one env var.

Do you take a cut of my tokens?

No. Relay is BYOK by design — you upload your own Anthropic / OpenAI keys (encrypted with AES-256-GCM the moment they arrive) and your tokens flow straight to the providers. We never proxy billing.

Is this another LangChain wrapper?

No. Relay is the runtime under your agent, not a chain abstraction. You write plain functions; Relay handles state, retries, providers, tools, memory, and traces.

Where does my data live?

On the cloud: a per-tenant slice of Postgres on Supabase, credentials encrypted at rest. On self-host: wherever you run your Postgres. Either way, your tool handlers execute in your process — Relay never sees your business logic.

What's the lock-in?

Almost none. Your tools are TypeScript functions in your repo. Your prompts are strings. Your memory is a Postgres table you can export. The SDK protocol is plain HTTP + SSE. Moving cloud ↔ self-host is one `baseUrl` change.

What can it do today, and what's next?

Shipping today: streaming agents, multi-provider routing, custom tools, semantic memory, voice (Whisper + TTS), multi-agent orchestration and the Graph API, full execution traces, BYOK encryption, row-level security, audit logs, and rate limiting. Next: durable execution (resume across crashes) and human-in-the-loop checkpoints. Track the roadmap on GitHub.

Get started

Get an API key. Ship an agent today.

Free cloud beta — no credit card. Or self-host the whole stack in three commands.

Get a free API key → · Read the quickstart