Memory

Semantic memory powered by pgvector. Pass `memory: { namespace }` and the agent recalls relevant past turns automatically, with no embedding work in your code.

Quickstart

const agent = createAgent({
  model: "gpt-4o-mini",
  memory: { namespace: `user:${userId}` },
});

await agent.run("I'm Kevin. I drink only espresso. Remember this.");

// Hours, days, or process restarts later, using the same namespace:
for await (const e of agent.run("What's my coffee?")) {
  // → "Espresso, Kevin."
}

What happens under the hood

On every run with memory set, the control plane:

  1. Embeds the user input via OpenAI text-embedding-3-small (1536-dim).
  2. Retrieves the 5 most similar memories by cosine similarity within (tenant_id, namespace), dropping any below a similarity floor of 0.3.
  3. Injects them into the system prompt as a bullet list (“Relevant context from past interactions”).
  4. Persists a memory_retrieved event at seq=0 so the dashboard trace shows what was recalled.
  5. Runs the agent normally.
  6. After the run completes, embeds the (input, output) pair and stores it as a new memory linked to the source run_id.
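
Steps 2 and 3 can be sketched in a few lines. This is a minimal in-memory illustration, not relay's actual code: the `Memory` type and the `recall` and `buildContext` helpers are hypothetical names, the toy vectors are 3-dimensional rather than 1536-dimensional, and treating the 0.3 floor as inclusive is an assumption.

```typescript
// Hypothetical in-memory stand-in for stored memories (illustration only).
interface Memory {
  content: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 2: keep memories at or above the floor, best first, at most topK.
function recall(query: number[], memories: Memory[], topK = 5, floor = 0.3): Memory[] {
  return memories
    .map((m) => ({ m, score: cosineSimilarity(query, m.embedding) }))
    .filter((x) => x.score >= floor)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.m);
}

// Step 3: render the recalled memories as a bullet list for the system prompt.
function buildContext(recalled: Memory[]): string {
  if (recalled.length === 0) return "";
  const bullets = recalled.map((m) => `- ${m.content}`).join("\n");
  return `Relevant context from past interactions:\n${bullets}`;
}

const store: Memory[] = [
  { content: "User drinks only espresso.", embedding: [1, 0, 0] },
  { content: "User lives in Lisbon.", embedding: [0, 1, 0] },
];

// The query vector points almost entirely along the first memory's direction,
// so only that memory clears the 0.3 floor.
const context = buildContext(recall([0.9, 0.1, 0], store));
console.log(context);
```

In the real pipeline the similarity search runs inside Postgres via pgvector; the sketch only shows how the top-5 limit and the floor interact.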

Namespaces

A namespace scopes a set of memories. Use namespaces the way you would use database tables: one per user, session, agent persona, or team.

memory: true                              // → namespace "default"
memory: { namespace: "default" }          // same thing, explicit
memory: { namespace: `user:${userId}` }   // per-user
memory: { namespace: `thread:${threadId}` } // per-conversation
memory: { namespace: `agent:support` }    // per-persona
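
The mapping from the `memory` option to a namespace can be pictured as a small normalization function. This is a sketch under assumptions: `MemoryOption` and `resolveNamespace` are hypothetical names, and treating `false` or an omitted option as "memory disabled" is a guess rather than documented behavior.

```typescript
// Hypothetical shape of the `memory` option (illustration only).
type MemoryOption = boolean | { namespace: string } | undefined;

// Returns the namespace to use, or null when memory is assumed to be off.
function resolveNamespace(memory: MemoryOption): string | null {
  if (memory === undefined || memory === false) return null; // assumption
  if (memory === true) return "default"; // documented: true means "default"
  return memory.namespace;
}

const userId = "42"; // placeholder value for the example
console.log(resolveNamespace(true));
console.log(resolveNamespace({ namespace: `user:${userId}` }));
```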

Why OpenAI is required

Memory always uses OpenAI text-embedding-3-small for embeddings, even when the chat model is Claude, because Anthropic doesn't ship an embeddings API. Make sure your tenant has an OpenAI credential uploaded, even if you only chat with Claude.

curl -X PUT "$RELAY_URL/v1/credentials/openai" \
  -H "authorization: Bearer $RELAY_API_KEY" \
  -H "content-type: application/json" \
  -d '{"apiKey":"sk-..."}'

Inspect & manage

Three HTTP endpoints let you inspect and manage stored memories.

List memories

curl -s -H "authorization: Bearer $RELAY_API_KEY" \
  "$RELAY_URL/v1/memories?namespace=user:42&limit=20" | jq

Delete one memory

curl -X DELETE -H "authorization: Bearer $RELAY_API_KEY" \
  "$RELAY_URL/v1/memories/<memory-id>"

Clear a whole namespace

curl -X DELETE -H "authorization: Bearer $RELAY_API_KEY" \
  "$RELAY_URL/v1/memories?namespace=user:42"

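The same three calls can be made from TypeScript. The sketch below only builds the requests, using the paths and header from the curl examples above; `memoryRequests` and `RequestSpec` are hypothetical names and not part of any relay SDK. The base URL must include a scheme, for example `http://localhost:4000`.

```typescript
// Hypothetical request descriptor (illustration only, not a relay SDK type).
interface RequestSpec {
  method: "GET" | "DELETE";
  url: string;
  headers: Record<string, string>;
}

function memoryRequests(baseUrl: string, apiKey: string) {
  const headers = { authorization: `Bearer ${apiKey}` };

  // Builds /v1/memories with URL-encoded query parameters.
  const collection = (params: Record<string, string>): string => {
    const url = new URL("/v1/memories", baseUrl);
    url.search = new URLSearchParams(params).toString();
    return url.toString();
  };

  return {
    // List memories in a namespace.
    list: (namespace: string, limit = 20): RequestSpec => ({
      method: "GET",
      url: collection({ namespace, limit: String(limit) }),
      headers,
    }),
    // Delete a single memory by id.
    deleteOne: (id: string): RequestSpec => ({
      method: "DELETE",
      url: new URL(`/v1/memories/${encodeURIComponent(id)}`, baseUrl).toString(),
      headers,
    }),
    // Clear every memory in a namespace.
    clearNamespace: (namespace: string): RequestSpec => ({
      method: "DELETE",
      url: collection({ namespace }),
      headers,
    }),
  };
}

const api = memoryRequests("http://localhost:4000", "test-key");
const listReq = api.list("user:42");
console.log(listReq.method, listReq.url);
```

To send a request, pass the descriptor to `fetch`, for example `await fetch(listReq.url, { method: listReq.method, headers: listReq.headers })`. Note that `URLSearchParams` percent-encodes the colon in `user:42`; the server is expected to decode it back to the same namespace.
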
Schema (advanced)

create table memories (
  id            uuid primary key default gen_random_uuid(),
  tenant_id     uuid not null references tenants(id) on delete cascade,
  namespace     text not null,
  content       text not null,
  embedding     vector(1536) not null,
  metadata      jsonb not null default '{}'::jsonb,
  source_run_id uuid references runs(id) on delete set null,
  created_at    timestamptz not null default now(),
  ttl_at        timestamptz                            -- expired rows filtered at read time
);

create index memories_embedding_idx
  on memories using ivfflat (embedding vector_cosine_ops) with (lists = 100);
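
For orientation, here is how a recall query against this schema might look. This is a sketch, not the query relay actually runs: `RECALL_SQL`, `toVectorLiteral`, and `recallParams` are hypothetical names. It relies on pgvector's `<=>` operator, which returns cosine distance, so similarity is `1 - distance`.

```typescript
// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Parameterized query: $1 tenant, $2 namespace, $3 query vector,
// $4 similarity floor, $5 row limit. Expired rows are filtered at read time.
const RECALL_SQL = `
  select id, content, 1 - (embedding <=> $3::vector) as similarity
  from memories
  where tenant_id = $1
    and namespace = $2
    and (ttl_at is null or ttl_at > now())
    and 1 - (embedding <=> $3::vector) >= $4
  order by embedding <=> $3::vector
  limit $5
`;

// Builds the parameter list in the order the query expects.
function recallParams(
  tenantId: string,
  namespace: string,
  queryEmbedding: number[],
  floor = 0.3,
  topK = 5,
): [string, string, string, number, number] {
  return [tenantId, namespace, toVectorLiteral(queryEmbedding), floor, topK];
}

const params = recallParams("tenant-id-placeholder", "user:42", [0.1, 0.2, 0.3]);
console.log(RECALL_SQL.trim());
console.log(params);
```

Ordering by `embedding <=> $3::vector` ascending puts the most similar rows first, which is the form the `vector_cosine_ops` index is built to serve.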

Costs

Memory adds two embedding calls per run (one to query, one to store). At $0.02 per 1M tokens for text-embedding-3-small, this cost is negligible next to the chat completion itself.
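
To put a number on it, the arithmetic can be worked through directly. The token counts here are assumptions chosen for illustration (a 50-token input and a 450-token input/output pair); only the $0.02 per 1M tokens price comes from the text above.

```typescript
const PRICE_PER_MILLION_TOKENS_USD = 0.02; // price quoted above

// Cost of the two embedding calls made for a single run.
function embeddingCostUsd(queryTokens: number, storeTokens: number): number {
  return ((queryTokens + storeTokens) / 1_000_000) * PRICE_PER_MILLION_TOKENS_USD;
}

// Assumed sizes: 50 tokens embedded for the query, 450 for the stored pair.
const perRun = embeddingCostUsd(50, 450);
console.log(perRun.toFixed(8)); // prints "0.00001000"
console.log((perRun * 1_000_000).toFixed(2)); // one million such runs: prints "10.00"
```

Under these assumptions, memory costs about one thousandth of a cent per run, or roughly $10 per million runs.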