Memory
Semantic memory powered by pgvector. Pass `memory: { namespace }` and the agent recalls past turns automatically — no embedding work in your code.
Quickstart
const agent = createAgent({
  model: "gpt-4o-mini",
  memory: { namespace: `user:${userId}` },
});
await agent.run("I'm Kevin. I drink only espresso. Remember this.");
// Hours, days, or processes later, in the same namespace:
for await (const e of agent.run("What's my coffee?")) {
  // → "Espresso, Kevin."
}
What happens under the hood
On every run with memory set, the control plane:
- Embeds the user input via OpenAI `text-embedding-3-small` (1536-dim).
- Searches top-5 similar memories by cosine similarity within `(tenant_id, namespace)`, with a similarity floor of 0.3.
- Injects them into the system prompt as a bullet list (“Relevant context from past interactions”).
- Persists a `memory_retrieved` event at `seq=0` so the dashboard trace shows what was recalled.
- Runs the agent normally.
- After `done`, embeds the `(input, output)` pair and stores it as a new memory linked to the source `run_id`.
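The search and injection steps above can be illustrated with a small in-memory sketch. This is not the control plane's code: the real search runs in Postgres via pgvector, and the names `Memory`, `cosineSimilarity`, `recall`, and `renderContext` are hypothetical. Whether the 0.3 floor is inclusive and the exact prompt wording are also assumptions.

```typescript
// Hypothetical shape of a stored memory; the real row has more columns.
interface Memory {
  content: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep memories at or above the floor (assumed inclusive), best first, capped at topK.
function recall(query: number[], memories: Memory[], topK = 5, floor = 0.3): Memory[] {
  return memories
    .map((m) => ({ m, score: cosineSimilarity(query, m.embedding) }))
    .filter((x) => x.score >= floor)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.m);
}

// Render recalled memories as a bullet list for the system prompt.
function renderContext(recalled: Memory[]): string {
  if (recalled.length === 0) return "";
  const bullets = recalled.map((m) => `- ${m.content}`).join("\n");
  return `Relevant context from past interactions:\n${bullets}`;
}
```

With toy 2-dimensional vectors, a memory orthogonal to the query scores 0 and is dropped by the floor, while closer memories are returned in descending order of similarity.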
Namespaces
A namespace scopes a chunk of memory. Use namespaces like database tables: one per user, session, agent persona, or team.
memory: true                                  // → namespace "default"
memory: { namespace: "default" }              // explicit same thing
memory: { namespace: `user:${userId}` }       // per-user
memory: { namespace: `thread:${threadId}` }   // per-conversation
memory: { namespace: `agent:support` }        // per-persona
Why OpenAI is required
Memory always uses OpenAI `text-embedding-3-small` for embeddings, even when the chat model is Claude, because Anthropic doesn't ship an embeddings API. Make sure your tenant has an OpenAI credential uploaded, even if you only chat with Claude.
curl -X PUT "$RELAY_URL/v1/credentials/openai" \
  -H "authorization: Bearer $RELAY_API_KEY" \
  -H "content-type: application/json" \
  -d '{"apiKey":"sk-..."}'
Inspect & manage
Three HTTP endpoints let you inspect and manage memory state.
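For callers working in TypeScript rather than the shell, the list, delete-one, and clear-namespace endpoints documented in this section could be wrapped as follows. This is a sketch only: `memoryRequests`, `MemoryRequest`, and the method names are hypothetical, not part of any shipped SDK. The default `limit` of 20 mirrors the curl example, and the namespace is URL-encoded as a precaution.

```typescript
// Request descriptor; hand it to fetch(req.url, { method: req.method, headers: req.headers }).
interface MemoryRequest {
  method: "GET" | "DELETE";
  url: string;
  headers: { authorization: string };
}

// baseUrl and apiKey come from the caller, e.g. "http://localhost:4000"
// and the value of RELAY_API_KEY.
function memoryRequests(baseUrl: string, apiKey: string) {
  const headers = { authorization: `Bearer ${apiKey}` };
  const ns = (namespace: string): string =>
    `namespace=${encodeURIComponent(namespace)}`;

  return {
    // GET /v1/memories?namespace=...&limit=...
    list: (namespace: string, limit = 20): MemoryRequest => ({
      method: "GET",
      url: `${baseUrl}/v1/memories?${ns(namespace)}&limit=${limit}`,
      headers,
    }),
    // DELETE /v1/memories/<memory-id>
    deleteOne: (memoryId: string): MemoryRequest => ({
      method: "DELETE",
      url: `${baseUrl}/v1/memories/${encodeURIComponent(memoryId)}`,
      headers,
    }),
    // DELETE /v1/memories?namespace=...
    clearNamespace: (namespace: string): MemoryRequest => ({
      method: "DELETE",
      url: `${baseUrl}/v1/memories?${ns(namespace)}`,
      headers,
    }),
  };
}
```

Building descriptors instead of calling the network directly keeps the helper easy to test and makes it hard to confuse the two `DELETE` routes, one of which removes a single memory and the other a whole namespace.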
List memories
curl -s -H "authorization: Bearer $RELAY_API_KEY" \
  "localhost:4000/v1/memories?namespace=user:42&limit=20" | jq
Delete one memory
curl -X DELETE -H "authorization: Bearer $RELAY_API_KEY" \
  "localhost:4000/v1/memories/<memory-id>"
Clear a whole namespace
curl -X DELETE -H "authorization: Bearer $RELAY_API_KEY" \
  "localhost:4000/v1/memories?namespace=user:42"
Schema (advanced)
create table memories (
  id            uuid primary key default gen_random_uuid(),
  tenant_id     uuid not null references tenants(id) on delete cascade,
  namespace     text not null,
  content       text not null,
  embedding     vector(1536) not null,
  metadata      jsonb not null default '{}'::jsonb,
  source_run_id uuid references runs(id) on delete set null,
  created_at    timestamptz not null default now(),
  ttl_at        timestamptz -- expired rows filtered at read time
);
create index memories_embedding_idx
  on memories using ivfflat (embedding vector_cosine_ops) with (lists = 100);
Costs
Memory adds two embedding calls per run (one to query, one to store). With text-embedding-3-small at $0.02 per 1M tokens, this is effectively a rounding error compared with the chat completion itself.
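To put a number on that, here is a rough estimate of the embedding cost per run. The function name and the token counts are hypothetical, and the sketch assumes the stored embedding covers roughly the input plus output tokens. Only the $0.02 per 1M tokens price comes from the text above.

```typescript
// Price from the text: $0.02 per 1M tokens for text-embedding-3-small.
const USD_PER_MILLION_TOKENS = 0.02;

// Estimated embedding cost in USD for one run: one query embedding of the
// input, plus one store embedding of the (input, output) pair.
function embeddingCostPerRun(inputTokens: number, outputTokens: number): number {
  const queryTokens = inputTokens;                // embed the user input
  const storeTokens = inputTokens + outputTokens; // embed the (input, output) pair
  return ((queryTokens + storeTokens) / 1_000_000) * USD_PER_MILLION_TOKENS;
}

// Hypothetical run: 200 input tokens, 600 output tokens → 1,000 embedded tokens.
const perRun = embeddingCostPerRun(200, 600); // about $0.00002
```

At that size, a million runs would cost about $20 in embeddings.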