Context & memory
LLMs answer best when the right context is in their window at the right moment. Most platforms split that need across three different stores: a vector DB for documents, a key-value store for session state, and an opaque chat history for memory. Vectara unifies it. The same primitive that powers your RAG also powers per-user memory and scratchpads for tool caches.
This page is the conceptual framing — why the corpus primitive serves all three roles. For the practical patterns (in-session memory via artifacts, cross-session memory via a corpus, when to use each), the canonical guide is Memory.
The corpus primitive
A Vectara corpus is more than a vector store. It is a writable,
filterable, multilingual, semantic index with hybrid retrieval, chain
rerankers, citations, and RBAC built in. Filter attributes
(user_id, session_id, topic, indexed_at) turn it into structured
storage. Multi-corpus search at query time means one call can blend
knowledge + memory + cache. RBAC enforcement at retrieval means a user
only ever sees the chunks their identity is entitled to.
Read and write are both first-class agent tools:
| Tool | Purpose |
|---|---|
| corpora_search | Hybrid retrieval over one or more corpora. |
| core_document_index | Write a new document. The unit of memory or cache. |
| metadata_query | Find documents by filter attribute without full-text search. |
| list_documents | Enumerate documents (browse mode for memories). |
| get_document_text | Read one document by ID. |
| corpus_filter_attribute_stats | Aggregate the filter values present. |
| corpora_list | List corpora visible to the caller. |
Wire any of them onto an agent and the LLM decides which to call.
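As a rough illustration of "one call can blend knowledge + memory + cache", the sketch below builds a corpora_search tool configuration that lists three corpora. It follows the shape of the "recall" fragment shown later on this page. The helper name multi_corpus_search_tool and the corpus key "product-docs" are hypothetical, and the exact schema your account accepts should be checked against the API reference.

```python
import json


def multi_corpus_search_tool(user_id_ref: str) -> dict:
    """Build a corpora_search tool config that searches three corpora at once.

    The layout mirrors the "recall" fragment on this page; "product-docs" is a
    made-up corpus key used only for illustration.
    """
    return {
        "type": "corpora_search",
        "query_configuration": {
            "search": {
                "corpora": [
                    # Knowledge: shared documents, no per-user filter.
                    {"corpus_key": "product-docs"},
                    # Memory: scoped to the current user via a metadata filter.
                    {
                        "corpus_key": "user-prefs",
                        "metadata_filter": f"doc.user_id = '{user_id_ref}'",
                    },
                    # Scratchpad: cached tool results.
                    {"corpus_key": "parts-cache"},
                ]
            }
        },
    }


tool = multi_corpus_search_tool("$session.metadata.user_id$")
print(json.dumps(tool, indent=2))
```

Only the memory corpus carries a user filter here, which is one plausible split: knowledge and cache are shared, while memory is per user.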
Three roles, one primitive
The same corpus engine — same retrieval, same RBAC, same citations — serves three different jobs an agent needs.
| Role | What lives in it | Where to read more |
|---|---|---|
| Knowledge | Manuals, KB articles, support tickets, datasheets, policies. The classic RAG corpus. | Knowledge |
| Memory | What the agent learns about a user: preferences, prior decisions, conversation summaries — indexed per user, per session, per tenant. | Memory |
| Scratchpad | Cached tool results, derived facts, past-decision logs — written back so future agents find them by meaning, not by ID. | This page, below |
Why memory in a corpus
Vectara supports two memory patterns and they compose:
- In-session memory via artifacts — for plans, scratch state, and step-to-step handoffs within a single session. Cheap, always consistent.
- Cross-session memory via a corpus — for anything that persists across sessions or is shared between agents or users. Indexed, searchable, governed by the same RBAC as your knowledge corpora.
The canonical guide for both patterns, including TTL, forgetting, and when to reach for each, is Memory. What follows is the conceptual case for why memory in a corpus is the right choice for cross-session state.
The "full memory toolkit" wiring
When the agent needs more than semantic recall (for example "browse all
my memories", "find memories tagged X", or "pull a memory by ID"), attach
the full read+write toolkit, not just corpora_search. In the
configuration below, a chain reranker applies a small recency boost so
that newer memories win ties without pushing older context out of the
results.
"recall": {
  "type": "corpora_search",
  "query_configuration": {
    "search": {
      "corpora": [{
        "corpus_key": "user-prefs",
        "metadata_filter": "doc.user_id = '$session.metadata.user_id$'"
      }],
      "reranker": {
        "type": "chain",
        "rerankers": [
          { "type": "customer_reranker",
            "reranker_name": "Rerank_Multilingual_v1",
            "limit": 20 },
          { "type": "userfn",
            "user_function": "get('$.score') * (1 + 0.0000001 * get('$.doc.indexed_at'))" }
        ]
      }
    }
  }
}
Note how metadata_filter reads from $session.metadata.user_id$ —
the same agent serves every user because the filter scopes the
search to the current session's identity. No per-tenant agent fork,
no glue code.
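To see why the userfn stage in the configuration above only breaks ties, the following sketch reproduces its arithmetic locally. It assumes indexed_at is stored as a Unix timestamp in seconds, which the page does not state. The function name recency_boosted and the sample memories are hypothetical.

```python
def recency_boosted(score: float, indexed_at: float, weight: float = 1e-7) -> float:
    """Same arithmetic as the userfn expression: score * (1 + weight * indexed_at).

    Assumption: indexed_at is a Unix timestamp in seconds.
    """
    return score * (1 + weight * indexed_at)


# Two memories with identical relevance, indexed one day apart, plus an older
# memory that is clearly more relevant.
memories = [
    {"id": "pref-old", "score": 0.80, "indexed_at": 1_700_000_000},
    {"id": "pref-new", "score": 0.80, "indexed_at": 1_700_086_400},
    {"id": "pref-strong", "score": 0.85, "indexed_at": 1_700_000_000},
]

ranked = sorted(
    memories,
    key=lambda m: recency_boosted(m["score"], m["indexed_at"]),
    reverse=True,
)
print([m["id"] for m in ranked])
```

With equal base scores the newer memory ranks ahead of the older one, while the clearly more relevant older memory still ranks first. The weight is small enough that recency decides only near-ties.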
Scratchpad — caching tool results and derived facts
Tools cost money and time. Many tools (Octopart price lookups, Salesforce account fetches, weather APIs) return values that change slowly. Writing them back to a corpus turns the corpus into a semantic cache: future agents (or the same agent in a later turn) find the result by meaning, not by ID. For fuzzy lookups, a semantic match succeeds where an exact-key lookup in a key-value store would miss. A cached part lookup might be written as:
"corpus_key": "parts-cache",
"document": {
  "text": "5CGXFC9E7F35I7N — Cyclone V FPGA, $387, in stock, 1.2 GHz",
  "metadata": { "part_id": "5CGXFC9E7F35I7N", "fetched_at": "2026-05-22" }
}
A later turn can ask: "any 1+ GHz FPGAs under $400?" and the cached entry matches semantically, even though those exact words never appear in the document text.
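Because cached values change slowly rather than never, an agent or a maintenance job will usually want to know how old an entry is. The sketch below builds a cache document in the shape shown above and checks its age from the fetched_at metadata. The helper names build_cache_document and is_fresh are hypothetical, and the seven-day window is an arbitrary choice, not a platform default.

```python
from datetime import date


def build_cache_document(part_id: str, summary: str, fetched_at: date) -> dict:
    """Assemble a cache entry shaped like the parts-cache fragment on this page."""
    return {
        "corpus_key": "parts-cache",
        "document": {
            # The text is what semantic search matches against.
            "text": f"{part_id}: {summary}",
            # Metadata supports exact filtering and freshness checks.
            "metadata": {"part_id": part_id, "fetched_at": fetched_at.isoformat()},
        },
    }


def is_fresh(entry: dict, today: date, max_age_days: int) -> bool:
    """Return True if the entry was fetched within max_age_days of today."""
    fetched = date.fromisoformat(entry["document"]["metadata"]["fetched_at"])
    return (today - fetched).days <= max_age_days


entry = build_cache_document(
    "5CGXFC9E7F35I7N",
    "Cyclone V FPGA, $387, in stock, 1.2 GHz",
    date(2026, 5, 22),
)
print(is_fresh(entry, today=date(2026, 5, 25), max_age_days=7))
print(is_fresh(entry, today=date(2026, 6, 30), max_age_days=7))
```

Storing fetched_at as an ISO date keeps the freshness check a simple date subtraction. A stale entry can then trigger a fresh tool call and a rewrite of the document.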
The most powerful scratchpad pattern is the past-decision log: every approval, classification, or routing decision becomes a precedent the agent can retrieve the next time a similar request lands. The agent does not need to be retrained: the precedents are retrieved from the corpus at query time.
Multi-tenant by construction
The same agent definition can serve every tenant in your application. You do not fork agents per customer.
- session.metadata.user_id scopes memory recall to one user.
- session.metadata.corpus_key swaps which corpus the agent reads from.
- session.metadata.locale switches between Boomerang multilingual variants.
- session.metadata.tier toggles which skills are enabled.
Pass these at session creation. The agent's $ref resolves them at
call time. Done.
See Multi-client steering for the patterns.
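The substitution idea can be illustrated locally. The sketch below is not the platform's resolver. It is a minimal stand-in that replaces $session.metadata.<key>$ placeholders in a template string, to show how one agent definition can yield a different filter per session. The function name resolve_session_refs and the sample user IDs are hypothetical.

```python
import re

# Matches placeholders of the form $session.metadata.<key>$.
_SESSION_REF = re.compile(r"\$session\.metadata\.(\w+)\$")


def resolve_session_refs(template: str, session_metadata: dict) -> str:
    """Replace each $session.metadata.<key>$ with the session's value.

    Illustrative only: a real resolver would also need to escape or validate
    values before placing them inside a filter expression.
    """

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in session_metadata:
            raise KeyError(f"session metadata is missing '{key}'")
        return str(session_metadata[key])

    return _SESSION_REF.sub(substitute, template)


template = "doc.user_id = '$session.metadata.user_id$'"
print(resolve_session_refs(template, {"user_id": "u-42"}))
print(resolve_session_refs(template, {"user_id": "u-77"}))
```

The same template produces a different filter for each session. That is why the agent definition itself does not need to change per tenant.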
Why this beats a separate vector store + key-value store
| Capability | Memory in a corpus | Separate vector store + KV |
|---|---|---|
| Semantic recall | Yes | Yes (vector store) |
| Browse by ID | Yes | Yes (KV) |
| Filter by tag | Yes | Two stores to keep in sync |
| RBAC at retrieval | Yes — same engine as document RAG | Two RBAC systems to wire up |
| Multilingual | Yes — Boomerang | Depends on the vector store |
| Reranking | Yes — chain rerankers including recency UDF | Roll your own |
| Citations on every recall | Yes | Yes (vector store) |
| Multi-source search in one call | Yes — multi-corpus query | Multiple round trips |
| Audit trail | Yes — every read and write is a logged event | Two systems to audit |
One primitive, three jobs: no separate vector DB, no key-value store, no session-state service. That consolidation is what product-first platforms cannot ship.
Related
- Memory — the canonical guide to in-session and cross-session memory patterns, including the artifact-first flow and forgetting strategies.
- Knowledge — the same primitive serving the classic RAG use case.
- Artifacts — the in-session scratchpad primitive that pairs with corpus-backed memory.
- Metadata filters — declaring filter attributes and writing filter expressions.
- Multi-client steering — one agent definition, every tenant.
- Reranking — chain rerankers, recency-aware UDFs.