Version: 2.0

Context & memory

LLMs answer best when the right context is in their window at the right moment. Most platforms split that need across three different stores: a vector DB for documents, a key-value store for session state, and an opaque chat history for memory. Vectara unifies them. The same primitive that powers your RAG also powers per-user memory and a scratchpad for cached tool results.

This page is the conceptual framing — why the corpus primitive serves all three roles. For the practical patterns (in-session memory via artifacts, cross-session memory via a corpus, when to use each), the canonical guide is Memory.

The corpus primitive

A Vectara corpus is more than a vector store. It is a writable, filterable, multilingual, semantic index with hybrid retrieval, chain rerankers, citations, and RBAC built in. Filter attributes (user_id, session_id, topic, indexed_at) turn it into structured storage. Multi-corpus search at query time means one call can blend knowledge + memory + cache. RBAC enforcement at retrieval means a user only ever sees the chunks their identity is entitled to.
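As a sketch of that multi-corpus call, the Python below builds one query body. It searches a knowledge corpus, a per-user memory corpus, and a tool cache together. The body is assumed to follow the shape of the Vectara v2 query request (`search.corpora`, `metadata_filter`, `limit`); check the API reference before relying on it. `product-docs` is a hypothetical corpus name, and `eq_filter` is an illustrative helper, not a Vectara function.

```python
import json


def eq_filter(attribute: str, value: str) -> str:
    """Build one `doc.<attribute> = '<value>'` filter clause.

    Values containing a single quote are rejected rather than escaped,
    because this sketch does not assume any particular escaping rule.
    """
    if "'" in value:
        raise ValueError(f"refusing to embed quote in filter value: {value!r}")
    return f"doc.{attribute} = '{value}'"


def build_query(question: str, user_id: str) -> dict:
    """Blend knowledge, per-user memory, and a tool cache in one request body."""
    return {
        "query": question,
        "search": {
            "corpora": [
                # Knowledge: shared by everyone, so no per-user filter (hypothetical name).
                {"corpus_key": "product-docs"},
                # Memory: scoped to the current user through a filter attribute.
                {"corpus_key": "user-prefs", "metadata_filter": eq_filter("user_id", user_id)},
                # Scratchpad: cached tool results shared across agents.
                {"corpus_key": "parts-cache"},
            ],
            "limit": 20,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_query("Which FPGA did I prefer last quarter?", "u-42"), indent=2))
```

The memory corpus is the only one that carries a filter. Knowledge and cache are shared, so scoping happens per corpus inside the same call.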

Read and write are both first-class agent tools:

| Tool | Purpose |
|---|---|
| corpora_search | Hybrid retrieval over one or more corpora. |
| core_document_index | Write a new document. The unit of memory or cache. |
| metadata_query | Find documents by filter attribute without full-text search. |
| list_documents | Enumerate documents (browse mode for memories). |
| get_document_text | Read one document by ID. |
| corpus_filter_attribute_stats | Aggregate the filter values present. |
| corpora_list | List corpora visible to the caller. |

Wire any of them onto an agent and the LLM decides which to call.

Three roles, one primitive

The same corpus engine — same retrieval, same RBAC, same citations — serves three different jobs an agent needs.

| Role | What lives in it | Where to read more |
|---|---|---|
| Knowledge | Manuals, KB articles, support tickets, datasheets, policies. The classic RAG corpus. | Knowledge |
| Memory | What the agent learns about a user: preferences, prior decisions, conversation summaries, indexed per user, per session, per tenant. | Memory |
| Scratchpad | Cached tool results, derived facts, past-decision logs, written back so future agents find them by meaning, not by ID. | This page, below |

Why memory in a corpus

Vectara supports two memory patterns, and they compose:

  • In-session memory via artifacts — for plans, scratch state, and step-to-step handoffs within a single session. Cheap, always consistent.
  • Cross-session memory via a corpus — for anything that persists across sessions or is shared between agents or users. Indexed, searchable, governed by the same RBAC as your knowledge corpora.

The canonical guide for both patterns, including TTL, forgetting, and when to reach for each, is Memory. What follows is the conceptual case for why memory in a corpus is the right choice for cross-session state.

The "full memory toolkit" wiring

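When the agent needs more than semantic recall (browse all my memories, find memories tagged X, pull a memory by ID), attach the full read+write toolkit (list_documents, metadata_query, get_document_text, core_document_index), not just corpora_search. In the recall tool below, a chain reranker first rescores candidates with Rerank_Multilingual_v1 and keeps the top 20. It then applies a user function that boosts recent memories, so newer facts win on near-ties without dropping the older context.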

```json
{
  "recall": {
    "type": "corpora_search",
    "query_configuration": {
      "search": {
        "corpora": [{
          "corpus_key": "user-prefs",
          "metadata_filter": "doc.user_id = '$session.metadata.user_id$'"
        }],
        "reranker": {
          "type": "chain",
          "rerankers": [
            {
              "type": "customer_reranker",
              "reranker_name": "Rerank_Multilingual_v1",
              "limit": 20
            },
            {
              "type": "userfn",
              "user_function": "get('$.score') * (1 + 0.0000001 * get('$.doc.indexed_at'))"
            }
          ]
        }
      }
    }
  }
}
```

Note how metadata_filter reads from $session.metadata.user_id$ — the same agent serves every user because the filter scopes the search to the current session's identity. No per-tenant agent fork, no glue code.
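To see what the recency term in the `userfn` reranker above does to rankings, the sketch below reproduces its arithmetic outside Vectara. It assumes `indexed_at` holds Unix epoch seconds, which this page does not state. `recency_boost` and `rank` are illustrative names, not part of any Vectara API.

```python
def recency_boost(score: float, indexed_at: int) -> float:
    """Same arithmetic as get('$.score') * (1 + 0.0000001 * get('$.doc.indexed_at'))."""
    return score * (1 + 0.0000001 * indexed_at)


def rank(memories):
    """Sort (label, score, indexed_at) tuples by boosted score, highest first."""
    return sorted(memories, key=lambda m: recency_boost(m[1], m[2]), reverse=True)


DAY = 86_400
YEAR = 365 * DAY
T0 = 1_700_000_000  # assumed Unix epoch seconds (November 2023)

# Equal relevance: the memory indexed one day later ranks first.
tie = rank([("old", 0.80, T0), ("new", 0.80, T0 + DAY)])
print([m[0] for m in tie])  # ['new', 'old']

# Clear relevance gap: a full year of recency does not overturn it.
gap = rank([("old", 0.90, T0), ("new", 0.80, T0 + YEAR)])
print([m[0] for m in gap])  # ['old', 'new']

# Relative advantage that one year of recency buys at this timestamp scale.
print(round(recency_boost(1, T0 + YEAR) / recency_boost(1, T0) - 1, 4))  # 0.0184
```

The multiplier grows with the absolute timestamp. At current timestamps, a year of recency is therefore worth roughly a 1.8% score advantage: enough to break near-ties, but not enough to bury an older memory that is clearly more relevant.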

Scratchpad — caching tool results and derived facts

Tools cost money and time. Many tools — Octopart price lookups, Salesforce account fetches, weather APIs — return values that change slowly. Writing them back to a corpus turns the corpus into a semantic cache: future agents (or the same agent in a later turn) find the result by meaning, not by ID. A semantic match beats a key-value store for fuzzy lookups.

```json
{
  "corpus_key": "parts-cache",
  "document": {
    "text": "5CGXFC9E7F35I7N — Cyclone V FPGA, $387, in stock, 1.2 GHz",
    "metadata": { "part_id": "5CGXFC9E7F35I7N", "fetched_at": "2026-05-22" }
  }
}
```

A later turn can ask "any 1+ GHz FPGAs under $400?" and the cached entry matches semantically, even though that wording never appears in the document text.
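Here is a minimal sketch of the write-back side, assuming the `parts-cache` document shape shown above. `to_cache_document` turns a tool result into an indexable document. `fresh_hits` drops cached entries whose `fetched_at` is older than a freshness window. The seven-day window, the text template, and both function names are hypothetical. The actual `core_document_index` and `corpora_search` calls are left out so the sketch runs on its own.

```python
from datetime import date, timedelta

# Hypothetical freshness window; choose one that matches how fast the source data changes.
MAX_AGE = timedelta(days=7)


def to_cache_document(part: dict, fetched: date) -> dict:
    """Shape a tool result like the parts-cache example.

    The text is prose so it can be found by meaning; part_id and fetched_at
    go in metadata so they can be filtered and checked exactly.
    """
    text = f"{part['part_id']}: {part['description']}, ${part['price']}, {part['stock']}"
    return {
        "corpus_key": "parts-cache",
        "document": {
            "text": text,
            "metadata": {"part_id": part["part_id"], "fetched_at": fetched.isoformat()},
        },
    }


def fresh_hits(hits: list, today: date) -> list:
    """Keep cached results fetched within MAX_AGE; a stale hit means 'call the tool again'."""
    return [
        h for h in hits
        if today - date.fromisoformat(h["metadata"]["fetched_at"]) <= MAX_AGE
    ]


if __name__ == "__main__":
    doc = to_cache_document(
        {"part_id": "5CGXFC9E7F35I7N", "description": "Cyclone V FPGA",
         "price": 387, "stock": "in stock"},
        date(2026, 5, 22),
    )
    print(doc["document"]["text"])
    # Pretend these hits came back from a corpora_search over parts-cache.
    hits = [doc["document"],
            {"text": "older quote", "metadata": {"part_id": "X", "fetched_at": "2026-04-01"}}]
    print(len(fresh_hits(hits, date(2026, 5, 25))))  # 1
```

Storing `fetched_at` as an ISO date keeps the age check a single subtraction. It also leaves the value usable as a filter attribute if you would rather drop stale entries at query time.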

The most powerful scratchpad pattern is the past-decision log: every approval, classification, or routing decision becomes a precedent the agent can retrieve the next time a similar request lands. The agent needs no retraining; the corpus supplies the precedents at retrieval time.
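The decision log can be sketched the same way. This assumes a hypothetical `decision-log` corpus with `category`, `decision`, `decided_by`, and `decided_at` filter attributes; none of those names come from Vectara. The request and rationale go into the text, so similar requests retrieve them by meaning. The outcome goes into metadata, so the agent can filter to, say, approved refunds only.

```python
from datetime import datetime, timezone


def decision_document(request: str, decision: str, rationale: str,
                      category: str, decided_by: str, when: datetime) -> dict:
    """Turn one approval, classification, or routing decision into an indexable precedent."""
    return {
        "corpus_key": "decision-log",  # hypothetical corpus name
        "document": {
            # Prose text: what a future, similar request will match against.
            "text": f"Request: {request}\nDecision: {decision}\nRationale: {rationale}",
            # Structured fields: what the agent filters on.
            "metadata": {
                "category": category,
                "decision": decision,
                "decided_by": decided_by,
                "decided_at": int(when.timestamp()),  # Unix epoch seconds
            },
        },
    }


def precedent_filter(category: str) -> str:
    """Filter expression for approved precedents in one category (values assumed quote-free)."""
    return f"doc.category = '{category}' and doc.decision = 'approved'"


if __name__ == "__main__":
    doc = decision_document(
        request="Refund for a power supply that was dead on arrival",
        decision="approved",
        rationale="Hardware failed on arrival, within the return window",
        category="refunds",
        decided_by="billing-agent",
        when=datetime(2026, 5, 22, tzinfo=timezone.utc),
    )
    print(doc["document"]["text"])
    print(precedent_filter("refunds"))
```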

Multi-tenant by construction

The same agent definition can serve every tenant in your application. You do not fork agents per customer.

  • session.metadata.user_id scopes memory recall to one user.
  • session.metadata.corpus_key swaps which corpus the agent reads from.
  • session.metadata.locale switches between Boomerang multilingual variants.
  • session.metadata.tier toggles which skills are enabled.

Pass these values at session creation. References such as $session.metadata.user_id$ in the agent definition resolve against them at call time, so no per-tenant code is needed.
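Resolving a `$session.metadata.<key>$` reference is easiest to see in code. The sketch below is a hypothetical stand-in for the platform's own resolution, not its implementation. It substitutes each placeholder from the session's metadata and fails loudly when a key is missing. It also rejects values containing a single quote, so a crafted `user_id` cannot close the quoted literal and widen the filter.

```python
import re

# Matches $session.metadata.<key>$ and captures <key>.
PLACEHOLDER = re.compile(r"\$session\.metadata\.([A-Za-z_][A-Za-z0-9_]*)\$")


def resolve(template: str, session_metadata: dict) -> str:
    """Replace each $session.metadata.<key>$ with that key's value for this session."""
    def substitute(match):
        key = match.group(1)
        if key not in session_metadata:
            raise KeyError(f"session metadata has no {key!r}")
        value = str(session_metadata[key])
        if "'" in value:
            # Refuse values that could close the quoted literal and change the filter.
            raise ValueError(f"unsafe value for {key!r}: {value!r}")
        return value

    return PLACEHOLDER.sub(substitute, template)


template = "doc.user_id = '$session.metadata.user_id$'"
# One agent definition, two tenants: only the session metadata differs.
print(resolve(template, {"user_id": "u-42", "tier": "pro"}))  # doc.user_id = 'u-42'
print(resolve(template, {"user_id": "u-77"}))  # doc.user_id = 'u-77'
```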

See Multi-client steering for the patterns.

Why this beats a separate vector store + key-value store

| Capability | Memory in a corpus | Separate vector store + KV |
|---|---|---|
| Semantic recall | Yes | Yes (vector store) |
| Browse by ID | Yes | Yes (KV) |
| Filter by tag | Yes | Two stores to keep in sync |
| RBAC at retrieval | Yes, same engine as document RAG | Two RBAC systems to wire up |
| Multilingual | Yes, Boomerang | Depends on the vector store |
| Reranking | Yes, chain rerankers including recency UDF | Roll your own |
| Citations on every recall | Yes | Yes (vector store) |
| Multi-source search in one call | Yes, multi-corpus query | Multiple round trips |
| Audit trail | Yes, every read and write is a logged event | Two systems to audit |

One primitive, three jobs: no separate vector DB, no key-value store, no session-state service to provision, secure, and keep in sync. That consolidation is what product-first platforms cannot ship.

See also

  • Memory — the canonical guide to in-session and cross-session memory patterns, including the artifact-first flow and forgetting strategies.
  • Knowledge — the same primitive serving the classic RAG use case.
  • Artifacts — the in-session scratchpad primitive that pairs with corpus-backed memory.
  • Metadata filters — declaring filter attributes and writing filter expressions.
  • Multi-client steering — one agent definition, every tenant.
  • Reranking — chain rerankers, recency-aware UDFs.