Version: 2.0

Context and memory

LLMs answer best when the right context is in their window at the right moment. Most platforms split that need across three different stores: a vector DB for documents, a key-value store for session state, and an opaque chat history for memory. Vectara unifies them. The same corpus primitive that powers your RAG also powers per-user memory and scratchpads that cache tool results.

This page is the conceptual framing: why the corpus primitive serves all three roles. For the practical patterns (in-session memory via artifacts, cross-session memory via a corpus, when to use each), the canonical guide is Memory.

The corpus primitive

A Vectara corpus is more than a vector store. It is a writable, filterable, multilingual, semantic index with hybrid retrieval, chain rerankers, citations, and RBAC built in. Filter attributes (user_id, session_id, topic, indexed_at) turn it into structured storage. Multi-corpus search at query time means one call can blend knowledge + memory + cache. RBAC enforcement at retrieval means a user only ever sees the chunks their identity is entitled to.
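As a rough illustration of blending knowledge, memory, and cache in one call, the sketch below builds a multi-corpus search payload as a plain Python dict. It mirrors the `search.corpora` shape used in the wiring example on this page. The corpus key `product-docs`, the helper name `build_blended_search`, and the top-level `query` field are assumptions made for illustration, so check the exact request schema against the API reference before relying on it.

```python
def build_blended_search(query: str, user_id: str) -> dict:
    """Build one search payload spanning knowledge, memory, and cache corpora.

    Illustrative only: the field names follow the wiring example on this page,
    and "product-docs" is a hypothetical corpus key.
    """
    # Refuse ids containing a quote rather than guess at the filter
    # language's escaping rules.
    if "'" in user_id:
        raise ValueError("user_id must not contain a single quote")

    return {
        "query": query,
        "search": {
            "corpora": [
                # Knowledge: shared documents, no per-user filter.
                {"corpus_key": "product-docs"},
                # Memory: scoped to the current user by a filter attribute.
                {
                    "corpus_key": "user-prefs",
                    "metadata_filter": f"doc.user_id = '{user_id}'",
                },
                # Cache: previously stored tool results.
                {"corpus_key": "parts-cache"},
            ]
        },
    }


payload = build_blended_search("Which FPGA did I pick last time?", "u-42")
print([c["corpus_key"] for c in payload["search"]["corpora"]])
```

Only the memory corpus carries a per-user filter here; whether the knowledge and cache corpora also need filters depends on how your data is partitioned.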

Read and write are both first-class agent tools:

Tool	Purpose
`corpora_search`	Hybrid retrieval over one or more corpora.
`core_document_index`	Write a new document. The unit of memory or cache.
`metadata_query`	Find documents by filter attribute without full-text search.
`list_documents`	Enumerate documents (browse mode for memories).
`get_document_text`	Read one document by ID.
`corpus_filter_attribute_stats`	Aggregate the filter values present.
`corpora_list`	List corpora visible to the caller.

Wire any of them onto an agent and the LLM decides which to call.

Three roles, one primitive

The same corpus engine (same retrieval, same RBAC, same citations) serves three different jobs an agent needs.

Role	What lives in it	Where to read more
Knowledge	Manuals, KB articles, support tickets, datasheets, policies. The classic RAG corpus.	Knowledge
Memory	What the agent learns about a user: preferences, prior decisions, conversation summaries, indexed per user, per session, per tenant.	Memory
Scratchpad	Cached tool results, derived facts, past-decision logs, written back so future agents find them by meaning, not by ID.	This page, below

Why memory in a corpus

Vectara supports two memory patterns and they compose:

In-session memory via artifacts: for plans, scratch state, and step-to-step handoffs within a single session. Cheap, always consistent.
Cross-session memory via a corpus: for anything that persists across sessions or is shared between agents or users. Indexed, searchable, governed by the same RBAC as your knowledge corpora.

The canonical guide for both patterns, including TTL, forgetting, and when to reach for each, is Memory. What follows is the conceptual case for why memory in a corpus is the right choice for cross-session state.

The "full memory toolkit" wiring

When the agent needs more than semantic recall (browse all my memories, find memories tagged X, pull memory by ID), attach the full read+write toolkit, not just corpora_search. The chain reranker boosts recent memories so newer facts win on ties without losing the older context.

"recall": {
  "type": "corpora_search",
  "query_configuration": {
    "search": {
      "corpora": [{
        "corpus_key": "user-prefs",
        "metadata_filter": "doc.user_id = '$session.metadata.user_id$'"
      }],
      "reranker": {
        "type": "chain",
        "rerankers": [
          { "type": "customer_reranker",
            "reranker_name": "Rerank_Multilingual_v1",
            "limit": 20 },
          { "type": "userfn",
            "user_function": "get('$.score') * (1 + 0.0000001 * get('$.doc.indexed_at'))" }
        ]
      }
    }
  }
}

Note how metadata_filter reads from $session.metadata.user_id$: the same agent serves every user because the filter scopes the search to the current session's identity. No per-tenant agent fork, no glue code.
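To see why the recency term in the userfn expression above behaves as a tie-breaker rather than an override, the arithmetic can be reproduced locally. This is a minimal sketch that assumes `indexed_at` holds a Unix timestamp in seconds (the page does not state the unit); the function name `recency_boosted` and the sample scores are illustrative.

```python
def recency_boosted(score: float, indexed_at: float, weight: float = 1e-7) -> float:
    """Same arithmetic as the userfn: score * (1 + weight * indexed_at)."""
    return score * (1 + weight * indexed_at)


DAY = 86_400                   # seconds in a day
t_old = 1_700_000_000          # assumed epoch-seconds timestamp
t_new = t_old + 30 * DAY       # a memory indexed 30 days later

# Equal relevance: the newer memory ends up with the higher final score.
tie_old = recency_boosted(0.80, t_old)
tie_new = recency_boosted(0.80, t_new)

# Slightly lower relevance on the newer memory: the older one still wins.
strong_old = recency_boosted(0.80, t_old)
weak_new = recency_boosted(0.79, t_new)

print(tie_new > tie_old)
print(strong_old > weak_new)
```

Under these assumptions both comparisons print `True`: a 30-day age difference moves the multiplier by well under one percent, so recency decides ties while a clearly more relevant older memory keeps its rank. A larger weight would shift that balance toward recency.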

Scratchpad: caching tool results and derived facts

Tools cost money and time. Many tools (Octopart price lookups, Salesforce account fetches, weather APIs) return values that change slowly. Writing them back to a corpus turns the corpus into a semantic cache: future agents (or the same agent in a later turn) find the result by meaning, not by ID. A semantic match beats a key-value store for fuzzy lookups.

"corpus_key": "parts-cache",
"document": {
  "text": "5CGXFC9E7F35I7N — Cyclone V FPGA, $387, in stock, 1.2 GHz",
  "metadata": { "part_id": "5CGXFC9E7F35I7N", "fetched_at": "2026-05-22" }
}

A later turn can ask: "any 1+ GHz FPGAs under $400?" and the cached entry matches semantically, even though those exact words never appear in the document text.
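Because cached values change slowly rather than never, a cache entry usually needs a freshness check before it is trusted. The sketch below builds an entry in the same shape as the fragment above and tests its age against a cutoff. The helper names `make_cache_entry` and `is_fresh` and the seven-day default are assumptions for illustration, not platform behavior.

```python
from datetime import date, timedelta


def make_cache_entry(part_id: str, summary: str, fetched_on: date) -> dict:
    """Build a cache document shaped like the parts-cache fragment above."""
    return {
        "corpus_key": "parts-cache",
        "document": {
            # Free text is what semantic search matches against.
            "text": f"{part_id}: {summary}",
            # Metadata supports exact filtering and staleness checks.
            "metadata": {
                "part_id": part_id,
                "fetched_at": fetched_on.isoformat(),
            },
        },
    }


def is_fresh(entry: dict, today: date, max_age_days: int = 7) -> bool:
    """Return True if the entry was fetched within max_age_days of today."""
    fetched = date.fromisoformat(entry["document"]["metadata"]["fetched_at"])
    return today - fetched <= timedelta(days=max_age_days)


entry = make_cache_entry(
    "5CGXFC9E7F35I7N", "Cyclone V FPGA, $387, in stock", date(2026, 5, 22)
)
print(is_fresh(entry, today=date(2026, 5, 25)))
print(is_fresh(entry, today=date(2026, 6, 22)))
```

An agent could call the live tool again whenever `is_fresh` returns `False` and then overwrite the stale entry. The right cutoff depends on how quickly the underlying value drifts.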

The highest-leverage scratchpad pattern is the past-decision log: every approval, classification, or routing decision becomes a precedent the agent can retrieve next time a similar request lands. The agent does not need to be retrained. It can retrieve prior decisions from the corpus at runtime and use them as context for similar future requests.
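One way to picture the past-decision log is as a pair of small helpers: one that turns a decision into an indexable document, and one that formats retrieved precedents as context for the next request. The `Decision` fields, the document shape, and both function names are hypothetical; the retrieval step itself is left out and would come from a corpus search.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    request: str
    outcome: str
    rationale: str
    decided_at: str  # ISO date string, e.g. "2026-05-22"


def to_document(d: Decision) -> dict:
    """Render a decision as text (for semantic recall) plus filterable metadata."""
    text = f"Request: {d.request}\nOutcome: {d.outcome}\nRationale: {d.rationale}"
    return {"text": text, "metadata": {"outcome": d.outcome, "decided_at": d.decided_at}}


def precedents_as_context(docs: list[dict], limit: int = 3) -> str:
    """Format up to `limit` retrieved decision documents as a numbered context block."""
    lines = [f"[{i}] {doc['text']}" for i, doc in enumerate(docs[:limit], start=1)]
    return "Prior decisions on similar requests:\n" + "\n".join(lines)


log = [
    to_document(Decision("Refund after 45 days", "approved",
                         "Defective unit, warranty applies", "2026-04-02")),
    to_document(Decision("Refund after 90 days", "denied",
                         "Outside warranty window", "2026-04-19")),
]
print(precedents_as_context(log, limit=2))
```

Keeping the outcome in metadata as well as in the text lets an agent filter for, say, only approved precedents while still matching on meaning.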

Multi-tenant by construction

The same agent definition can serve every tenant in your application. You do not fork agents per customer.

session.metadata.user_id scopes memory recall to one user.
session.metadata.corpus_key swaps which corpus the agent reads from.
session.metadata.locale switches between Boomerang multilingual variants.
session.metadata.tier toggles which skills are enabled.

Pass these at session creation. The agent's $ref resolves them at call time. Done.

See Multi-client steering for the patterns.
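The idea of one agent definition resolved per session can be sketched with a small template resolver. This is not the platform's implementation, only an illustration of the substitution it implies: the `$session.metadata.<key>$` placeholder syntax is taken from the filter example on this page, while the `resolve` function and its error handling are assumptions.

```python
import re

# Matches placeholders of the form $session.metadata.<key>$
_PLACEHOLDER = re.compile(r"\$session\.metadata\.([A-Za-z_][A-Za-z0-9_]*)\$")


def resolve(template: str, session_metadata: dict) -> str:
    """Replace each $session.metadata.<key>$ with the session's value for <key>."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in session_metadata:
            # Fail loudly: an unresolved filter could otherwise leak data across users.
            raise KeyError(f"session metadata is missing '{key}'")
        return str(session_metadata[key])

    return _PLACEHOLDER.sub(substitute, template)


FILTER_TEMPLATE = "doc.user_id = '$session.metadata.user_id$'"

# The same template yields a different filter for each session.
print(resolve(FILTER_TEMPLATE, {"user_id": "u-42", "locale": "en"}))
print(resolve(FILTER_TEMPLATE, {"user_id": "u-77", "locale": "de"}))
```

Raising on a missing key is a deliberate choice in this sketch: for a filter that enforces per-user scoping, failing the call is safer than silently searching without the scope.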

Why this beats a separate vector store + key-value store

Capability	Memory in a corpus	Separate vector store + KV
Semantic recall	Yes	Yes (vector store)
Browse by ID	Yes	Yes (KV)
Filter by tag	Yes	Two stores to keep in sync
RBAC at retrieval	Yes, same engine as document RAG	Two RBAC systems to wire up
Multilingual	Yes, Boomerang	Depends on the vector store
Reranking	Yes, chain rerankers including recency UDF	Roll your own
Citations on every recall	Yes	Yes (vector store)
Multi-source search in one call	Yes, multi-corpus query	Multiple round trips
Audit trail	Yes, every read and write is a logged event	Two systems to audit

One primitive, three jobs: no separate vector DB, no key-value store, no session-state service. That consolidation is the capability that product-first platforms cannot ship.

Related

Memory: the canonical guide to in-session and cross-session memory patterns, including the artifact-first flow and forgetting strategies.
Knowledge: the same primitive serving the classic RAG use case.
Artifacts: the in-session scratchpad primitive that pairs with corpus-backed memory.
Metadata filters: declaring filter attributes and writing filter expressions.
Multi-client steering: one agent definition, every tenant.
Reranking: chain rerankers, recency-aware UDFs.
