Version: 2.0

Memory

Most agent turns only see the current session's messages. Real agents often need to recall more — a plan the agent made an hour ago in the same session, a user preference from last week, a decision an operator made six months ago. This page covers the patterns Vectara supports for building that memory.

There are two scopes, and they use different primitives:

  • In-session memory — state the agent carries across turns within a single session. Built on session artifacts.
  • Cross-session memory — facts that persist across sessions or are shared between agents. Built on a Vectara corpus, indexed from artifacts.

The two patterns compose: an agent can use an artifact as a scratchpad during a session and index the final synthesis into a corpus at the end, so future sessions can retrieve it.

In-session memory via artifacts

An agent with artifact_create configured can write its own artifacts during a session and read them back later. That turns the session's artifact store into a persistent scratchpad — memory that survives compaction, survives recency bias, and can be searched with artifact_grep or queried with artifact_jq without reloading the original content into the prompt.

This is the "writing context" pattern from context engineering: rather than cramming everything the agent knows into its prompt, the agent offloads state to disk and reads it back on demand.
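The loop can be sketched in a few lines. This is a minimal in-memory stand-in, not the real tool surface: `artifact_create` and `artifact_grep` are tool calls the agent makes, and the `SessionArtifacts` class below only simulates their shape so the write-then-grep pattern is concrete.

```python
import re

# In-memory stand-in for a session's artifact store. The method names
# mirror the artifact_create / artifact_grep tools for illustration only.
class SessionArtifacts:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def create(self, name: str, content: str) -> None:
        # artifact_create: write (or overwrite) a named artifact.
        self._store[name] = content

    def grep(self, name: str, pattern: str) -> list[str]:
        # artifact_grep: return only matching lines, so the agent never
        # reloads the whole artifact into the prompt.
        return [line for line in self._store[name].splitlines()
                if re.search(pattern, line)]

artifacts = SessionArtifacts()

# Turn 1: the agent writes its plan to a scratchpad artifact.
artifacts.create("plan.md", "\n".join([
    "- [x] collect requirements",
    "- [ ] draft report",
    "- [ ] review citations",
]))

# A later turn greps for open items instead of re-reading the transcript.
open_items = artifacts.grep("plan.md", r"\[ \]")
```

The grep step is the point: the agent pays tokens only for the two open items, not the whole plan or the turns that produced it.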

Common uses:

  • Running plans and to-do lists. The agent writes a task list at the start of a long session (plan.md) and updates it as work progresses. Subsequent turns grep the plan instead of re-reading the whole transcript.
  • Accumulated notes. During a research session the agent captures facts, citations, or decisions in a notes artifact. The raw transcripts can be compacted away without losing the synthesized summary.
  • Draft outputs in progress. Long-form work — reports, code, briefs — lives in an artifact that gets iteratively edited rather than re-emitted in full on every turn.
  • Intermediate tool results. When a tool returns something the agent wants to refer to later, writing it to an artifact makes it searchable with artifact_grep and survives compaction.
  • Handoffs across steps. A classifier step writes its findings to an artifact; a downstream specialist step reads the artifact instead of parsing the conversation history. See also the plan-then-execute pattern in Steps.

Artifacts live for the full session, so the agent can treat them as persistent memory within a single conversation — even if the surrounding turns get compacted or the session spans many hours.

Cross-session memory via a corpus

Artifacts are scoped to the session that created them. If you want the agent to remember something across sessions — past decisions, user preferences, facts it learned last week — the pattern is: have the agent write the note to an artifact, index the artifact into a corpus, and search that corpus on future turns.

The pieces:

  • artifact_create — the agent writes the note it wants to remember.
  • A document-indexing tool (for example core_document_index) — writes the artifact into a Vectara corpus with identifying metadata.
  • corpora_search — in later sessions, the agent queries the same corpus, typically with a metadata_filter scoped to the user or tenant.

A memory corpus is just a corpus with a role. The same retrieval tuning that applies to any Vectara corpus — hybrid search, reranking, filters — applies here too, and so do the same access controls.
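The three pieces compose into a write-then-recall loop. The sketch below simulates it with plain dicts and a list standing in for the corpus; the real schemas belong to the tools themselves (artifact_create, core_document_index, corpora_search), so every field name here is illustrative.

```python
from datetime import datetime, timezone

# Build a memory note with the identifying metadata that core_document_index
# would stamp onto the corpus document. Field names are illustrative.
def make_memory_note(text: str, user_id: str, session_id: str, kind: str) -> dict:
    return {
        "text": text,
        "metadata": {
            "user_id": user_id,
            "session_id": session_id,
            "kind": kind,
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
    }

# Stand-in for corpora_search with a metadata_filter scoped to one user:
# only that user's notes of the requested kind come back.
def recall(corpus: list[dict], user_id: str, kind: str) -> list[str]:
    return [note["text"] for note in corpus
            if note["metadata"]["user_id"] == user_id
            and note["metadata"]["kind"] == kind]

corpus: list[dict] = []

# End of session: index the synthesized note, not the raw transcript.
corpus.append(make_memory_note(
    "User prefers dark UI; confirmed in ticket T-4410.",
    user_id="u_42", session_id="s_1", kind="preference"))

# A future session recalls it, scoped to the same user.
hits = recall(corpus, user_id="u_42", kind="preference")
```

Note what gets indexed: the short synthesis, stamped with enough metadata to scope every future query.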

Structure the corpus for recall, not archival

A memory corpus is a retrieval surface, not an append-only log. Three rules of thumb:

  • Scope one corpus per tenant or per user. Scoping is what gives you control over who can recall what: an agent can only reach the memories in the corpora its corpora_search tool points at.
  • Stamp every note with identifying metadata. At minimum: user_id, session_id, creation timestamp, and a tag describing what kind of note it is. Apply a metadata_filter on every query so one user's memories never leak into another's session.
  • Write notes the model will want to search. If the note is a verbatim transcript, the agent will struggle to pull the right fragment back. A 3-sentence synthesis ("User prefers dark UI; asked twice in July; confirmed in ticket T-4410") retrieves far better than a 200-line chat log.

Plan how to forget

Memory without forgetting turns into noise. Decide upfront how notes get evicted:

  • TTL via metadata. Stamp each note with expires_at and apply it as a metadata_filter at query time.
  • Periodic rollups. Run a maintenance agent that reads the memory corpus, merges stale notes into a rollup, and deletes the originals.
  • Explicit forget tool. Expose a tool that removes notes matching a filter — so a user asking "forget this" has a path that isn't "manually delete documents."
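The TTL option is the lightest of the three. A sketch, with the caveat that in production the expiry check would be expressed as a metadata_filter on the corpus query rather than Python-side filtering after retrieval; the note shape and field names are the illustrative ones from above.

```python
from datetime import datetime, timedelta, timezone

# Stamp a note with expires_at at write time.
def stamp_expiry(note: dict, ttl_days: int, now: datetime) -> dict:
    note["metadata"]["expires_at"] = (now + timedelta(days=ttl_days)).isoformat()
    return note

# At query time, keep only notes that have not expired. In a real setup
# this predicate would live in the metadata_filter, not in Python.
def live_notes(notes: list[dict], now: datetime) -> list[dict]:
    return [n for n in notes
            if datetime.fromisoformat(n["metadata"]["expires_at"]) > now]

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
notes = [
    # Written 30 days ago with a 7-day TTL: already expired.
    stamp_expiry({"text": "old", "metadata": {}}, ttl_days=7,
                 now=now - timedelta(days=30)),
    # Written just now with a 7-day TTL: still live.
    stamp_expiry({"text": "fresh", "metadata": {}}, ttl_days=7, now=now),
]
kept = live_notes(notes, now)
```

Expired notes simply stop matching; a periodic rollup can then delete them for real.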

When to reach for each

  • Use in-session artifacts for plans, scratch state, and step-to-step handoffs within the current conversation. They cost nothing to set up and are always consistent with the current turn.
  • Use a corpus-backed memory for anything that needs to persist beyond the session or be shared between agents or users. The cost is the indexing pipeline and the discipline around metadata and forgetting.
  • Use both for long workflows: artifact as scratchpad during the session, corpus as the long-term record at the end.

Start with artifacts. Add a corpus-backed memory only when you have a concrete recall pattern across sessions — vague "the agent should remember things" rarely survives contact with a real user base.

Related pages

  • Artifacts — the underlying storage for session-scoped notes.
  • Compaction — why artifact-backed memory survives long sessions when the transcript does not.
  • Context engineering overview
  • Steps — plan-then-execute uses in-session memory as its handoff mechanism.