Memory
Most agent turns only see the current session's messages. Real agents often need to recall more — a plan the agent made an hour ago in the same session, a user preference from last week, a decision an operator made six months ago. This page covers the patterns Vectara supports for building that memory.
There are two scopes, and they use different primitives:
- In-session memory — state the agent carries across turns within a single session. Built on session artifacts.
- Cross-session memory — facts that persist across sessions or are shared between agents. Built on a Vectara corpus, indexed from artifacts.
The two patterns compose: an agent can use an artifact as a scratchpad during a session and index the final synthesis into a corpus at the end, so future sessions can retrieve it.
In-session memory via artifacts
An agent with artifact_create configured can write its own artifacts during a session and read them back later. That turns the session's artifact store into a persistent scratchpad — memory that survives compaction, survives recency bias, and can be searched with artifact_grep or queried with artifact_jq without reloading the original content into the prompt.
This is the "writing context" pattern from context engineering: rather than cramming everything the agent knows into its prompt, the agent offloads state to disk and reads it back on demand.
Common uses:
- Running plans and to-do lists. The agent writes a task list at the start of a long session (plan.md) and updates it as work progresses. Subsequent turns grep the plan instead of re-reading the whole transcript.
- Accumulated notes. During a research session the agent captures facts, citations, or decisions in a notes artifact. The raw transcripts can be compacted away without losing the synthesized summary.
- Draft outputs in progress. Long-form work — reports, code, briefs — lives in an artifact that gets iteratively edited rather than re-emitted in full on every turn.
- Intermediate tool results. When a tool returns something the agent wants to refer to later, writing it to an artifact makes it searchable with artifact_grep and lets it survive compaction.
- Handoffs across steps. A classifier step writes its findings to an artifact; a downstream specialist step reads the artifact instead of parsing the conversation history. See also the plan-then-execute pattern in Steps.
Artifacts live for the full session, so the agent can treat them as persistent memory within a single conversation — even if the surrounding turns get compacted or the session spans many hours.
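The scratchpad pattern can be sketched as the sequence of tool calls an agent might emit. This is an illustrative sketch, not the exact tool schema: the payload shape and the toy grep stand-in are assumptions, not the real artifact_create/artifact_grep interfaces.

```python
# Turn 1: the agent writes its plan to a session artifact instead of
# carrying it in the prompt. (Hypothetical payload shape.)
create_call = {
    "tool": "artifact_create",
    "args": {
        "name": "plan.md",
        "content": (
            "# Plan\n"
            "- [x] 1. Gather requirements\n"
            "- [ ] 2. Draft report outline\n"
            "- [ ] 3. Write executive summary\n"
        ),
    },
}


def grep(content: str, pattern: str) -> list[str]:
    """Toy stand-in for artifact_grep: return the lines that match."""
    return [line for line in content.splitlines() if pattern in line]


# A later turn greps the plan for unfinished tasks rather than
# re-reading the whole transcript.
next_steps = grep(create_call["args"]["content"], "- [ ]")
print(next_steps[0])  # the first unfinished task
```

The point of the sketch: the plan survives outside the prompt, so later turns pay only for the grep result, not for the full history.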
Cross-session memory via a corpus
Artifacts are scoped to the session that created them. If you want the agent to remember something across sessions — past decisions, user preferences, facts it learned last week — the pattern is: have the agent write the note to an artifact, index the artifact into a corpus, and search that corpus on future turns.
The pieces:
- artifact_create — the agent writes the note it wants to remember.
- A document-indexing tool (for example core_document_index) — writes the artifact into a Vectara corpus with identifying metadata.
- corpora_search — on later sessions, the agent queries the same corpus, typically with a metadata_filter scoped to the user or tenant.
A memory corpus is just a corpus with a role. The same retrieval tuning that applies to any Vectara corpus — hybrid search, reranking, filters — applies here too, and so do the same access controls.
Structure the corpus for recall, not archival
A memory corpus is a retrieval surface, not an append-only log. Three rules of thumb:
- Scope one corpus per tenant or per user. This is what gives you control over who can recall what. An agent can only reach the memories in the corpora its corpora_search tool is pointed at.
- Stamp every note with identifying metadata. At minimum: user_id, session_id, creation timestamp, and a tag describing what kind of note it is. Apply a metadata_filter on every query so one user's memories never leak into another's session.
- Write notes the model will want to search. If the note is a verbatim transcript, the agent will struggle to pull the right fragment back. A 3-sentence synthesis ("User prefers dark UI; asked twice in July; confirmed in ticket T-4410") retrieves far better than a 200-line chat log.
Plan how to forget
Memory without forgetting turns into noise. Decide upfront how notes get evicted:
- TTL via metadata. Stamp each note with expires_at and apply it as a metadata_filter at query time.
- Periodic rollups. Run a maintenance agent that reads the memory corpus, merges stale notes into a rollup, and deletes the originals.
- Explicit forget tool. Expose a tool that removes notes matching a filter — so a user asking "forget this" has a path that isn't "manually delete documents."
When to reach for each
- Use in-session artifacts for plans, scratch state, and step-to-step handoffs within the current conversation. They cost nothing to set up and are always consistent with the current turn.
- Use a corpus-backed memory for anything that needs to persist beyond the session or be shared between agents or users. The cost is the indexing pipeline and the discipline around metadata and forgetting.
- Use both for long workflows: artifact as scratchpad during the session, corpus as the long-term record at the end.
Start with artifacts. Add a corpus-backed memory only when you have a concrete recall pattern across sessions — vague "the agent should remember things" rarely survives contact with a real user base.
Related
- Artifacts — the underlying storage for session-scoped notes.
- Compaction — why artifact-backed memory survives long sessions when the transcript does not.
- Context engineering overview
- Steps — plan-then-execute uses in-session memory as its handoff mechanism.