Compaction
Compaction keeps a session alive past the model's context window by summarizing older turns and hiding the original events from the LLM. Compacted events are soft-hidden — still retrievable via the events API — and the agent is given a tool to search back through them on demand.
When to reach for this
- Long multi-turn RAG sessions that outlive a single context window.
- Phase-boundary compaction — e.g., compact after a research phase before the agent starts drafting.
- Ticket-triage and other flows where older turns go stale but their facts still matter.
- Skip it for short Q&A or single-shot tool runs: the overhead isn't worth it.
When compaction runs
Compaction is configured on the agent and snapshotted onto each
session at creation time. Read back the snapshot from the session's
effective_compaction field to verify what will run. It can trigger
in two ways:
- Automatically, before any LLM call, once estimated context usage crosses a configured threshold of the model's context window.
- On demand, by sending a compact request to the session events endpoint. This is useful at phase boundaries you care about, for example compacting after a long research phase before switching to drafting.
Either path runs the same flow: a summarization LLM reads the older events, emits a summary, and marks those events as hidden from the main conversation. The most recent user turns, as many as keep_recent_inputs specifies, are preserved verbatim so the agent keeps a fresh view of the conversation.
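The flow described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's implementation: the Event class, its role labels, and the summarize callable (a stand-in for the summarization LLM) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    role: str          # hypothetical labels: "user", "agent", "tool"
    content: str
    hidden: bool = False


def split_for_compaction(events, keep_recent_inputs):
    """Split events into (older, recent) at the Nth-most-recent user turn."""
    if keep_recent_inputs < 1:
        raise ValueError("keep_recent_inputs must be at least 1 in this sketch")
    user_idx = [i for i, e in enumerate(events) if e.role == "user"]
    # No more user turns than the retention floor: nothing to compact.
    if len(user_idx) <= keep_recent_inputs:
        return [], list(events)
    cut = user_idx[-keep_recent_inputs]
    return events[:cut], events[cut:]


def compact(events, keep_recent_inputs, summarize):
    """Summarize the older events and soft-hide them; return the summary."""
    older, _recent = split_for_compaction(events, keep_recent_inputs)
    if not older:
        return None
    summary = summarize(older)
    for event in older:
        event.hidden = True   # soft-hide: the event is kept, not deleted
    return summary
```

Note that the sketch returns None when the session has no more user turns than the retention floor, which mirrors the way very short sessions never compact.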
Configure on the agent
Compaction is enabled by default. Override the defaults on the
compaction field of the agent.
The key knobs and why you'd touch them:
- enabled: turns automatic compaction on or off. Manual compaction still works when disabled, so you can drive compaction entirely from phase boundaries if that fits your flow better.
- threshold_percent: how full the context window must get before automatic compaction kicks in. Lower it when you'd rather pay the summarization cost earlier than risk brushing the window; raise it when summaries lose detail you need and you'd rather compact less often.
- keep_recent_inputs: how many recent user turns to preserve verbatim. Raise it when the agent frequently needs exact wording from the last several turns; the cost is higher token usage per turn and less aggressive reclaim.
- tool_event_policy: controls which tool events the summarizer sees. See the trade-offs below.
- compaction_message: extra instructions appended to the summarization prompt. Tune this per-workload; the default is deliberately generic.
See the agent schema reference for exact defaults and ranges.
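As a rough illustration, an override of the compaction field might be assembled like this. The field names come from the list above; every value is an assumption chosen for the example (including the 0 to 100 scale for threshold_percent), and tool_event_policy is left out because its accepted values are not listed here. Check the agent schema reference before copying anything.

```python
import json

# Illustrative values only; real defaults and ranges live in the schema reference.
agent_compaction = {
    "enabled": True,
    "threshold_percent": 70,   # assumed scale (0-100) and assumed value
    "keep_recent_inputs": 4,   # assumed value
    "compaction_message": "Preserve every ticket ID and account number.",
}

# The compaction settings sit under the agent's "compaction" field.
agent_payload = {"compaction": agent_compaction}

print(json.dumps(agent_payload, indent=2))
```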
Tune compaction_message — it matters
The default compaction prompt produces a generic summary. For any
production workload, you will get better continuity by telling the
summarizer what your agent cares about. compaction_message is
appended to the summarization prompt on every compaction, so it's the
right place for things like:
- Identifiers that must survive ("preserve every ticket ID, invoice number, and account reference").
- State the agent is tracking ("carry forward the user's stated goal verbatim, plus any partial decisions already made").
- Format requirements ("return the summary as a bulleted list grouped by topic, with recent items first").
- Things to drop aggressively ("omit retrieved document quotes; they can be retrieved again if needed").
Treat this like a small companion to your main instructions. When you
find the agent losing track of something specific across compactions,
add a line to compaction_message about it and iterate.
For example, a support-triage agent with:
"compaction_message": "Preserve every ticket ID and account number.
Carry the user's stated goal forward verbatim. Omit retrieved KB
article bodies."
applied to 40 turns of back-and-forth produces a summary like:
User goal: refund the duplicate charge on account A-8821.
Open tickets: T-4410 (billing, waiting on ops), T-4417 (access,
resolved). Confirmed card last-4 1142. KB lookups already done for
refund policy and dispute window — omit bodies.
The identifiers survive, the goal is quoted verbatim, and the KB prose is dropped.
tool_event_policy trade-offs
Tool outputs are often the bulk of the tokens in a session. The default includes tool outputs but omits the tool-call chatter, which is the right balance for most workloads. Exclude tool events entirely only if they are drowning the summary, and you are willing to lose context about what the agent already did. Include everything only when the summarizer is making mistakes that depend on knowing what the agent searched or asked for — the fidelity costs tokens on every compaction.
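The three behaviours can be pictured as a filter applied before the summarizer runs. The sketch below is hypothetical throughout: the policy names ("outputs_only", "exclude_all", "include_all") and the event "kind" values are invented for illustration and are not the real option names, which are in the agent schema reference.

```python
def events_for_summarizer(events, policy="outputs_only"):
    """Pick which events the summarizer sees (hypothetical policy names)."""
    if policy == "include_all":
        # Highest fidelity: tool calls and tool outputs are both included.
        return list(events)
    if policy == "outputs_only":
        # Default-like behaviour: keep tool outputs, drop the tool-call chatter.
        return [e for e in events if e["kind"] != "tool_call"]
    if policy == "exclude_all":
        # Cheapest: the summarizer loses sight of what the agent already did.
        return [e for e in events if e["kind"] not in ("tool_call", "tool_output")]
    raise ValueError(f"unknown policy: {policy}")
```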
Trigger compaction manually
Send a compact request to the session events endpoint to force a
compaction without waiting for the threshold.
The request body can optionally anchor the boundary to a specific
event (overriding the recent-turns floor) and override the agent-level
compaction_message for that call only — handy when a phase transition
calls for different summarization guidance. Manual compaction can be
sent while the session is actively processing; it will be queued as a
follow-up.
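A minimal sketch of how such a request body might be assembled follows. The field names "type", "anchor_event_id", and "compaction_message" are assumptions made for illustration, and no endpoint URL is shown; consult the API reference for the actual request shape.

```python
def build_compact_request(anchor_event_id=None, compaction_message=None):
    """Build a manual-compaction request body (hypothetical field names)."""
    body = {"type": "compact"}
    if anchor_event_id is not None:
        # Anchors the compaction boundary to one specific event.
        body["anchor_event_id"] = anchor_event_id
    if compaction_message is not None:
        # Overrides the agent-level guidance for this call only.
        body["compaction_message"] = compaction_message
    return body


# Example: compact at a phase boundary with phase-specific guidance.
request_body = build_compact_request(
    compaction_message="Keep the research findings; drop search dead ends.",
)
```

Leaving both arguments unset produces the simplest request, which falls back to the agent-level configuration.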
What the agent sees after compaction
Compacted events are soft-hidden, not deleted. They remain in the session's event list and are visible via the events endpoint, but they are no longer sent to the LLM on subsequent turns. The LLM sees the produced summary, the most recent user turns kept verbatim, and any events created after the compaction.
The session emits two events you can observe: compaction_started
when compaction begins and compaction when it completes. Filter for
these in your event stream to surface compaction in logs or dashboards;
see the API reference for the payload shape.
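For instance, a logging hook could pick these two event types out of a stream. The sketch assumes each event is a dict with a "type" key, which is a guess at the payload shape; only the two event names come from the text above.

```python
COMPACTION_EVENT_TYPES = ("compaction_started", "compaction")


def compaction_events(events):
    """Yield only the events that mark the start or completion of a compaction."""
    for event in events:
        if event.get("type") in COMPACTION_EVENT_TYPES:
            yield event


def count_completed_compactions(events):
    """Count completed compactions, e.g. for a dashboard counter."""
    return sum(1 for e in compaction_events(events) if e["type"] == "compaction")
```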
If a session is compacted more than once, each subsequent summary is built with the previous summary as input, so information is carried forward through successive compactions rather than being dropped.
Recovering detail from hidden turns
Whenever compaction is enabled, the agent is auto-registered with a
search_session_history tool. It is not configured via
tool_configurations — there is no such entry. The agent calls it
when it needs to recall detail from events compaction has hidden.
The important thing to know when designing prompts around it: the query is a case-insensitive substring match over event content, not natural-language or semantic search. That makes it great for narrow lookups by identifier (ticket IDs, account numbers, exact phrases) and poor for fuzzy "what did the user say about shipping" queries. Nudge the agent toward identifier-style queries in your instructions when the session is likely to grow large.
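To see why identifier-style queries work and fuzzy ones do not, here is a small stand-in for the matching behaviour described above. It is not the tool's implementation: the event shape and the max_results cap value are assumptions.

```python
def search_hidden_events(events, query, max_results=20):
    """Case-insensitive substring scan over hidden events, capped per query."""
    needle = query.lower()
    hits = []
    for event in events:
        if event["hidden"] and needle in event["content"].lower():
            hits.append(event)
            if len(hits) >= max_results:
                break
    return hits


history = [
    {"hidden": True, "content": "Ticket T-4410 is waiting on ops."},
    {"hidden": True, "content": "The package arrived two weeks late."},
    {"hidden": False, "content": "Any update on t-4410?"},
]

by_id = search_hidden_events(history, "t-4410")
fuzzy = search_hidden_events(history, "what did the user say about shipping")
```

Here by_id contains the one hidden event that mentions the ticket, while fuzzy is empty: no hidden event contains that whole phrase, even though one of them is clearly about a delivery.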
Limits and trade-offs
Plan for these failure modes:
- Summaries lose detail. Expect the summarizer to miss nuances, re-word claims, and occasionally drop facts the full conversation had. Tune compaction_message to protect the details that matter, and use a stronger summarizer model if drift is costly.
- Latency on the triggering turn. Automatic compaction runs synchronously before the next LLM call, so the turn that triggers it pays a one-time cost.
- Short sessions skip it. A session needs more user turns than the verbatim-retention floor before compaction can run. Very short sessions will silently never compact.
- Hidden events are still stored. Compaction reduces tokens sent to the LLM per turn; it does not remove events from the session. Storage is unchanged.
- Session history search is not a full-text index. It is a sequential substring scan over hidden events, capped per query. Use it for narrow lookups, not broad search over very large sessions.
- Threshold is an estimate. Usage is estimated from the LLM's last-reported token count plus an estimate for new events, so compaction may run slightly before or after the configured threshold.
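The last point can be made concrete with a small sketch of the trigger check. The four-characters-per-token heuristic, the function names, and the 0 to 100 scale for threshold_percent are all assumptions for illustration; the service's actual estimator is not documented here.

```python
def estimate_tokens(text):
    """Crude token estimate (assumption: roughly 4 characters per token)."""
    return max(1, len(text) // 4)


def should_compact(last_reported_tokens, new_event_texts, context_window, threshold_percent):
    """Return True when estimated usage reaches the configured threshold."""
    estimated = last_reported_tokens + sum(estimate_tokens(t) for t in new_event_texts)
    # Integer comparison avoids floating-point rounding at the boundary.
    return estimated * 100 >= context_window * threshold_percent


# 139,000 reported tokens plus roughly 2,000 estimated for a new 8,000-character event.
near_limit = should_compact(139_000, ["x" * 8000], 200_000, 70)
```

Because the estimate for new events is only approximate, the real usage at the moment of the check may sit a little above or below the threshold, which is why compaction can fire slightly early or late.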