Version: 2.0

Compaction

Compaction keeps a session alive past the model's context window by summarizing older turns and hiding the original events from the LLM. Compacted events are soft-hidden — still retrievable via the events API — and the agent is given a tool to search back through them on demand.

When to reach for this

Long multi-turn RAG sessions that outlive a single context window.
Phase-boundary compaction — e.g., compact after a research phase before the agent starts drafting.
Ticket-triage and other flows where older turns go stale but their facts still matter.
Skip it for short Q&A or single-shot tool runs: the overhead isn't worth it.

When compaction runs

Compaction is configured on the agent and snapshotted onto each session at creation time. Read back the snapshot from the session's effective_compaction field to verify what will run. It can trigger in two ways:

Automatically, before any LLM call, once estimated context usage crosses a configured threshold of the model's context window.
On demand, by sending a compact request to the session events endpoint. Useful at phase boundaries you care about — for example, compacting after a long research phase before switching to drafting.

Either path runs the same flow: a summarization LLM reads the older events, emits a summary, and marks those events as hidden from the main conversation. The most recent few user turns are always preserved verbatim so the agent keeps a fresh view of the conversation.

Configure on the agent

Compaction is enabled by default. Override the defaults on the compaction field of the agent:

AGENT WITH CUSTOM COMPACTION CONFIGURATION

Code example with json syntax.

The key knobs and why you'd touch them:

enabled — turns automatic compaction on or off. Manual compaction still works when disabled, so you can drive compaction entirely from phase boundaries if that fits your flow better.
threshold_percent — how full the context window must get before automatic compaction kicks in. Lower it when you'd rather pay the summarization cost earlier than risk brushing the window; raise it when summaries lose detail you need and you'd rather compact less often.
keep_recent_inputs — how many recent user turns to preserve verbatim. Raise it when the agent frequently needs exact wording from the last several turns; the cost is higher token usage per turn and less aggressive reclaim.
tool_event_policy — controls which tool events the summarizer sees. See the trade-offs below.
compaction_message — extra instructions appended to the summarization prompt. Tune this per-workload; the default is deliberately generic.

See the agent schema reference for exact defaults and ranges.

Tune `compaction_message` — it matters

The default compaction prompt produces a generic summary. For any production workload, you will get better continuity by telling the summarizer what your agent cares about. compaction_message is appended to the summarization prompt on every compaction, so it's the right place for things like:

Identifiers that must survive ("preserve every ticket ID, invoice number, and account reference").
State the agent is tracking ("carry forward the user's stated goal verbatim, plus any partial decisions already made").
Format requirements ("return the summary as a bulleted list grouped by topic, with recent items first").
Things to drop aggressively ("omit retrieved document quotes; they can be retrieved again if needed").

Treat this like a small companion to your main instructions. When you find the agent losing track of something specific across compactions, add a line to compaction_message about it and iterate.

For example, a support-triage agent with:

"compaction_message": "Preserve every ticket ID and account number.
Carry the user's stated goal forward verbatim. Omit retrieved KB
article bodies."

applied to 40 turns of back-and-forth produces a summary like:

User goal: refund the duplicate charge on account A-8821.
Open tickets: T-4410 (billing, waiting on ops), T-4417 (access,
resolved). Confirmed card last-4 1142. KB lookups already done for
refund policy and dispute window — omit bodies.

The identifiers survive, the goal is quoted verbatim, and the KB prose is dropped.

`tool_event_policy` trade-offs

Tool outputs are often the bulk of the tokens in a session. The default includes tool outputs but omits the tool-call chatter, which is the right balance for most workloads. Exclude tool events entirely only if they are drowning the summary, and you are willing to lose context about what the agent already did. Include everything only when the summarizer is making mistakes that depend on knowing what the agent searched or asked for — the fidelity costs tokens on every compaction.

Trigger compaction manually

Send a compact request to the session events endpoint to force a compaction without waiting for the threshold:

MANUAL COMPACTION

Code example with multiple language options.

The request body can optionally anchor the boundary to a specific event (overriding the recent-turns floor) and override the agent-level compaction_message for that call only — handy when a phase transition calls for different summarization guidance. Manual compaction can be sent while the session is actively processing; it will be queued as a follow-up.

What the agent sees after compaction

Compacted events are soft-hidden, not deleted. They remain in the session's event list and are visible via the events endpoint, but they are no longer sent to the LLM on subsequent turns. The LLM sees the produced summary, the most recent user turns kept verbatim, and any events created after the compaction.

The session emits two events you can observe: compaction_started when compaction begins and compaction when it completes. Filter for these in your event stream to surface compaction in logs or dashboards; see the API reference for the payload shape.

If a session is compacted more than once, each subsequent summary is built with the previous summary as input, so information is carried forward through successive compactions rather than being dropped.

Recovering detail from hidden turns

Whenever compaction is enabled, the agent is auto-registered with a search_session_history tool. It is not configured via tool_configurations — there is no such entry. The agent calls it when it needs to recall detail from events compaction has hidden.

The important thing to know when designing prompts around it: the query is a case-insensitive substring match over event content, not natural-language or semantic search. That makes it great for narrow lookups by identifier (ticket IDs, account numbers, exact phrases) and poor for fuzzy "what did the user say about shipping" queries. Nudge the agent toward identifier-style queries in your instructions when the session is likely to grow large.

Limits and trade-offs

Plan for these failure modes:

Summaries lose detail. Expect the summarizer to miss nuances, re-word claims, and occasionally drop facts the full conversation had. Tune compaction_message to protect the details that matter, and use a stronger summarizer model if drift is costly.
Latency on the triggering turn. Automatic compaction runs synchronously before the next LLM call, so the turn that triggers it pays a one-time cost.
Short sessions skip it. A session needs more user turns than the verbatim-retention floor before compaction can run. Very short sessions will silently never compact.
Hidden events are still stored. Compaction reduces tokens sent to the LLM per turn; it does not remove events from the session. Storage is unchanged.
Session history search is not a full-text index. It is a sequential substring scan over hidden events, capped per query. Use it for narrow lookups, not broad search over very large sessions.
Threshold is an estimate. Usage is estimated from the LLM's last-reported token count plus an estimate for new events, so compaction may run slightly before or after the configured threshold.

When to reach for this​

When compaction runs​

Configure on the agent​

AGENT WITH CUSTOM COMPACTION CONFIGURATION

Tune compaction_message — it matters​

tool_event_policy trade-offs​

Trigger compaction manually​

MANUAL COMPACTION

What the agent sees after compaction​

Recovering detail from hidden turns​

Limits and trade-offs​

Related​