Version: 2.0

Multi-client steering

This page covers two related capabilities:

  • Multi-client consumption — multiple simultaneous clients can read the same session's events without losing messages, and any client can resume from a known point.
  • Mid-turn steering and interrupts — a user can steer the agent in a new direction without waiting for it to finish, or cancel its current turn entirely.

Event stream model

Every session has a single, durably-ordered event list. Events are appended by the platform as the agent produces them (LLM outputs, tool calls, tool outputs), and by clients when they send input. Any client can read the list at any time, from anywhere in it.

There is no per-client queue and no "which client owns this turn?" concept.

Reading and streaming events

GET /v2/agents/{agent_key}/sessions/{session_key}/events returns the session's events as a paginated JSON list. It's the right call for fetching history or polling as a read-only follower; it does not stream.
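A pure follower's paging loop can be sketched as below. Only the endpoint path and the "start" / aev_* cursor values come from this page; the fetch_page signature, the has_more flag, and the page shape are assumptions for illustration.

```python
BASE = "https://api.example.com"  # hypothetical host

def events_url(agent_key: str, session_key: str) -> str:
    """Build the documented GET path for listing session events."""
    return f"{BASE}/v2/agents/{agent_key}/sessions/{session_key}/events"

def page_all_events(fetch_page, agent_key: str, session_key: str):
    """Page through a session's events in durable order.

    `fetch_page(url, cursor)` is an injected callable (assumed to return
    a dict with "events" and "has_more"), so the loop itself needs no
    live network access. Stops when a page reports no more events."""
    cursor = "start"  # special value: whole history from the beginning
    url = events_url(agent_key, session_key)
    while True:
        page = fetch_page(url, cursor)
        for event in page["events"]:
            yield event
        if not page.get("has_more"):
            break
        cursor = page["events"][-1]["id"]  # resume after last aev_* id
```

Injecting the fetcher keeps the cursor bookkeeping separate from whatever HTTP client the product actually uses.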

Live event streaming happens on POST to the same path with stream_response: true. The POST queues input and the response is a server-sent events (SSE) stream from the since cursor onward, including events produced as a result of the new input. Live SSE tailing is tied to a POST that queues input — there is no separate "subscribe" endpoint.

Resuming from a cursor

Every event has an id of the form aev_*. A client that was disconnected, or that just opened the session in a new tab, sends its last-seen event id as since in its next POST; the platform returns everything after that id and then tails live. The special value "start" returns the whole history from the beginning.

SECOND CLIENT JOINING AN IN-PROGRESS SESSION

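A minimal Python sketch of the joining client's first POST body. The since, stream_response, and behavior fields come from this page; the content field, the helper, and the example values are assumptions for illustration.

```python
import json

def join_session_body(last_seen: str, text: str) -> dict:
    """Build the POST body for a client joining an in-progress session.

    `since`, `stream_response`, and `behavior` are documented fields;
    the `content` field name is an assumption."""
    return {
        "since": last_seen,       # "start" or the client's last-seen aev_* id
        "stream_response": True,  # reply is an SSE stream from `since` onward
        "behavior": "steer",      # apply mid-turn rather than starting a fresh turn
        "content": text,
    }

body = join_session_body("aev_01h8x", "Focus on the Q3 numbers only.")
print(json.dumps(body, indent=2))
```

The platform replays everything after aev_01h8x, then tails live, so the new tab catches up and stays current in one request.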

Steering is just as useful when the agent is retrieving the wrong thing. If the user sees the agent searching a corpus that won't have the answer, they can send a correction without waiting for the turn to finish; it is applied between tool calls:

REDIRECTING RETRIEVAL MID-TURN

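A sketch of the steer input body. The type, behavior field, and input_message value come from this page; the content field and the wording are illustrative.

```python
import json

# Steer input sent while the agent is mid-turn. It is applied at the
# next safe point: between tool calls or before the next LLM call.
steer_input = {
    "type": "input_message",
    "behavior": "steer",
    "content": (
        "The answer won't be in the support-tickets corpus. "
        "Search the engineering runbooks instead."
    ),
}
print(json.dumps(steer_input, indent=2))
```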

Input behavior: steer vs follow_up

When a client sends input to a session that is already running, the behavior field controls when the input is applied:

  • steer — insert the input as soon as possible on the next iteration of the agent loop. Use this when the user is correcting or redirecting the agent mid-turn. The agent notices the new input between tool calls or before its next LLM call.
  • follow_up — queue the input for after the current turn completes. Follow-ups are consumed one at a time, so each gets a full agent loop iteration. This is the right behavior when the user is adding a new message that should be treated as a fresh turn, not a mid-course correction.

QUEUEING A FOLLOW-UP TURN

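A sketch of the follow-up input body. The behavior field is documented; the content field name and the message are assumptions.

```python
import json

# Follow-up input queued behind the current turn. Each queued
# follow-up gets a full agent loop iteration of its own once the
# current turn completes.
follow_up = {
    "behavior": "follow_up",
    "content": "When you're done, also summarize the changes as a bullet list.",
}
print(json.dumps(follow_up, indent=2))
```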

The snippet omits type; it defaults to input_message.

Both behaviors are safe to send concurrently from multiple clients. Inputs are appended to a queue and processed in order.

Interrupting the agent

If the user wants to stop the agent mid-turn without sending new input, send an interrupt request:

CANCEL THE CURRENT TURN

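A sketch of what the client might send and observe. Only the effects of an interrupt are documented here; the request body below is an assumption, while session_interrupted is the documented event every client receives afterwards.

```python
import json

# Hypothetical interrupt request body; the field name and value are
# assumptions for illustration, not the documented wire format.
interrupt_request = {"type": "interrupt"}
print(json.dumps(interrupt_request, indent=2))

def is_turn_end(event: dict) -> bool:
    """True when an event marks the interrupted end of a turn.
    session_interrupted is broadcast to every connected client."""
    return event.get("type") == "session_interrupted"
```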

An interrupt causes the platform to:

  • Stop consuming the current LLM stream. Partial output produced so far is discarded, not persisted.
  • Cancel any in-flight tool calls. Each running tool call ends with a synthetic error event so the agent sees it as a failed call rather than a hanging one.
  • Emit a session_interrupted event to every connected client so they know the turn ended.

The session remains open and ready for a new input message. An interrupt on its own doesn't steer — it just cancels. To interrupt and redirect in a single request, send an input_message with behavior: "steer" instead; it interrupts the current iteration and injects the new message at the next safe point.

What multiple clients can and can't do

Multi-client consumption on a session is fully supported:

  • Any number of clients can read the same session's events concurrently. Clients that are also sending input can stream the response; pure followers page the GET endpoint.
  • Any client can send input concurrently. Inputs queue in the order received.
  • Any client can interrupt.
  • Each client can independently resume from its own cursor.

The one restriction is that only one agent loop runs at a time per session. If a client attempts to start a new loop on a session that is already running — for example, by sending an input without a since value — the request fails with 409 Conflict and a message telling the caller to include since and treat the session as in-progress. This prevents two concurrent loops from producing interleaved events on the same session.

The fix is what the error message says: include a since cursor and treat the session as in-progress rather than starting a new one.

If inputs arrive faster than the agent can consume them and the internal queue fills, the platform returns 429 Too Many Requests. In practice this only happens under abuse or a client bug; normal multi-device usage does not come close.
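Both failure modes can be handled mechanically. A sketch, assuming an injected post callable and a JSON error body; only the 409/429 statuses and the since requirement come from this page.

```python
def send_input(post, body: dict, last_seen: str) -> dict:
    """Send session input, recovering from the two documented errors.

    `post` is an injected callable returning (status_code, json_body),
    so the sketch needs no live network access."""
    status, resp = post(body)
    if status == 409:
        # Session already has a running agent loop: retry with a
        # since cursor and treat the session as in-progress.
        body = dict(body, since=last_seen, stream_response=True)
        status, resp = post(body)
    if status == 429:
        # Input queue is full; the caller should back off.
        raise RuntimeError("input queue full, back off before retrying")
    return resp
```

The retry turns "start a new loop" into "join the existing one", which is exactly what the 409 error message asks for.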

Ordering and consistency guarantees

  • Events are durably ordered. Every connected client sees the same events in the same order. No client sees "future" events before "past" ones.
  • No dropped messages. Inputs accepted by the platform are always reflected in the event list; a client that resumes from its last cursor cannot miss them.
  • Steer inputs interleave at safe points. A steer input lands between tool calls or before the next LLM call, never in the middle of a partial LLM response.
  • Interrupted turns are visible. Interrupts emit a session_interrupted event so every client can render the state correctly.

Common patterns

  • Mobile + web. The same user keeps both open; events stream to both. Sending from either one just works.
  • Agent-pairing UI. A human operator watches an agent handle a customer. The operator's UI reads the session events in real time and can send a steer input to redirect the agent when needed.
  • Transfer a session between clients. A user logs out on device A and opens the same session on device B. The new client pages the GET endpoint to fetch history, then sends its first input POST with a recent since cursor and stream_response: true to catch up and tail live.
  • Cancel a long tool chain. The user sees the agent running a slow sql_query they realize is wrong. They send interrupt and the agent stops cleanly.

Limits and trade-offs

  • Partial output is discarded on interrupt. If you need the agent to commit what it has so far before stopping, don't use interrupt. Send a steering message that tells the agent to wrap up what it's doing instead.
  • Steer inputs land at safe points. The agent does not edit an LLM response mid-stream; it finishes the current in-flight LLM call, then checks for steer input. For most use cases this feels instantaneous; during a long-running tool call, the perceived steering latency is the time until the tool returns.
  • Queue capacity is finite. If your product fans in a lot of concurrent input to a single session, you can hit the queue limit and receive 429s. This is a design smell — consider whether you should be using sub-agents for the parallel work.