Request lifecycle
What happens when a user sends a message to an agent? This page walks the runtime end-to-end, conceptually — what the platform does between the moment the application hands off the user's message and the moment the response finishes streaming back.
Every stage is observable: the platform emits typed events for everything it does. There is no opaque hop. That's what makes the runtime auditable and what makes the Agent Playground able to show every decision live.
For the API-level details (HTTP path, request body shape, event names), see Agent anatomy § API surface and the REST API reference.
The end-to-end picture
The application Vectara platform The application
─────────────── ──────────────── ───────────────
user message ──▶ 1. Resume at the current step
2. Assemble step prompt + scoped tools
3. Resolve secrets at call time
4. LLM decides: generate, or call a tool
5. Tool runs; result feeds back to the LLM
6. Validate step output (if step has a schema)
7. Routing picks the next step
──── events ────▶ render
Each numbered stage below is one of those steps. The platform emits a typed event whenever something happens (a step transition, a tool call, a tool result, a chunk of generated text), so the application can render in real time and the operator can replay the session in the Admin Center.
Step by step
1. The application hands off the user's message
The application opens a streaming connection to the agent runtime,
identifying the agent and the session, and passes the user's text
plus whatever session.metadata should apply for this turn (locale,
tier, role, anything the agent reads via $session.metadata.*).
What "the application" means here covers both options from The application layer — your team's code, or the application Vectara Managed Agents operates on your behalf. The platform sees the same thing either way.
2. The runtime loads the current step
The agent is a state machine
(Workflows). For an
existing session, the runtime resumes at the step indicated by the
agent's reentry_step — the hook that routes second-turn messages
to the right state, not necessarily back to intake. For a new
session, it starts at the first step.
3. Step prompt and scoped tools are assembled
For the current step, the runtime builds the prompt:
- Base instructions for this step (different from other steps — each step has its own).
- Skill descriptions for any skills the agent has access to. (The full skill instructions load only if the LLM decides to invoke one — Skills.)
- Tool catalog narrowed to the step's
allowed_tools. Least privilege per phase. - Session context — relevant history, artifacts, and the user's new message.
Secrets resolve at this stage, not by being placed into the
prompt. $ref values like agent.secrets.sf_bearer or
session.secrets.user_oauth are held aside and only injected into
tool arguments at the moment a tool runs. The LLM never sees the
literal secret value. See Connector
authentication.
4. The LLM decides — generate, or call a tool
The chosen model — Anthropic, OpenAI, Gemini, on-prem, or your own deployment via BYO LLM — sees the prompt and the available tool catalog. It either:
- Generates a response, which streams back to the application token by token, or
- Calls a tool from the scoped catalog.
The model picks freely within the step. What it cannot do is escape the step — it can only call tools the step permits, and when the step ends, the platform (not the model) decides which step runs next.
5. The tool runs, result feeds back to the LLM
If the LLM called a tool, the runtime executes it according to the
tool's type — a corpus search runs through the retrieval engine, a
Lambda runs your Python sandboxed, an MCP server is called, a
sub-agent runs in its own session, or web_get fetches a third-party
API with the resolved credentials. See
Tools & connectors.
The tool's result feeds back into the LLM, which either generates a final answer or calls another tool. The loop continues until the model produces output ready for the user, or the step ends.
6. The step's output is validated
If the step has a structured-output schema, the LLM's output is parsed and validated before the step is allowed to close.
- JSON output — the output must match the declared schema. Failures retry automatically. Validated fields land in session context so downstream conditions can read them by path. See Structured outputs.
- Search output — the output is treated as the user-facing answer, with citations attached.
This validation is the contract that lets workflows compose: a
classifier step that emits {intent, order_id, reason} becomes a
building block any downstream step can depend on.
7. Routing picks the next step
The step's next_steps conditions evaluate in order against the
step's structured output. First match wins. The last unconditional
entry, if present, is the catch-all.
The routing logic lives in the agent JSON, not in a paragraph in a prompt — the LLM can't talk the router into a different branch. See Workflows § conditional routing.
If routing picks another step, the loop returns to stage 3 with that
step's config. If routing returns to end or next_steps is empty,
the turn completes and control returns to the application.
8. Events stream back to the application throughout
The lifecycle isn't a single response — it's a stream of typed events that the application renders as they arrive. Every meaningful thing the runtime does emits an event: a step transition, a tool call and its result, a structured output, a chunk of generated text.
The application uses these to render a live chat UI (text streams in as the model produces it; tool-call cards pair input and output; citations appear next to the text), or to log every move to an observability sink, or both.
For the full event-type list and what each one carries, see Agent anatomy § API surface. To see them flow in real time against your own agent, open the Agent Playground.
Observability
Every step, every tool call, every retrieval and generation lands in the Admin Center trace and on the event stream simultaneously. There is no hidden execution.
| Surface | What you see |
|---|---|
| Admin Center | Per-session timeline with messages, tool calls, retrieval traces, factual-consistency scores, latencies, and tokens. Replay any session. |
| Event stream | The same data live during a request. The application can record it as it arrives. |
| Sinks | Stream every event to Splunk, Grafana, your warehouse, or any HTTP endpoint. |
See Query observability and Admin Center.
Pipeline ingestion runs the same lifecycle
The same machinery powers your data ingestion. A pipeline fans every source record into a fresh agent session — same primitives, same events, same factual-consistency grading on any generated content, same dead-letter queue semantics when something fails. The agent runtime is the unit of execution for both live traffic and batch ingestion.
See Pipeline concepts.
Related
- Agent anatomy — the parts that the lifecycle wires together.
- Workflows — the state-machine concept that drives steps 2 and 7.
- Tools & connectors — what the agent reaches out to in step 5.
- Try the playground — watch the lifecycle run live against your own agent.