Version: 2.0

Request lifecycle

What happens when a user sends a message to an agent? This page walks the runtime end-to-end, conceptually, covering what the platform does between the moment the application hands off the user's message and the moment the response finishes streaming back.

Every stage is observable: the platform emits typed events for everything it does. There is no opaque hop. That's what makes the runtime auditable and what makes the Agent Playground able to show every decision live.

For the API-level details (HTTP path, request body shape, event names), see Agent anatomy § API surface and the REST API reference.

The end-to-end picture

The application                Vectara platform                The application
───────────────                ────────────────                ───────────────
user message ──▶  1. Resume at the current step
                  2. Assemble step prompt + scoped tools
                  3. Resolve secrets at call time
                  4. LLM decides: generate, or call a tool
                  5. Tool runs; result feeds back to the LLM
                  6. Validate step output (if step has a schema)
                  7. Routing picks the next step
                                                       ──── events ────▶  render

Each numbered stage below is one of those steps. The platform emits a typed event whenever something happens (a step transition, a tool call, a tool result, a chunk of generated text), so the application can render in real time and the operator can replay the session in the Admin Center.

Step by step

1. The application sends the user’s message to the agent runtime

The application opens a streaming connection to the agent runtime, identifies the agent and session, and passes the user’s message with any turn-specific session.metadata, such as locale, account tier, user role, or other values the agent references through $session.metadata.*.

What "the application" means here covers both options from The application layer, either your team's code or the application Vectara Managed Agents operates on your behalf. The platform sees the same thing either way.

2. The runtime loads the current step

The agent is a state machine (Workflows). For an existing session, the runtime resumes at the step indicated by the agent's reentry_step, the hook that routes second-turn messages to the right state, not necessarily back to intake. For a new session, it starts at the first step.

3. Step prompt and scoped tools are assembled

For the current step, the runtime builds the prompt:

Base instructions for this step (different from other steps, since each step has its own).
Skill descriptions for any skills the agent has access to. (The full skill instructions load only if the LLM decides to invoke one. See Skills.)
Tool catalog narrowed to the step's allowed_tools. Least privilege per phase.
Session context: relevant history, artifacts, and the user's new message.

Secrets resolve at this stage, not by being placed into the prompt. $ref values like agent.secrets.sf_bearer or session.secrets.user_oauth are held aside and only injected into tool arguments at the moment a tool runs. The LLM never sees the literal secret value. See Connector authentication.

4. The LLM decides: generate, or call a tool

The chosen model (Anthropic, OpenAI, Gemini, on-prem, or your own deployment via BYO LLM) sees the prompt and the available tool catalog. It either:

Generates a response, which streams back to the application token by token, or
Calls a tool from the scoped catalog.

The model picks freely within the step. What it cannot do is escape the step: it can only call tools the step permits, and when the step ends, the platform (not the model) decides which step runs next.

5. The tool runs, result feeds back to the LLM

If the LLM called a tool, the runtime executes it according to the tool's type: a corpus search runs through the retrieval engine, a Lambda runs your Python sandboxed, an MCP server is called, a sub-agent runs in its own session, or web_get fetches a third-party API with the resolved credentials. See Tools & connectors.

The tool's result feeds back into the LLM, which either generates a final answer or calls another tool. The loop continues until the model produces output ready for the user, or the step ends.

6. The step's output is validated

If the step has a structured-output schema, the LLM's output is parsed and validated before the step is allowed to close.

JSON output: the output must match the declared schema. Failures retry automatically. Validated fields land in session context so downstream conditions can read them by path. See Structured outputs.
Search output: the output is treated as the user-facing answer, with citations attached.

This validation is the contract that lets workflows compose: a classifier step that emits {intent, order_id, reason} becomes a building block any downstream step can depend on.

7. Routing picks the next step

The step's next_steps conditions evaluate in order against the step's structured output. First match wins. The last unconditional entry, if present, is the catch-all.

The routing logic lives in the agent JSON, not in a paragraph in a prompt, so the LLM can't talk the router into a different branch. See Workflows § conditional routing.

If routing picks another step, the loop returns to stage 3 with that step's config. If routing returns to end or next_steps is empty, the turn completes and control returns to the application.

8. Events stream back to the application throughout

The lifecycle isn't a single response: it's a stream of typed events that the application renders as they arrive. Every meaningful thing the runtime does emits an event: a step transition, a tool call and its result, a structured output, a chunk of generated text.

The application uses these to render a live chat UI (text streams in as the model produces it; tool-call cards pair input and output; citations appear next to the text), or to log every move to an observability sink, or both.

For the full event-type list and what each one carries, see Agent anatomy § API surface. To see them flow in real time against your own agent, open the Agent Playground.

Observability

Every step, every tool call, every retrieval and generation lands in the Admin Center trace and on the event stream simultaneously. There is no hidden execution.

Surface	What you see
Admin Center	Per-session timeline with messages, tool calls, retrieval traces, factual-consistency scores, latencies, and tokens. Replay any session.
Event stream	The same data live during a request. The application can record it as it arrives.
Sinks	Stream every event to Splunk, Grafana, your warehouse, or any HTTP endpoint.

See Query observability and Admin Center.

Pipeline ingestion runs the same lifecycle

The same machinery powers your data ingestion. A pipeline fans every source record into a fresh agent session: same primitives, same events, same factual-consistency grading on any generated content, same dead-letter queue semantics when something fails. The agent runtime is the unit of execution for both live traffic and batch ingestion.

See Pipeline concepts.

Agent anatomy — the parts that the lifecycle wires together.
Workflows — the state-machine concept that drives steps 2 and 7.
Tools & connectors — what the agent reaches out to in step 5.
Try the playground — watch the lifecycle run live against your own agent.

The end-to-end picture​

Step by step​

1. The application sends the user’s message to the agent runtime​

2. The runtime loads the current step​

3. Step prompt and scoped tools are assembled​

4. The LLM decides: generate, or call a tool​

5. The tool runs, result feeds back to the LLM​

6. The step's output is validated​

7. Routing picks the next step​

8. Events stream back to the application throughout​

Observability​

Pipeline ingestion runs the same lifecycle​

Related​