Agent anatomy
An agent groups together a set of parts that each do one job. This page is the conceptual map — what each part is and how the JSON fits together. For the full configuration reference of any one part, follow the link at the end of its section.
The whole anatomy is declarative. What is not in the agent's JSON does not happen at runtime. No hidden behavior, no surprises in production.
This page is a structural tour. The companion guide Context engineering is the tuning playbook — which levers to pull, in what order, for the best results from a production agent.
The mental model
The application sits between the user and the agent. Vectara is headless — the end-user UI lives in the application layer, not in the platform itself.
| Layer | Role |
|---|---|
| End user | Sees only the branded UI. Asks questions, uploads files, approves actions through controls the application exposes. |
| The application (built by your team in hours, or delivered turnkey by Vectara Managed Agents) | Holds user identity, calls the Vectara REST API, streams responses, renders citations and approvals. A thin layer: the platform does the AI work. |
| Vectara agent + platform | Runs the declared agent over a session. Calls tools, queries corpora, generates with the chosen LLM, streams events back. |
The end user has no concept of "agent". They see your product — whether your team built it or Vectara delivered it.
A tour of the parts
| Part | Role | Canonical reference |
|---|---|---|
| Identity & model | name, description, model — what the agent is and which LLM powers it. | Agent concepts |
| Steps & workflow | The state machine — named steps, conditional routing, per-step scope. | Workflows, Steps |
| Skills | Expertise the agent loads only when it needs it. | Skills |
| Tools | What the agent can do. 35+ built-in, Python Lambdas, MCP servers. | Tools & connectors, Agent tools |
| Corpus | Persistent context — knowledge, memory, scratchpad in one primitive. | Knowledge, Context & memory |
| Sub-agents | Other agents called as tools. | Workflows § sub-agents, Sub-agents reference |
| Sessions | Where one user's history, metadata, artifacts, and per-user secrets live. | Sessions |
| Context engineering | Compaction and tool-output offloading keep long sessions inside the model's context window. | Compaction, Tool-output offloading |
| Secrets | Credentials by symbolic reference. The LLM never sees the literal value. | Agent secrets, Connector authentication |
| API surface | Plain REST plus a streaming event protocol. | REST API reference, Request lifecycle |
Each section below is a one-paragraph tour. Follow the canonical reference for the full configuration model.
Identity & model
Three top-level fields name the agent and pick the LLM.
{
"name": "support_agent",
"description": "L1 support across the KB and ticketing system.",
"model": { "name": "claude-sonnet-4-6" }
}
The model field accepts any provider in the LLM gateway: hosted
models from Anthropic, OpenAI, and Gemini; on-prem
industry-specialized models; or your own deployment via BYO LLM. Swap
the model per agent, or per step, without touching the rest.
For the full reference, see Agent concepts.
Steps & workflow
An agent is not one LLM call. It is a state machine. Steps split the work into named, discrete units. Each step has its own instructions, scoped tool set, and output schema. Conditional routing between steps is the agent's logic.
{
"first_step": { "name": "classify", "next_steps": [ /* conditions */ ] },
"steps": [ /* ... */ ],
"reentry_step": "classify"
}
The LLM decides what to call within a step. The platform decides which step runs next. Transitions evaluate against structured output, not prose — the LLM cannot talk the router into a different branch.
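The split between "the LLM fills in structured output" and "the platform picks the next step" can be illustrated with a minimal sketch. Everything here is hypothetical: conditions are modeled as plain Python callables rather than the platform's own condition expressions, and first-match-wins ordering is an assumption, not documented behavior.

```python
from typing import Any, Callable, Optional

# Hypothetical in-memory model of "next_steps": each route pairs an optional
# predicate over the step's structured output with a target step name.
Route = tuple[Optional[Callable[[dict[str, Any]], bool]], str]


def pick_next_step(output: dict[str, Any], routes: list[Route]) -> Optional[str]:
    """Return the first route whose condition holds, or None if no route matches."""
    for condition, step_name in routes:
        # A route without a condition acts as an unconditional fallback.
        if condition is None or condition(output):
            return step_name
    return None


# Illustrative routes: large amounts go to a gate step, everything else resolves.
example_routes: list[Route] = [
    (lambda out: out.get("amount", 0) > 500, "manager_gate"),
    (None, "resolve"),
]

small = pick_next_step({"amount": 120}, example_routes)
large = pick_next_step({"amount": 900}, example_routes)
```

The router only ever reads fields of the structured output, so free-form text produced by the model has no path into the branching decision.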
For the conceptual chapter on stepped agents, conditional routing, sub-agents, structured outputs, and cross-session approvals, see Workflows. For the configuration reference, see Steps and Structured outputs.
Skills
A skill is an instruction block the LLM loads on demand. Instead
of stuffing every policy, edge case, and persona into a monolithic
system prompt, the LLM sees a one-line description of each skill and
calls invoke_skill(name) only when it needs that domain. The full
instruction body enters context only then.
That keeps the always-on system prompt small and lets you add a new domain without re-tuning the whole prompt.
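As a rough illustration of on-demand loading, the sketch below keeps only one-line descriptions in the always-on context and appends a skill's full body when it is invoked. The `Skill` class, the skill names, and this local `invoke_skill` function are stand-ins invented for the example; they are not the platform's configuration model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # one line, always visible to the model
    body: str         # full instructions, loaded only on invocation


# Hypothetical skills, keyed by name.
SKILLS = {
    s.name: s
    for s in [
        Skill("refund_policy", "Rules for refunds and returns.",
              "Long refund instructions ..."),
        Skill("escalation", "When and how to hand off to a human.",
              "Long escalation instructions ..."),
    ]
}


def skill_index(skills: dict[str, Skill]) -> str:
    """Build the short listing that stays in the always-on system prompt."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills.values())


def invoke_skill(name: str, context: list[str]) -> None:
    """Append the full body of one skill to the working context."""
    context.append(SKILLS[name].body)


context = [skill_index(SKILLS)]         # descriptions only
invoke_skill("refund_policy", context)  # the refund body enters context now
```

After the call, the context holds the index plus one skill body. The escalation instructions stay out until something needs them.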
For the configuration model and patterns, see Skills.
Tools
A tool is a small, named, schema-typed function the LLM can call. The
agent's tool_configurations map lists what is available. Three
sources fill that map: built-in catalog (35+ tools, no code),
Lambda tools (your Python, sandboxed), and MCP servers (any
Model Context Protocol server you register).
"tool_configurations": {
"kb_search": { "type": "corpora_search", "query_configuration": { /* ... */ } },
"score_lead": { "type": "lambda", "tool_id": "<id_returned_by_POST_/v2/tools>" },
"github": { "type": "mcp", "server_url": "https://mcp.example.com/github" }
}
For the conceptual chapter on tools, MCP, web_get, and inbound
connectors (Slack and more), see
Tools & connectors.
For the configuration references, see
Agent tools,
Lambda tools,
Model Context Protocol, and
web_get tool.
Corpus
A corpus is the agent's persistent context layer: writable, filterable, multilingual, semantic. Anything the agent needs to know, remember, or recall goes into a corpus — same retrieval engine, same RBAC, same citations across three different jobs:
- Knowledge — pre-loaded documents (manuals, KB, policies).
- Memory — what the agent learns about a user.
- Scratchpad — cached tool results and derived facts.
For the RAG/knowledge side, see Knowledge. For the memory and scratchpad sides, see Context & memory (conceptual framing) and Memory (the canonical patterns guide).
Sub-agents
A sub-agent is a full Vectara agent wrapped as a single tool. The parent calls it like any other tool. The called agent runs in its own session, with its own steps, tools, model, and secrets, and returns its result. The parent continues.
"approvals": {
"type": "sub_agent",
"agent_key": "manager_approval_agent"
}
The wins — composition without coupling, reuse across orchestrators, bounded scope, and async-friendly cross-session approvals — are covered in Workflows § sub-agents. For the configuration reference, see Sub-agents.
Sessions
A session is a long-lived container for one user's conversation:
every message, every tool call, every result, plus your own
session.metadata (anything the agent reads via $session.metadata.*)
and per-session session.secrets.
POST /v2/agents/{agent_key}/sessions
{
"session": {
"metadata": { "user_id": "alice.chen", "tier": "enterprise" },
"secrets": { "gdrive_token": "ya29.A0AfH6..." }
}
}
session.metadata is how a single agent definition serves every
tenant: tool and corpus configurations reference metadata values
through $ref, so the same definition adapts to each user without
being duplicated. Artifacts handle large tool outputs and step-to-step
handoffs (see Artifacts).
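From the application side, starting a session is one POST with the body shown above. The sketch below only builds the request with the standard library and does not send it. The base URL and the `x-api-key` header name are assumptions to verify against the REST API reference, and the token value is a placeholder.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.vectara.io"  # assumption: adjust to your deployment


def build_create_session_request(
    agent_key: str, api_key: str, metadata: dict, secrets: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a session."""
    body = {"session": {"metadata": metadata, "secrets": secrets}}
    path = f"/v2/agents/{urllib.parse.quote(agent_key, safe='')}/sessions"
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumption: auth header name
        },
    )


req = build_create_session_request(
    agent_key="support_agent",
    api_key="<your-api-key>",
    metadata={"user_id": "alice.chen", "tier": "enterprise"},
    secrets={"gdrive_token": "<user-oauth-token>"},
)
# urllib.request.urlopen(req) would send it; left out so the sketch has no side effects.
```

Keeping the per-user values in the request body, rather than in the agent definition, is what lets one agent config serve many users.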
For the full reference, see Sessions.
Context engineering
Two platform features keep long-running sessions inside the model's context window without you writing window-management code.
- Compaction summarizes older turns and hides them from the LLM when usage crosses a threshold. The agent gets a built-in search_session_history tool to recall facts from the hidden range.
- Tool-output offloading parks oversized tool outputs in a session artifact (or truncates them) before they dwarf the prompt. On by default for every agent.
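The idea behind threshold-based compaction can be sketched in a few lines. This is a toy model under loud assumptions: token usage is approximated by word count, the summary is a placeholder string where a real system would call an LLM, and `search_hidden` is only a loose analogue of recalling facts from the hidden range, not the platform's tool.

```python
def estimate_tokens(turns: list[str]) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return sum(len(t.split()) for t in turns)


def compact(turns: list[str], budget: int, keep_recent: int = 2):
    """Return (visible_turns, hidden_turns) once usage exceeds the budget."""
    if estimate_tokens(turns) <= budget or len(turns) <= keep_recent:
        return list(turns), []
    split = len(turns) - keep_recent
    hidden, recent = turns[:split], turns[split:]
    # Placeholder summary; a real implementation would summarize with an LLM.
    summary = f"[summary of {len(hidden)} earlier turns]"
    return [summary] + recent, hidden


def search_hidden(hidden: list[str], term: str) -> list[str]:
    """Toy recall over the hidden range: case-insensitive substring match."""
    return [t for t in hidden if term.lower() in t.lower()]


turns = [
    "order A-12345 arrived damaged",
    "please send a photo",
    "photo attached",
    "thanks, checking policy",
]
visible, hidden = compact(turns, budget=5)
recalled = search_hidden(hidden, "a-12345")
```

The hidden turns are removed from what the model sees but remain searchable, which is the property the built-in recall tool relies on.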
For the full reference, see Compaction and Tool-output offloading.
Secrets
Credentials live by reference. The LLM literally cannot see them. Three scopes cover every credential pattern:
- agent.secrets: long-lived, agent-owned, encrypted at rest. Use for service-account tokens.
- session.secrets: per session, user-specific, also encrypted. Use for the end user's OAuth access token.
- $ref: symbolic references in tool arguments. The platform resolves the literal value at call time, just before the tool executes. The LLM sees only the symbol, never the value.
"tool_configurations": {
"sf_lead": {
"type": "web_get",
"argument_override": {
"headers": { "Authorization": { "$ref": "agent.secrets.sf_bearer" } }
}
}
}
Secrets pair with argument_override, which sets specific tool parameters
at config time and hides them from the LLM entirely. With both the
credential and the locked parameters outside the model's view, a
prompt-injection attack has nothing to exfiltrate and nowhere to escalate.
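A minimal sketch of call-time resolution, assuming a dotted path such as `agent.secrets.sf_bearer` is looked up in a nested dictionary of scopes. The `resolve_refs` and `merge_arguments` helpers, the shallow merge, and the token value are all hypothetical. The platform's actual resolution logic is not shown in this page.

```python
from typing import Any


def resolve_refs(value: Any, scopes: dict[str, Any]) -> Any:
    """Recursively replace {"$ref": "a.b.c"} nodes with values from scopes."""
    if isinstance(value, dict):
        if set(value) == {"$ref"}:
            node: Any = scopes
            for part in value["$ref"].split("."):
                node = node[part]  # raises KeyError on an unknown reference
            return node
        return {k: resolve_refs(v, scopes) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, scopes) for v in value]
    return value


def merge_arguments(llm_args: dict, override: dict, scopes: dict) -> dict:
    """Locked overrides win over model-supplied arguments (shallow merge)."""
    return {**llm_args, **resolve_refs(override, scopes)}


# Placeholder secret store and the override from the config above.
scopes = {"agent": {"secrets": {"sf_bearer": "Bearer example-token"}}}
override = {"headers": {"Authorization": {"$ref": "agent.secrets.sf_bearer"}}}

# The model supplies only the arguments it is allowed to see.
call_args = merge_arguments({"url": "https://example.com/leads/42"}, override, scopes)
```

The stored config still contains only the symbol after the call. The literal value exists only in the arguments handed to the tool.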
For the full reference, see Agent secrets. For the three authentication patterns (per-user OAuth, per-user bearer, service-account OAuth) with copy-paste configs, see Connector authentication.
API surface
No SDK lock-in, no proprietary protocol. Your app makes plain HTTP calls, and the agent streams back typed Server-Sent Events.
| Method | Path | Purpose |
|---|---|---|
| POST | /v2/agents | Create or update an agent (config JSON). |
| POST | /v2/agents/{agent_key}/sessions | Start a session. |
| POST | /v2/agents/{agent_key}/sessions/{session_key}/events | Send a user message. Response is a stream of typed events. |
| GET | /v2/agents/{agent_key}/sessions/{session_key} | Fetch full session history. |
| POST | /v2/tools | Register a Lambda tool or MCP server. |
| POST | /v2/corpora | Create a corpus. |
For the event protocol — every event type, the request body shape, and how to render each one — see Request lifecycle. For the full schema, see the REST API reference.
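On the client side, consuming a Server-Sent Events stream comes down to grouping `event:` and `data:` lines until a blank line. The parser below follows the generic SSE wire format. The event names (`tool_call`, `output`) and the assumption that every data payload is JSON are illustrative only, so check Request lifecycle for the real event types.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_type, payload) pairs from raw SSE lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


# Hypothetical stream, as the lines might arrive from the HTTP response.
sample = [
    "event: tool_call\n",
    'data: {"tool": "kb_search"}\n',
    "\n",
    ": keep-alive\n",
    "event: output\n",
    'data: {"text": "Refund approved."}\n',
    "\n",
]
events = list(parse_sse(sample))
```

An application would typically dispatch on the event type to decide what to render, for example a tool-activity indicator versus answer text.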
Putting it together: a refund-handling agent
The parts above all show up in one config. Suppose the end user types: "I want to return order #A-12345 — the item arrived damaged."
{
"name": "refund_support_agent",
"model": { "name": "claude-sonnet-4-6" },
"tool_configurations": {
"kb_search": {
"type": "corpora_search",
"query_configuration": { "corpus_key": "support-kb" }
},
"order_lookup": {
"type": "web_get",
"argument_override": {
"url": "https://orders.internal/v1/orders",
"headers": { "Authorization": { "$ref": "agent.secrets.orders_api" } }
}
},
"refund_amount": {
"type": "lambda",
"tool_id": "<id_returned_by_POST_/v2/tools>"
},
"approval": {
"type": "sub_agent",
"agent_key": "manager_approval_agent"
}
},
"first_step": {
"name": "classify",
"allowed_tools": ["kb_search"],
"output_parser": {
"type": "json",
"json_schema": { "required": ["intent", "order_id", "reason"] }
},
"next_steps": [{ "step_name": "lookup" }]
},
"steps": [
{
"name": "lookup",
"allowed_tools": ["order_lookup", "refund_amount", "kb_search"],
"next_steps": [
{ "condition": "get('$.amount') > 500", "step_name": "manager_gate" },
{ "step_name": "resolve" }
]
},
{
"name": "manager_gate",
"allowed_tools": ["approval"],
"next_steps": [{ "step_name": "resolve" }]
},
{
"name": "resolve",
"output_parser": { "type": "search" }
}
]
}
What runs:
- classify: the LLM parses the message into a JSON spec (intent, order_id, reason). kb_search is available for clarification only.
- lookup: order_lookup calls the internal Orders API. refund_amount computes the eligible amount with pure-Python rules. kb_search pulls the relevant policy passage.
- manager_gate: refunds over $500 route through the approval sub-agent, which can pause cross-session for a human approver.
- resolve: generates the user-facing answer with policy citations.
The application streams events as they arrive. The user sees the policy reference, the decision, and an "awaiting approval" indicator when the manager gate is engaged.
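To make "pure-Python rules" concrete, here is a sketch of what the logic inside a tool like refund_amount might look like. The rules themselves (a 30-day window, full refund for damaged items, a 10% restocking fee otherwise) are invented for illustration, and the function signature is not the Lambda tool entry-point contract, which is covered in the Lambda tools reference.

```python
from decimal import Decimal

RETURN_WINDOW_DAYS = 30            # hypothetical policy value
RESTOCKING_FEE = Decimal("0.10")   # hypothetical policy value


def refund_amount(order_total: str, reason: str, days_since_delivery: int) -> Decimal:
    """Compute the eligible refund under the illustrative rules above."""
    total = Decimal(order_total)
    if days_since_delivery > RETURN_WINDOW_DAYS:
        return Decimal("0.00")
    if reason == "damaged":
        return total  # damaged items are refunded in full
    fee = (total * RESTOCKING_FEE).quantize(Decimal("0.01"))
    return total - fee


amount = refund_amount("620.00", reason="damaged", days_since_delivery=5)
# Mirrors the get('$.amount') > 500 condition in the config above.
needs_manager = amount > 500
```

Deterministic arithmetic like this belongs in a tool rather than in the prompt: the LLM decides when to call it, but the amount that drives routing comes from code.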
Why this shape
| Property | What it means |
|---|---|
| Declarative | Every piece of the agent is configuration. What is not in the JSON does not happen. |
| Composable | Tools, skills, sub-agents, steps — primitives that snap together. |
| Safe by construction | Per-step tool scope, locked arguments, secrets resolved only at call time, structured outputs. |
| API-first | Plain REST and a typed event stream. Your app calls in and renders the result. |
Related
- Agents quickstart — build a working agent in the Console.
- Agent concepts — the configuration model at the API level.
- Context engineering — the tuning playbook companion to this page.
- Request lifecycle — what happens when a user sends one message.
- Knowledge — the RAG side of the corpus primitive.
- Context & memory — the memory and scratchpad side of the corpus primitive.