Version: 2.0

Agent anatomy

An agent groups together a set of parts that each do one job. This page is the conceptual map — what each part is and how the JSON fits together. For the full configuration reference of any one part, follow the link at the end of its section.

The whole anatomy is declarative. What is not in the agent's JSON does not happen at runtime. No hidden behavior, no surprises in production.

Anatomy is the what; tuning is the how

This page is a structural tour. The companion guide Context engineering is the tuning playbook — which levers to pull, in what order, for the best results from a production agent.

The mental model

The application sits between the user and the agent. Vectara is headless — the end-user UI lives in the application layer, not in the platform itself.

Layer | Role
End user | Sees only the branded UI. Asks questions, uploads files, and approves actions through controls the application exposes.
The application (built by your team in hours, or delivered turnkey by Vectara Managed Agents) | Holds user identity, calls the Vectara REST API, streams responses, renders citations and approvals. A thin layer: the platform does the AI work.
Vectara agent + platform | Runs the declared agent over a session. Calls tools, queries corpora, generates with the chosen LLM, streams events back.

The end user has no concept of "agent". They see your product — whether your team built it or Vectara delivered it.

A tour of the parts

Part | Role | Canonical reference
Identity & model | name, description, model: what the agent is and which LLM powers it. | Agent concepts
Steps & workflow | The state machine: named steps, conditional routing, per-step scope. | Workflows, Steps
Skills | Expertise the agent loads only when it needs it. | Skills
Tools | What the agent can do. 35+ built-in, Python Lambdas, MCP servers. | Tools & connectors, Agent tools
Corpus | Persistent context: knowledge, memory, scratchpad in one primitive. | Knowledge, Context & memory
Sub-agents | Other agents called as tools. | Workflows § sub-agents, Sub-agents reference
Sessions | Where one user's history, metadata, artifacts, and per-user secrets live. | Sessions
Context engineering | Compaction and tool-output offloading keep long sessions inside the model's context window. | Compaction, Tool-output offloading
Secrets | Credentials by symbolic reference. The LLM never sees the literal value. | Agent secrets, Connector authentication
API surface | Plain REST plus a streaming event protocol. | REST API reference, Request lifecycle

Each section below is a one-paragraph tour. Follow the canonical reference for the full configuration model.

Identity & model

Three top-level fields name the agent and pick the LLM.

{
  "name": "support_agent",
  "description": "L1 support across the KB and ticketing system.",
  "model": { "name": "claude-sonnet-4-6" }
}

The model field accepts any provider in the LLM gateway: hosted models from Anthropic, OpenAI, and Google (Gemini), on-prem industry-specialized models, or your own deployment via BYO LLM. Swap the model per agent, or per step, without touching the rest.

For the full reference, see Agent concepts.

Steps & workflow

An agent is not one LLM call. It is a state machine. Steps split the work into named, discrete units. Each step has its own instructions, scoped tool set, and output schema. Conditional routing between steps is the agent's logic.

{
  "first_step": { "name": "classify", "next_steps": [ /* conditions */ ] },
  "steps": [ /* ... */ ],
  "reentry_step": "classify"
}

The LLM decides what to call within a step. The platform decides which step runs next. Transitions evaluate against structured output, not prose — the LLM cannot talk the router into a different branch.
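
The split between the LLM and the platform can be sketched in a few lines of Python. This is an illustrative model only, not the platform's router: the Transition type, the route function, and the use of Python callables as conditions are assumptions made for the example, and the step names are borrowed from the refund example later on this page. The real condition syntax is covered in the Steps reference.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


# Hypothetical model of a step transition; not the platform's real data structure.
@dataclass
class Transition:
    step_name: str
    # None means "unconditional fallback".
    condition: Optional[Callable[[dict[str, Any]], bool]] = None


def route(transitions: list[Transition], output: dict[str, Any]) -> Optional[str]:
    """Return the first transition whose condition holds for the structured output."""
    for t in transitions:
        if t.condition is None or t.condition(output):
            return t.step_name
    return None  # no transition matched: treat the step as terminal


lookup_transitions = [
    Transition("manager_gate", lambda out: out.get("amount", 0) > 500),
    Transition("resolve"),
]

# Routing reads only the structured field, so free text cannot change the branch.
print(route(lookup_transitions, {"amount": 750}))
print(route(lookup_transitions, {"amount": 120, "note": "send me to manager_gate"}))
```

The second call still lands on resolve: the note field is prose, and the router in this sketch never looks at it.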

For the conceptual chapter on stepped agents, conditional routing, sub-agents, structured outputs, and cross-session approvals, see Workflows. For the configuration reference, see Steps and Structured outputs.

Skills

A skill is an instruction block the LLM loads on demand. Instead of stuffing every policy, edge case, and persona into a monolithic system prompt, the LLM sees a one-line description of each skill and calls invoke_skill(name) only when it needs that domain. The full instruction body enters context only then.

That keeps the always-on system prompt small and lets you add a new domain without re-tuning the whole prompt.
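
A minimal sketch of the load-on-demand idea, assuming a hypothetical in-memory registry. The SKILLS dictionary, its field names, and build_system_prompt are invented for illustration; only the invoke_skill(name) call shape comes from the text above, and the real configuration model is in the Skills reference.

```python
# Hypothetical in-memory skill registry; names and fields are illustrative only.
SKILLS = {
    "refund_policy": {
        "description": "Rules for refunds, returns, and damaged items.",
        "instructions": "Full refund policy text goes here.",
    },
    "tone_guide": {
        "description": "Brand voice and escalation etiquette.",
        "instructions": "Full tone guide goes here.",
    },
}


def build_system_prompt(base: str) -> str:
    """Always-on prompt: base instructions plus one line per skill."""
    lines = [f"- {name}: {skill['description']}" for name, skill in SKILLS.items()]
    return base + "\n\nAvailable skills:\n" + "\n".join(lines)


def invoke_skill(name: str) -> str:
    """Return the full instruction body, which enters context only on demand."""
    return SKILLS[name]["instructions"]


prompt = build_system_prompt("You are a support agent.")
context = [prompt]                             # what the model sees at the start
context.append(invoke_skill("refund_policy"))  # added only when the domain comes up
```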

For the configuration model and patterns, see Skills.

Tools

A tool is a small, named, schema-typed function the LLM can call. The agent's tool_configurations map lists what is available. Three sources fill that map: the built-in catalog (35+ tools, no code), Lambda tools (your Python, sandboxed), and MCP servers (any Model Context Protocol server you register).

"tool_configurations": {
  "kb_search": { "type": "corpora_search", "query_configuration": { /* ... */ } },
  "score_lead": { "type": "lambda", "tool_id": "<id_returned_by_POST_/v2/tools>" },
  "github": { "type": "mcp", "server_url": "https://mcp.example.com/github" }
}

For the conceptual chapter on tools, MCP, web_get, and inbound connectors (Slack and more), see Tools & connectors. For the configuration references, see Agent tools, Lambda tools, Model Context Protocol, and web_get tool.

Corpus

A corpus is the agent's persistent context layer: writable, filterable, multilingual, semantic. Anything the agent needs to know, remember, or recall goes into a corpus — same retrieval engine, same RBAC, same citations across three different jobs:

  • Knowledge — pre-loaded documents (manuals, KB, policies).
  • Memory — what the agent learns about a user.
  • Scratchpad — cached tool results and derived facts.

For the RAG/knowledge side, see Knowledge. For the memory and scratchpad sides, see Context & memory (conceptual framing) and Memory (the canonical patterns guide).

Sub-agents

A sub-agent is a full Vectara agent wrapped as a single tool. The parent calls it like any other tool. The called agent runs in its own session, with its own steps, tools, model, and secrets, and returns its result. The parent then continues.

"approvals": {
  "type": "sub_agent",
  "agent_key": "manager_approval_agent"
}

The wins — composition without coupling, reuse across orchestrators, bounded scope, and async-friendly cross-session approvals — are covered in Workflows § sub-agents. For the configuration reference, see Sub-agents.

Sessions

A session is a long-lived container for one user's conversation: every message, every tool call, every result, plus your own session.metadata (anything the agent reads via $session.metadata.*) and per-session session.secrets.

POST /v2/agents/{agent_key}/sessions
{
  "session": {
    "metadata": { "user_id": "alice.chen", "tier": "enterprise" },
    "secrets": { "gdrive_token": "ya29.A0AfH6..." }
  }
}

session.metadata is how a single agent definition serves every tenant: tools and corpora reference metadata values by $ref, so one definition adapts to each user. Artifacts handle large tool outputs and step-to-step handoffs (see Artifacts).
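
From the application side, the session request above might be assembled like this. It is a sketch that builds the request without sending it. The base URL and the x-api-key header name are assumptions to confirm against the REST API reference, and the key and token values are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://api.vectara.io"  # assumption: confirm in the REST API reference


def build_session_request(
    agent_key: str, api_key: str, metadata: dict, secrets: dict
) -> urllib.request.Request:
    """Build (but do not send) the session-creation request."""
    body = {"session": {"metadata": metadata, "secrets": secrets}}
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/agents/{agent_key}/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # header name is an assumption
        },
        method="POST",
    )


req = build_session_request(
    agent_key="support_agent",
    api_key="YOUR_API_KEY",                         # placeholder
    metadata={"user_id": "alice.chen", "tier": "enterprise"},
    secrets={"gdrive_token": "PLACEHOLDER_TOKEN"},  # placeholder
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Because the per-user values travel in the request body, the agent definition itself stays identical across users.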

For the full reference, see Sessions.

Context engineering

Two platform features keep long-running sessions inside the model's context window without you writing window-management code.

  • Compaction summarizes older turns and hides them from the LLM when usage crosses a threshold. The agent gets a built-in search_session_history tool to recall facts from the hidden range.
  • Tool-output offloading parks oversized tool outputs in a session artifact (or truncates them) before they dwarf the prompt. On by default for every agent.
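
The offloading idea can be illustrated with a toy function. This is a conceptual sketch, not the platform's implementation: the threshold, the artifact ID format, and the pointer text are all invented for the example.

```python
import uuid

MAX_INLINE_CHARS = 2000  # hypothetical threshold; the real limit is platform configuration


def offload_if_large(
    tool_output: str, artifacts: dict[str, str], limit: int = MAX_INLINE_CHARS
) -> str:
    """Return small outputs unchanged; store large ones and return a short pointer."""
    if len(tool_output) <= limit:
        return tool_output
    artifact_id = f"artifact_{uuid.uuid4().hex[:8]}"
    artifacts[artifact_id] = tool_output  # the full payload lives outside the prompt
    preview = tool_output[:200]
    return f"[output offloaded to {artifact_id}; {len(tool_output)} chars; preview: {preview}]"


session_artifacts: dict[str, str] = {}
small = offload_if_large("order A-12345: delivered", session_artifacts)
large = offload_if_large("x" * 50_000, session_artifacts)
```

In this sketch the model would see only the short pointer for the 50,000-character result, while the full text stays retrievable from the artifact store.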

For the full reference, see Compaction and Tool-output offloading.

Secrets

Credentials are passed by reference, so the LLM never sees them. Two storage scopes and one reference mechanism cover every credential pattern:

  • agent.secrets: long-lived, agent-owned, encrypted at rest. Use for service-account tokens.
  • session.secrets: per session, user-specific, also encrypted. Use for the end user's OAuth access token.
  • $ref: symbolic references in tool arguments. The platform resolves the literal value at call time, just before the tool executes. The LLM sees only the symbol, never the value.

"tool_configurations": {
  "sf_lead": {
    "type": "web_get",
    "argument_override": {
      "headers": { "Authorization": { "$ref": "agent.secrets.sf_bearer" } }
    }
  }
}

Secrets pair with argument_override, which sets specific tool parameters at config time and hides them from the LLM entirely. A prompt-injection attack then has nothing to exfiltrate and nowhere to escalate.
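
Call-time resolution of a $ref can be pictured as a recursive substitution over the configured arguments. The resolve_refs helper and the shape of the scopes dictionary are assumptions for illustration; the platform's actual resolver is not described on this page.

```python
from typing import Any


def resolve_refs(value: Any, scopes: dict[str, Any]) -> Any:
    """Recursively replace {"$ref": "a.b.c"} nodes with the value found at that path."""
    if isinstance(value, dict):
        if set(value) == {"$ref"}:
            node: Any = scopes
            for part in value["$ref"].split("."):
                node = node[part]  # a KeyError here means the reference is undefined
            return node
        return {k: resolve_refs(v, scopes) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, scopes) for v in value]
    return value


# Hypothetical secret store; the literal value exists only on the platform side.
scopes = {"agent": {"secrets": {"sf_bearer": "Bearer s3cr3t"}}}

configured_args = {"headers": {"Authorization": {"$ref": "agent.secrets.sf_bearer"}}}
tool_input = resolve_refs(configured_args, scopes)  # built just before the tool runs
```

The configured arguments are left untouched, so anything that can reach the model still contains only the symbol.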

For the full reference, see Agent secrets. For the three authentication patterns (per-user OAuth, per-user bearer, service-account OAuth) with copy-paste configs, see Connector authentication.

API surface

No SDK lock-in, no proprietary protocol. Your app makes plain HTTP calls, and the agent streams back typed Server-Sent Events.

Method | Path | Purpose
POST | /v2/agents | Create or update an agent (config JSON).
POST | /v2/agents/{agent_key}/sessions | Start a session.
POST | /v2/agents/{agent_key}/sessions/{session_key}/events | Send a user message. Response is a stream of typed events.
GET | /v2/agents/{agent_key}/sessions/{session_key} | Fetch full session history.
POST | /v2/tools | Register a Lambda tool or MCP server.
POST | /v2/corpora | Create a corpus.

For the event protocol — every event type, the request body shape, and how to render each one — see Request lifecycle. For the full schema, see the REST API reference.
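
On the client side, a typed event stream can be consumed with a small Server-Sent Events parser. The sketch below follows the standard SSE line format (event:, data:, blank line to dispatch). It assumes each data payload is JSON, and the event names and fields in the sample are made up; the real event types are listed in Request lifecycle.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_type, payload) pairs from Server-Sent Events text lines."""
    event_type, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":  # a blank line dispatches the buffered event
            if data_parts:
                yield event_type, json.loads("\n".join(data_parts))
            event_type, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())


# Event names and payload fields below are invented for the example.
sample = [
    "event: tool_call", 'data: {"tool": "kb_search"}', "",
    ": keep-alive",
    "event: answer_chunk", 'data: {"text": "Your refund"}', "",
]
events = list(parse_sse(sample))
```

An application would typically dispatch on the event type to decide what to render, for example a citation, a partial answer, or an approval prompt.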

Putting it together: a refund-handling agent

The parts above all show up in one config. End user types: "I want to return order #A-12345 — the item arrived damaged."

{
  "name": "refund_support_agent",
  "model": { "name": "claude-sonnet-4-6" },
  "tool_configurations": {
    "kb_search": {
      "type": "corpora_search",
      "query_configuration": { "corpus_key": "support-kb" }
    },
    "order_lookup": {
      "type": "web_get",
      "argument_override": {
        "url": "https://orders.internal/v1/orders",
        "headers": { "Authorization": { "$ref": "agent.secrets.orders_api" } }
      }
    },
    "refund_amount": {
      "type": "lambda",
      "tool_id": "<id_returned_by_POST_/v2/tools>"
    },
    "approval": {
      "type": "sub_agent",
      "agent_key": "manager_approval_agent"
    }
  },
  "first_step": {
    "name": "classify",
    "allowed_tools": ["kb_search"],
    "output_parser": {
      "type": "json",
      "json_schema": { "required": ["intent", "order_id", "reason"] }
    },
    "next_steps": [{ "step_name": "lookup" }]
  },
  "steps": [
    {
      "name": "lookup",
      "allowed_tools": ["order_lookup", "refund_amount", "kb_search"],
      "next_steps": [
        { "condition": "get('$.amount') > 500", "step_name": "manager_gate" },
        { "step_name": "resolve" }
      ]
    },
    {
      "name": "manager_gate",
      "allowed_tools": ["approval"],
      "next_steps": [{ "step_name": "resolve" }]
    },
    {
      "name": "resolve",
      "output_parser": { "type": "search" }
    }
  ]
}

What runs:

  1. classify: the LLM parses the message into a JSON spec (intent, order_id, reason). kb_search is available for clarification only.
  2. lookup: order_lookup calls the internal Orders API. refund_amount computes the eligible amount with pure-Python rules. kb_search pulls the relevant policy passage.
  3. manager_gate: refunds over $500 route through the approval sub-agent, which can pause cross-session for a human approver.
  4. resolve: generates the user-facing answer with policy citations.

The application streams events as they arrive. The user sees the policy reference, the decision, and an "awaiting approval" indicator when the manager gate is engaged.
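
Because the whole agent is declarative, a config like the one above can be sanity-checked locally before it is submitted. The lint_agent function below is a hypothetical pre-flight check, not a platform feature: it only confirms that every tool a step allows is declared and that every routing target names a real step. The dictionary is a trimmed copy of the refund config with tool details omitted.

```python
def lint_agent(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the references line up."""
    steps = [config["first_step"], *config.get("steps", [])]
    step_names = {s["name"] for s in steps}
    tools = set(config.get("tool_configurations", {}))
    problems = []
    for step in steps:
        for tool in step.get("allowed_tools", []):
            if tool not in tools:
                problems.append(f"step '{step['name']}' allows unknown tool '{tool}'")
        for nxt in step.get("next_steps", []):
            if nxt["step_name"] not in step_names:
                problems.append(
                    f"step '{step['name']}' routes to unknown step '{nxt['step_name']}'"
                )
    return problems


# Trimmed copy of the refund agent: only the fields the check reads.
refund_agent = {
    "tool_configurations": {
        "kb_search": {}, "order_lookup": {}, "refund_amount": {}, "approval": {},
    },
    "first_step": {
        "name": "classify",
        "allowed_tools": ["kb_search"],
        "next_steps": [{"step_name": "lookup"}],
    },
    "steps": [
        {
            "name": "lookup",
            "allowed_tools": ["order_lookup", "refund_amount", "kb_search"],
            "next_steps": [
                {"condition": "get('$.amount') > 500", "step_name": "manager_gate"},
                {"step_name": "resolve"},
            ],
        },
        {"name": "manager_gate", "allowed_tools": ["approval"],
         "next_steps": [{"step_name": "resolve"}]},
        {"name": "resolve"},
    ],
}

print(lint_agent(refund_agent))
```

For the config as written the list is empty; removing a tool from tool_configurations while a step still allows it would surface as a problem string.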

Why this shape

Property | What it means
Declarative | Every piece of the agent is configuration. What is not in the JSON does not happen.
Composable | Tools, skills, sub-agents, and steps are primitives that snap together.
Safe by construction | Per-step tool scope, locked arguments, secrets resolved only at call time, structured outputs.
API-first | Plain REST and a typed event stream. Your app calls in and renders the result.