Version: 2.0

Agent anatomy

An agent groups together a set of parts that each do one job. This page is the conceptual map: what each part is and how the JSON fits together. For the full configuration reference of any one part, follow the link at the end of its section.

The whole anatomy is declarative. Runtime behavior is driven by the declared agent configuration: model, steps, tools, skills, corpora, routing, sessions, and secrets. This makes the agent easier to inspect, test, and audit before production.

Tip: Anatomy is the what; tuning is the how.

The companion guide, Context engineering, is the tuning playbook: which levers to pull, in what order, for the best results from a production agent.

The mental model

The application sits between the user and the agent. Vectara is headless: the end-user UI lives in the application layer, not in the platform itself.

Layer	Role
End user	Sees only the branded UI. Asks questions, uploads files, approves actions through controls the application exposes.
The application (built by your team in hours, or delivered turnkey by Vectara Managed Agents)	Holds user identity, calls the Vectara REST API, streams responses, renders citations and approvals. A thin layer, since the platform does the AI work.
Vectara agent + platform	Runs the declared agent over a session. Calls tools, queries corpora, generates with the chosen LLM, streams events back.

The end user has no concept of an agent. They see your product, whether your team built it or Vectara delivered it.

A tour of the parts

Part	Role	Canonical reference
Identity and model	`name`, `description`, `model`, naming the agent and the LLM that powers it	Agent concepts
Steps and workflow	The state machine: named steps, conditional routing, per-step scope	Workflows, Steps
Skills	Expertise the agent loads only when it needs it	Skills
Tools	What the agent can do: 35+ built-in, Python Lambdas, MCP servers	Tools and connectors, Agent tools
Corpus	Persistent context: knowledge, memory, scratchpad in one primitive	Knowledge, Context and memory
Sub-agents	Other agents called as tools	Workflows > sub-agents, Sub-agents reference
Sessions	Where one user's history, metadata, artifacts, and per-user secrets live	Sessions
Context engineering	Compaction and tool-output offloading keep long sessions inside the model's context window	Compaction, Tool-output offloading
Secrets	Credentials by symbolic reference. The LLM never sees the literal value	Agent secrets, Connector authentication
API surface	REST APIs with a streaming event protocol for runtime updates	REST API reference, Request lifecycle

Each section below is a one-paragraph tour. Follow the canonical reference for the full configuration model.

Identity and model

Three top-level fields name the agent and pick the LLM.

{
  "name": "support_agent",
  "description": "L1 support across the KB and ticketing system.",
  "model": { "name": "claude-sonnet-4-6" }
}

The model field accepts any provider in the LLM gateway: hosted models from Anthropic, OpenAI, and Gemini, on-prem industry-specialized models, or your own deployment via BYO LLM. Swap the model per agent, or per step, without touching the rest.

For the full reference, see Agent concepts.

Steps and workflow

An agent is not one LLM call. It is a state machine. Steps split the work into named, discrete units. Each step has its own instructions, scoped tool set, and output schema. Conditional routing between steps is the agent's logic.

{
  "first_step": { "name": "classify", "next_steps": [ /* conditions */ ] },
  "steps": [ /* ... */ ],
  "reentry_step": "classify"
}

The LLM decides what to call within a step. The platform decides which step runs next. Routing evaluates structured output against declared conditions rather than relying on prose instructions in the prompt.
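
The split between what the LLM decides and what the platform decides can be illustrated with a toy router. This is a minimal Python sketch, not the platform's implementation: the `pick_next_step` helper and the use of Python callables as conditions are assumptions made for illustration. The real condition syntax is described in the Steps reference.

```python
from typing import Any, Callable, Optional

# Hypothetical in-memory form of a routing table: each route has a target
# step name and an optional predicate over the step's structured output.
Route = dict[str, Any]


def pick_next_step(output: dict[str, Any], next_steps: list[Route]) -> Optional[str]:
    """Return the first route whose condition holds.

    A route without a condition acts as the fallback. Returns None when
    no route matches.
    """
    for route in next_steps:
        condition: Optional[Callable[[dict[str, Any]], bool]] = route.get("condition")
        if condition is None or condition(output):
            return route["step_name"]
    return None


routes = [
    {"condition": lambda out: out.get("intent") == "refund", "step_name": "lookup"},
    {"step_name": "resolve"},  # unconditional fallback
]

print(pick_next_step({"intent": "refund"}, routes))    # lookup
print(pick_next_step({"intent": "question"}, routes))  # resolve
```

The point of the sketch is that the decision depends only on structured output and declared routes, so it can be tested without calling a model.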

For the conceptual chapter on stepped agents, conditional routing, sub-agents, structured outputs, and cross-session approvals, see Workflows. For the configuration reference, see Steps and Structured outputs.

Skills

A skill is an instruction block the LLM loads on demand. Instead of stuffing every policy, edge case, and persona into a monolithic system prompt, the LLM sees a one-line description of each skill and calls invoke_skill(name) only when it needs that domain. The full instruction body enters context only then.

That keeps the always-on system prompt small and lets you add a new domain without re-tuning the whole prompt.
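
The on-demand loading pattern can be sketched in a few lines of Python. This is an illustration only: the `SKILLS` registry, the skill names, and `build_system_prompt` are hypothetical, and the local `invoke_skill` function merely mimics the behavior described above rather than the platform's own tool.

```python
# Hypothetical skill registry: name -> (one-line description, full instruction body).
SKILLS = {
    "refund_policy": (
        "Rules for refunds and returns.",
        "FULL BODY: step-by-step refund policy instructions.",
    ),
    "escalation": (
        "When and how to escalate to a human.",
        "FULL BODY: escalation criteria and hand-off wording.",
    ),
}


def build_system_prompt(base: str) -> str:
    """Always-on prompt: base instructions plus one line per skill."""
    lines = [f"- {name}: {desc}" for name, (desc, _body) in SKILLS.items()]
    return base + "\nAvailable skills:\n" + "\n".join(lines)


def invoke_skill(name: str) -> str:
    """Return the full instruction body, which enters context only now."""
    _desc, body = SKILLS[name]
    return body


prompt = build_system_prompt("You are a support agent.")
print(prompt)                          # descriptions only, no bodies
print(invoke_skill("refund_policy"))   # body loaded on demand
```

Adding a new domain in this model means adding one registry entry; the base prompt text does not change.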

For the configuration model and patterns, see Skills.

Tools

A tool is a small, named, schema-typed function the LLM can call. The agent's tool_configurations map lists what is available. Three sources fill that map: the built-in catalog (35+ tools, no code), Lambda tools (your Python, sandboxed), and MCP servers (any Model Context Protocol server you register).

"tool_configurations": {
  "kb_search":  { "type": "corpora_search", "query_configuration": { /* ... */ } },
  "score_lead": { "type": "lambda",         "tool_id": "<id_returned_by_POST_/v2/tools>" },
  "github":     { "type": "mcp",            "server_url": "https://mcp.example.com/github" }
}
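
To make the Lambda option concrete, here is a sketch of the kind of pure-Python logic a tool such as score_lead might wrap. The scoring rules, parameter names, and return shape are invented for illustration, and the entry-point convention the platform expects is not shown here; check the Lambda tools reference before adapting it.

```python
def score_lead(company_size: int, budget_usd: float, is_decision_maker: bool) -> dict:
    """Score a sales lead from 0 to 100 with simple, deterministic rules.

    All thresholds are hypothetical examples, not recommended values.
    """
    score = 0
    if company_size >= 500:
        score += 40
    elif company_size >= 50:
        score += 20
    if budget_usd >= 100_000:
        score += 40
    elif budget_usd >= 10_000:
        score += 20
    if is_decision_maker:
        score += 20
    tier = "hot" if score >= 80 else "warm" if score >= 40 else "cold"
    return {"score": score, "tier": tier}


print(score_lead(1000, 250_000, True))
```

Keeping this logic in a typed function rather than in the prompt means the LLM supplies arguments and reads the result, while the arithmetic stays deterministic.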

For the conceptual chapter on tools, MCP, web_get, and inbound connectors (Slack and more), see Tools and connectors. For the configuration references, see Agent tools, Lambda tools, Model Context Protocol, and web_get tool.

Corpus

A corpus is the agent's persistent context layer: writable, filterable, multilingual, semantic. The same corpus primitive serves three jobs (knowledge, memory, and scratchpad) on one retrieval engine, with the same RBAC and citations.

For the knowledge side, see Knowledge. For why one primitive serves all three roles, see Context and memory (conceptual framing) and Memory (the canonical patterns guide).

Sub-agents

A sub-agent is a full Vectara agent wrapped as a single tool. The parent calls it like any other tool. The called agent runs in its own session, with its own steps, tools, model, and secrets, and returns its result. The parent continues.

"approval": {
  "type": "sub_agent",
  "agent_key": "manager_approval_agent"
}

The wins (composition without coupling, reuse across orchestrators, bounded scope, and async-friendly cross-session approvals) are covered in Workflows § sub-agents. For the configuration reference, see Sub-agents.

Sessions

A session is a long-lived container for one user's conversation: every message, every tool call, every result, plus your own session.metadata (anything the agent reads via $session.metadata.*) and per-session session.secrets.

POST /v2/agents/{agent_key}/sessions
{
  "session": {
    "metadata": { "user_id": "alice.chen", "tier": "enterprise" },
    "secrets":  { "gdrive_token": "ya29.A0AfH6..." }
  }
}

session.metadata is how a single agent definition serves every tenant: tools and corpora reference metadata values through $ref, so one configuration adapts to each user without being duplicated. Artifacts handle large tool outputs and step-to-step handoffs (see Artifacts).
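
From the application side, creating a session per user might look like the following Python sketch. The endpoint path and body shape come from the example above; the base URL, the `x-api-key` header, and the `build_session_request` helper are assumptions to adjust for your deployment. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.vectara.com"  # assumption: replace with your deployment's base URL


def build_session_request(
    agent_key: str, user_id: str, tier: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the session-creation request for one user."""
    body = {"session": {"metadata": {"user_id": user_id, "tier": tier}}}
    return urllib.request.Request(
        url=f"{API_BASE}/v2/agents/{agent_key}/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumption: header depends on your auth setup
        },
        method="POST",
    )


req = build_session_request("support_agent", "alice.chen", "enterprise", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The same agent key is reused for every user; only the metadata in the request body changes.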

For the full reference, see Sessions.

Context engineering

Two platform features keep long-running sessions inside the model's context window without you writing window-management code.

Compaction summarizes older turns and hides them from the LLM when usage crosses a threshold. The agent gets a built-in search_session_history tool to recall facts from the hidden range.
Tool-output offloading parks oversized tool outputs in a session artifact (or truncates them) before they dwarf the prompt. It is on by default for every agent.
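
The two mechanisms can be pictured with a toy model. The Python sketch below is not how the platform implements them: `ToySession`, the character and turn thresholds, and the substring search are simplifications chosen for illustration (a real system would measure tokens and produce a real summary).

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical in-memory session used only for this illustration."""
    turns: list[str] = field(default_factory=list)        # visible to the LLM
    hidden: list[str] = field(default_factory=list)       # compacted, still searchable
    artifacts: dict[str, str] = field(default_factory=dict)


def offload_if_large(session: ToySession, output: str, max_chars: int = 200) -> str:
    """Park an oversized tool output in an artifact and return a short pointer."""
    if len(output) <= max_chars:
        return output
    artifact_id = f"artifact_{len(session.artifacts) + 1}"
    session.artifacts[artifact_id] = output
    return f"[output stored in {artifact_id}, {len(output)} chars]"


def compact(session: ToySession, max_turns: int = 4, keep_recent: int = 2) -> None:
    """Past the threshold, hide older turns behind a one-line summary."""
    if len(session.turns) <= max_turns:
        return
    older, recent = session.turns[:-keep_recent], session.turns[-keep_recent:]
    session.hidden.extend(older)
    session.turns = [f"[summary of {len(older)} earlier turns]"] + recent


def search_session_history(session: ToySession, term: str) -> list[str]:
    """Recall facts from the hidden range by case-insensitive substring match."""
    return [t for t in session.hidden if term.lower() in t.lower()]
```

In this model the visible history stays bounded, while the hidden range and artifacts remain reachable on request.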

For the full reference, see Compaction and Tool-output offloading.

Secrets

Credentials live by reference, not by value. An agent has two storage scopes: agent-wide agent.secrets and per-session session.secrets. Tools pull a credential into a request through a symbolic $ref that the platform resolves at call time, paired with argument_override so the argument is fixed in the configuration rather than chosen by the model. The model works with the symbol, not the literal value.
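
Resolution by reference can be sketched as a small recursive substitution. This Python example is an assumption-laden illustration: `resolve_refs`, the `scopes` layout, and the token value are hypothetical, and the platform's actual resolver is not documented here.

```python
from typing import Any


def resolve_refs(value: Any, scopes: dict[str, Any]) -> Any:
    """Replace every {"$ref": "a.b.c"} node with the value found by walking
    the dotted path through `scopes`. Returns a new structure; the input
    is left untouched."""
    if isinstance(value, dict):
        if set(value) == {"$ref"}:
            node: Any = scopes
            for part in value["$ref"].split("."):
                node = node[part]
            return node
        return {k: resolve_refs(v, scopes) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, scopes) for v in value]
    return value


# What the agent configuration stores: a symbol, never the literal credential.
argument_override = {
    "url": "https://orders.internal/v1/orders",
    "headers": {"Authorization": {"$ref": "agent.secrets.orders_api"}},
}

# Hypothetical secret store with a placeholder value.
scopes = {"agent": {"secrets": {"orders_api": "Bearer example-token"}}}

resolved = resolve_refs(argument_override, scopes)
print(resolved["headers"]["Authorization"])
```

Because substitution happens on a copy at call time, the stored configuration keeps only the symbolic reference.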

For the storage scopes, masking, and rotation, see Agent secrets. For the three authentication patterns (per-user OAuth, per-user bearer, service-account OAuth) with copy-paste configs, see Connector authentication.

API surface

No SDK lock-in, no proprietary protocol. Your app makes plain HTTP calls to create agents, start sessions, and send messages, and the agent streams back typed Server-Sent Events.
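
Because the stream is standard Server-Sent Events, a client needs very little code to consume it. The Python sketch below parses SSE text lines into events; the event names and payload fields in the sample are placeholders, and the assumption that each `data` payload is JSON should be checked against Request lifecycle.

```python
import json
from typing import Iterable, Iterator


def _field_value(line: str, name: str) -> str:
    """Return the text after 'name:', dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines.

    Handles the `event:` and `data:` fields; a blank line ends an event.
    Assumes each event's data is a JSON document.
    """
    event_type, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_parts:
                yield {"event": event_type, "data": json.loads("\n".join(data_parts))}
            event_type, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_type = _field_value(line, "event")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data"))


# Hypothetical stream: event names and fields are placeholders.
sample = [
    "event: example_chunk\n",
    'data: {"text": "Your refund"}\n',
    "\n",
    ": keep-alive\n",
    "event: example_end\n",
    'data: {"done": true}\n',
    "\n",
]
events = list(parse_sse(sample))
print(events)
```

In a real client the `lines` iterable would come from the HTTP response body, decoded line by line.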

For what happens on a single turn and every event type the stream emits, see Request lifecycle. For endpoints, request bodies, and response schemas, see the REST API reference.

Putting it together: A refund-handling agent

The parts above all show up in one config. The end user types: "I want to return order #A-12345, the item arrived damaged."

{
  "name": "refund_support_agent",
  "model": { "name": "claude-sonnet-4-6" },
  "tool_configurations": {
    "kb_search": {
      "type": "corpora_search",
      "query_configuration": { "corpus_key": "support-kb" }
    },
    "order_lookup": {
      "type": "web_get",
      "argument_override": {
        "url": "https://orders.internal/v1/orders",
        "headers": { "Authorization": { "$ref": "agent.secrets.orders_api" } }
      }
    },
    "refund_amount": {
      "type": "lambda",
      "tool_id": "<id_returned_by_POST_/v2/tools>"
    },
    "approval": {
      "type": "sub_agent",
      "agent_key": "manager_approval_agent"
    }
  },
  "first_step": {
    "name": "classify",
    "allowed_tools": ["kb_search"],
    "output_parser": {
      "type": "json",
      "json_schema": { "required": ["intent", "order_id", "reason"] }
    },
    "next_steps": [{ "step_name": "lookup" }]
  },
  "steps": [
    {
      "name": "lookup",
      "allowed_tools": ["order_lookup", "refund_amount", "kb_search"],
      "next_steps": [
        { "condition": "get('$.amount') > 500", "step_name": "manager_gate" },
        { "step_name": "resolve" }
      ]
    },
    {
      "name": "manager_gate",
      "allowed_tools": ["approval"],
      "next_steps": [{ "step_name": "resolve" }]
    },
    {
      "name": "resolve",
      "output_parser": { "type": "search" }
    }
  ]
}

What runs:

classify: the LLM parses the message into a JSON spec (intent, order_id, reason). kb_search is available for clarification only.
lookup: order_lookup calls the internal Orders API. refund_amount computes the eligible amount with pure-Python rules. kb_search pulls the relevant policy passage.
manager_gate: refunds over $500 route through the approval sub-agent, which can pause cross-session for a human approver.
resolve: generates the user-facing answer with policy citations.

The application streams events as they arrive. The user sees the policy reference, the decision, and an "awaiting approval" indicator when the manager gate is engaged.
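
Since the whole agent is configuration, it can be checked before deployment. The Python sketch below is a hypothetical lint, not a platform feature: `lint_agent` verifies that every tool named in allowed_tools exists in tool_configurations and that every next_steps target is a declared step. It runs against a trimmed copy of the refund agent that keeps only the fields the lint reads.

```python
def lint_agent(config: dict) -> list[str]:
    """Return a list of problems: unknown tools in allowed_tools and
    next_steps that point at undeclared steps."""
    tools = set(config.get("tool_configurations", {}))
    steps = [config["first_step"], *config.get("steps", [])]
    names = {step["name"] for step in steps}
    problems = []
    for step in steps:
        for tool in step.get("allowed_tools", []):
            if tool not in tools:
                problems.append(f"{step['name']}: unknown tool '{tool}'")
        for route in step.get("next_steps", []):
            if route["step_name"] not in names:
                problems.append(f"{step['name']}: unknown step '{route['step_name']}'")
    return problems


# Trimmed copy of the refund agent (tool bodies omitted for brevity).
refund_agent = {
    "tool_configurations": {
        "kb_search": {}, "order_lookup": {}, "refund_amount": {}, "approval": {},
    },
    "first_step": {
        "name": "classify",
        "allowed_tools": ["kb_search"],
        "next_steps": [{"step_name": "lookup"}],
    },
    "steps": [
        {
            "name": "lookup",
            "allowed_tools": ["order_lookup", "refund_amount", "kb_search"],
            "next_steps": [
                {"condition": "get('$.amount') > 500", "step_name": "manager_gate"},
                {"step_name": "resolve"},
            ],
        },
        {
            "name": "manager_gate",
            "allowed_tools": ["approval"],
            "next_steps": [{"step_name": "resolve"}],
        },
        {"name": "resolve"},
    ],
}

print(lint_agent(refund_agent))  # an empty list means no problems were found
```

A check like this fits naturally in CI, so a renamed tool or step is caught before the agent reaches production.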

Why this shape

Property	What it means
Declarative	Every piece of the agent is configuration. What is not in the JSON does not happen.
Composable	Tools, skills, sub-agents, steps: primitives that snap together.
Safe by construction	Per-step tool scope, locked arguments, eager-resolved secrets, structured outputs.
API-first	Plain REST and a typed event stream. Your app calls in and renders the result.

Related

Agents quickstart: Build a working agent in the Console.
Agent concepts: The configuration model at the API level.
Context engineering: The tuning playbook companion to this page.
Request lifecycle: What happens when a user sends one message.
Knowledge: The RAG side of the corpus primitive.
Context and memory: The memory and scratchpad side of the corpus primitive.
