Version: 2.0

Agent

Agents are the core orchestration unit in the Vectara platform. The agent decides how to respond to user input, when to invoke tools, and how to manage conversation state.

Each agent is configured with:

  • A name and a unique key. The key follows the pattern agt_[identifier]; if you do not provide one, Vectara generates it automatically from the name.
  • A human-readable description
  • Optional instructions
  • A list of available tools (referenced by name or ID)
  • Optional tool configurations, for example Corpora Search tools configured to grant access to various corpora
  • Metadata and versioning controls
  • A first_step definition that encompasses optional instructions for the agent's behavior.

Agents operate through a conversational step architecture, processing user input through reasoning, tool execution, and response generation phases. The step-based design enables complex multi-turn workflows and intelligent tool orchestration.

Create an agent

You can create an agent in the Vectara Console, or you can use the API. For more information, check out our Agents Quick Start.

Example agent definition

This example shows a basic customer support agent configured with corpus search capabilities and inline instructions. It demonstrates two core components: tool configurations for searching support tickets, and a conversational first step with behavior guidelines.

AGENT EXAMPLE

Model configuration

Agents use large language models for reasoning and response generation. You can configure:

  • Model: Choose from the available models, such as GPT-4o.
  • Parameters: Adjust temperature, max tokens, and other model-specific settings.
  • Cost optimization: Balance performance against token usage.
  • Retry configuration: Configure automatic retry behavior for transient failures.

Retry configuration

When agents interact with LLMs, transient failures may occur that interrupt the conversation flow, including network timeouts, temporary server issues, or reaching API rate limits. Without a retry mechanism, these temporary issues cause your agent to fail immediately, resulting in a poor user experience.

Vectara provides a retry configuration option for agents that detects these recoverable failures and automatically retries the request with exponential backoff.

The RetryConfiguration object controls the retry behavior for your agent's interactions with the LLM. You define these settings when creating or updating your agent model, and they apply to all LLM requests made by that agent.

Retry configuration parameters

  • enabled: Boolean flag that enables or disables retry logic
    • Default: true
  • max_retries: The maximum number of retry attempts after the initial failure
    • Range: 0-10
    • Default: 3
  • initial_backoff_ms: The initial delay in milliseconds before the first retry
    • Range: 100-60000ms
    • Default: 1000ms
  • max_backoff_ms: The maximum delay in milliseconds between retries
    • Range: 1000-300000ms
    • Default: 30000ms
  • backoff_factor: The exponential multiplier for calculating backoff delays
    • Range: 1.0-10.0
    • Default: 2.0
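
The parameters above can be checked locally before they are sent to the API. The following is a minimal sketch, not part of any Vectara SDK: the RetryConfiguration class here is a hypothetical local helper whose field names, defaults, and ranges are copied from the list above. Where the resulting object belongs in the agent request body is not shown, because this section does not specify it.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RetryConfiguration:
    """Hypothetical local mirror of the retry settings listed above."""

    enabled: bool = True
    max_retries: int = 3
    initial_backoff_ms: int = 1000
    max_backoff_ms: int = 30000
    backoff_factor: float = 2.0

    def __post_init__(self) -> None:
        # Ranges are taken from the parameter list in this section.
        if not 0 <= self.max_retries <= 10:
            raise ValueError("max_retries must be between 0 and 10")
        if not 100 <= self.initial_backoff_ms <= 60000:
            raise ValueError("initial_backoff_ms must be between 100 and 60000")
        if not 1000 <= self.max_backoff_ms <= 300000:
            raise ValueError("max_backoff_ms must be between 1000 and 300000")
        if not 1.0 <= self.backoff_factor <= 10.0:
            raise ValueError("backoff_factor must be between 1.0 and 10.0")


# Override two settings and keep the documented defaults for the rest.
config = RetryConfiguration(max_retries=5, initial_backoff_ms=500)

# Plain dict that can be serialized to JSON.
payload = asdict(config)
```

Validating on the client side is a design choice for faster feedback; the service remains the authority on which values it accepts.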

Exponential backoff

Exponential backoff progressively increases the delay between retry attempts to avoid overwhelming a recovering service. For example, with the default backoff settings (initial: 1000ms, factor: 2.0, max: 30000ms), the delay before each retry is:

  • Retry 1: 1000ms delay
  • Retry 2: 2000ms delay
  • Retry 3: 4000ms delay
  • Retry 4: 8000ms delay (reached only if max_retries is raised above its default of 3)

The delay continues to grow exponentially until it reaches the max_backoff_ms value, at which point it remains constant for any remaining retry attempts.
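
The schedule described above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the function backoff_delays_ms is a hypothetical helper, it assumes each delay is the previous delay multiplied by backoff_factor and then capped at max_backoff_ms, and it assumes no random jitter is added. The service's exact computation is not documented in this section.

```python
def backoff_delays_ms(
    max_retries: int = 3,
    initial_backoff_ms: int = 1000,
    max_backoff_ms: int = 30000,
    backoff_factor: float = 2.0,
) -> list[int]:
    """Return the assumed delay (in ms) before each retry, in order."""
    delays = []
    delay = float(initial_backoff_ms)
    for _ in range(max_retries):
        # Cap the delay at max_backoff_ms once it grows past the limit.
        delays.append(int(min(delay, max_backoff_ms)))
        delay *= backoff_factor
    return delays


# With seven retries the cap of 30000ms is reached on the sixth retry.
print(backoff_delays_ms(max_retries=7))
# [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Under these assumptions, the default of three retries adds at most 7000ms of waiting (1000 + 2000 + 4000) on top of the request time itself.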

Here's how to create a research assistant agent that can search the web for current information. The example configures the agent to:

  • Search the web when users ask questions that require current information
  • Limit search results to 20 for comprehensive responses
  • Use a lower temperature (0.3) for more consistent, factual responses
  • Follow instructions to cite sources and admit uncertainty when appropriate
  • Retry transient API failures gracefully

This example requires no corpus setup, so you can test it immediately.

CREATE A RESEARCH ASSISTANT AGENT

Chat with your agent

After creating an agent, you can interact with it by creating a session and sending messages:

1. Create a session

Sessions provide conversation context and are required for all agent interactions:

CREATE A SESSION

2. Send messages to the agent

Once you have a session, send messages using the events endpoint:

SEND A MESSAGE

The agent responds with events that include its reasoning, tool usage, and final response.

Quick Start

For a complete step-by-step guide with code examples, see Agent Quick Start.