Agent
Agents are the core orchestration unit in the Vectara platform. The agent decides how to respond to user input, when to invoke tools, and how to manage conversation state.
Each agent is configured with:
- A name and a unique key. The key follows the pattern agt_[identifier]; if you do not provide one, Vectara generates it automatically based on the name.
- A human-readable description
- Optional instructions
- A list of available tools (referenced by name or ID)
- Optional tool configurations, for example Corpora Search tools configured to grant access to various corpora
- Metadata and versioning controls
- A first_step definition that encompasses optional instructions for the agent's behavior.
Agents operate through a conversational step architecture, processing user input through reasoning, tool execution, and response generation phases. The step-based design enables complex multi-turn workflows and intelligent tool orchestration.
You can create an agent in the Vectara Console, or you can use the API. For more information, check out our Agents Quick Start.
Example agent definition
This example shows a basic customer support agent configured with corpus search capabilities and inline instructions. The agent demonstrates the core components: tool configurations for searching support tickets, and a conversational first step with behavior guidelines.
Model configuration
Agents use large language models for reasoning and response generation. You can configure:
- Model: Choose from the available models, such as GPT-4o
- Parameters: Adjust temperature, max tokens, and other model-specific settings
- Cost optimization: Balance performance with token usage
- Retry configuration: Configure automatic retry behavior for transient failures
Retry configuration
When agents interact with LLMs, transient failures such as network timeouts, temporary server issues, or API rate limits can interrupt the conversation flow. Without a retry mechanism, these temporary issues cause your agent to fail immediately, resulting in a poor user experience.
Vectara provides a retry configuration option for agents that detects these recoverable failures and automatically retries the request with exponential backoff.
The RetryConfiguration object controls the retry behavior for your agent's interactions with the LLM. You define these settings when creating or updating your agent model, and they apply to all LLM requests made by that agent.
Retry configuration parameters
- enabled: The boolean flag to enable or disable retry logic
  - Default: true
- max_retries: The maximum number of retry attempts after the initial failure
  - Range: 0-10
  - Default: 3
- initial_backoff_ms: The initial delay in milliseconds before the first retry
  - Range: 100-60000ms
  - Default: 1000ms
- max_backoff_ms: The maximum delay in milliseconds between retries
  - Range: 1000-300000ms
  - Default: 30000ms
- backoff_factor: The exponential multiplier for calculating backoff delays
  - Range: 1.0-10.0
  - Default: 2.0
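As a minimal sketch, the documented defaults and ranges can be checked on the client side before you send a retry configuration. The parameter names, defaults, and ranges come from the list above; the helper name build_retry_configuration is hypothetical and is not part of any Vectara SDK, and where the resulting object belongs in an agent request is not shown here.

```python
# Defaults and ranges copied from the parameter list above.
RETRY_DEFAULTS = {
    "enabled": True,
    "max_retries": 3,
    "initial_backoff_ms": 1000,
    "max_backoff_ms": 30000,
    "backoff_factor": 2.0,
}

RETRY_LIMITS = {
    "max_retries": (0, 10),
    "initial_backoff_ms": (100, 60000),
    "max_backoff_ms": (1000, 300000),
    "backoff_factor": (1.0, 10.0),
}


def build_retry_configuration(**overrides):
    """Hypothetical helper: merge overrides into the defaults and check ranges."""
    unknown = set(overrides) - set(RETRY_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown retry parameters: {sorted(unknown)}")

    config = {**RETRY_DEFAULTS, **overrides}

    if not isinstance(config["enabled"], bool):
        raise ValueError("enabled must be a boolean")

    # Reject any numeric value outside its documented range.
    for name, (low, high) in RETRY_LIMITS.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} is outside {low}-{high}")

    return config
```

For example, build_retry_configuration(max_retries=5) returns the defaults with max_retries set to 5, while build_retry_configuration(max_retries=11) raises a ValueError because the documented maximum is 10.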
Exponential backoff
Exponential backoff progressively increases the delay between retry attempts to avoid overwhelming a recovering service. For example, with default settings (initial: 1000ms, factor: 2.0, max: 30000ms):
- Attempt 1: 1000ms delay
- Attempt 2: 2000ms delay
- Attempt 3: 4000ms delay
- Attempt 4: 8000ms delay (reached only if max_retries is set to 4 or higher, since the default is 3)
The delay continues to grow exponentially until it reaches the max_backoff_ms value, at which point it remains constant for any remaining retry attempts.
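The schedule described above can be reproduced with a few lines of arithmetic. This is a sketch that assumes the delay before retry n is initial_backoff_ms multiplied by backoff_factor n-1 times, capped at max_backoff_ms, with no jitter. The function name backoff_delays_ms is hypothetical, and the service's actual timing may differ in detail.

```python
def backoff_delays_ms(max_retries=3, initial_backoff_ms=1000,
                      max_backoff_ms=30000, backoff_factor=2.0):
    """Return the delay in milliseconds before each retry attempt."""
    delays = []
    delay = float(initial_backoff_ms)
    for _ in range(max_retries):
        # Cap each delay at max_backoff_ms, then grow it for the next retry.
        delays.append(int(min(delay, max_backoff_ms)))
        delay *= backoff_factor
    return delays


print(backoff_delays_ms())               # [1000, 2000, 4000]
print(backoff_delays_ms(max_retries=7))  # [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

With seven retries, the sixth delay would be 32000ms uncapped, so it and every later delay stay at the 30000ms maximum.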
Example: Research assistant with web search
Here's how to create a research assistant agent that can search the web for current information. The agent is configured to:
- Search the web when users ask questions requiring current information
- Limit search results to 20 for comprehensive responses
- Use a lower temperature (0.3) for more consistent, factual responses
- Follow instructions to cite sources and admit uncertainty when appropriate
- Retry transient API failures gracefully using the retry configuration
This example requires no corpus setup, making it perfect for immediate testing.
Chat with your agent
After creating an agent, you can interact with it by creating a session and sending messages:
1. Create a session
Sessions provide conversation context and are required for all agent interactions.
2. Send messages to the agent
Once you have a session, send messages using the events endpoint.
The agent will respond with events including its reasoning, tool usage, and final response.
For a complete step-by-step guide with code examples, see Agent Quick Start.