Version: 2.0

Agent

Agents are autonomous systems that understand natural language and use tools and reasoning to accomplish tasks. To do this, they must make decisions such as:

  • How to interpret user input
  • Which tools to call, and with what arguments
  • How to manage conversation state

Internally, they use an LLM-driven orchestration pipeline to make these decisions.

LLM configuration

Agents use LLMs for reasoning and response generation. You can configure the following:

  • Model: Choose from available models like GPT-4o.
  • Parameters: Adjust temperature, max tokens, and other model-specific settings.
  • Cost optimization: Balance performance with token usage.
  • Retry configuration: Configure automatic retry behavior for transient failures.
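As a sketch, a model configuration covering these options might look like the following JSON. The field names here are illustrative assumptions, not the verified schema; consult the API reference for the exact shape.

```json
{
  "model": "gpt-4o",
  "temperature": 0.3,
  "max_output_tokens": 1024,
  "retry": {
    "max_retries": 3,
    "initial_backoff_ms": 500,
    "backoff_factor": 2,
    "max_backoff_ms": 4000
  }
}
```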

Using retries to improve user experience

When agents interact with LLMs, transient failures such as network interruptions can disrupt communication between the agent and the LLM. You can configure your agent to retry failed requests automatically, ensuring a smooth user experience.

  • max_retries: The number of times the agent retries a failed request to the LLM before giving up.
  • initial_backoff_ms: The number of milliseconds the agent waits before the first retry, giving the cause of the error time to resolve.
  • backoff_factor: On each subsequent retry, the agent multiplies the previous delay by this number, so the wait between retries grows.
  • max_backoff_ms: An upper bound on the delay between retries, so the backoff_factor can't produce an unreasonably long wait for your users.
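Together, these four settings produce a capped exponential backoff. A minimal shell sketch of the resulting delay schedule (the values are illustrative, not product defaults):

```shell
# Illustrative retry settings -- not product defaults.
max_retries=5
initial_backoff_ms=500
backoff_factor=2
max_backoff_ms=4000

delay=$initial_backoff_ms
attempt=1
while [ "$attempt" -le "$max_retries" ]; do
  echo "retry $attempt after ${delay}ms"
  # Grow the delay for the next retry, but never past the cap.
  delay=$((delay * backoff_factor))
  if [ "$delay" -gt "$max_backoff_ms" ]; then
    delay=$max_backoff_ms
  fi
  attempt=$((attempt + 1))
done
```

With these numbers the waits are 500, 1000, 2000, 4000, and 4000 ms: max_backoff_ms caps the fourth and fifth retries that backoff_factor would otherwise have pushed to 8000 ms.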

Here's how to create a research assistant agent that can search the web for current information. This agent completes the following tasks:

  • Search the web when users ask questions requiring current information
  • Limit search results to 20 for comprehensive responses
  • Use a lower temperature (0.3) for more consistent, factual responses
  • Follow instructions to cite sources and admit uncertainty when appropriate
  • Configure retry logic to handle transient API failures gracefully

This example requires no corpus setup, making it perfect for immediate testing.

CREATE A RESEARCH ASSISTANT AGENT

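As a sketch, creating such an agent over a REST API could look like this in bash. The endpoint, the x-api-key header, and every payload field name below are assumptions for illustration only; consult the API reference for the exact schema.

```shell
# Hedged sketch: field names are illustrative, not the verified API schema.
payload='{
  "name": "research-assistant",
  "instructions": "Cite your sources. If you are unsure, say so.",
  "model": {"name": "gpt-4o", "temperature": 0.3},
  "tools": [{"type": "web_search", "max_results": 20}],
  "retry": {"max_retries": 3, "initial_backoff_ms": 500,
            "backoff_factor": 2, "max_backoff_ms": 4000}
}'
echo "$payload"

# Sending it (endpoint and auth header assumed; requires a real API key):
# curl -X POST "https://api.vectara.io/v2/agents" \
#   -H "Content-Type: application/json" \
#   -H "x-api-key: $VECTARA_API_KEY" \
#   -d "$payload"
```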

To chat with your agent, read on about Sessions.

Artifacts

Artifacts are files stored in an agent session's workspace that provide a persistent, session-scoped storage mechanism. They enable agents and users to share files throughout a conversation without bloating the agent’s context with content from large files.

Before artifacts, file uploads were handled inline within session events, which bloated every request with the full file content. Artifacts solve this by separating file storage from file references: when you upload a file, Vectara stores it in the session workspace and returns a lightweight ArtifactReference containing only metadata. Agents use these references to access files without including the full content in every request.
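For illustration, an ArtifactReference might carry metadata along these lines; the field names are assumptions, and the actual schema may differ:

```json
{
  "id": "art_7f3k2c",
  "filename": "quarterly-report.pdf",
  "mime_type": "application/pdf",
  "size_bytes": 482133
}
```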

How artifacts work

Artifacts are created in two ways: users upload files, or agent tools generate new artifacts as outputs, for example when converting a document to Markdown.

Each artifact receives a unique identifier following the pattern art_[a-z0-9_-]+.
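A quick shell check of that identifier pattern, using grep -E (the helper name and example IDs here are made up for illustration):

```shell
# Validate an ID against the documented pattern art_[a-z0-9_-]+
is_artifact_id() {
  printf '%s' "$1" | grep -Eq '^art_[a-z0-9_-]+$'
}

is_artifact_id "art_7f3k2c" && echo "art_7f3k2c: valid"
is_artifact_id "upload.pdf" || echo "upload.pdf: not an artifact ID"
```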

How agents use artifacts

After files are uploaded as artifacts, the agent can:

  • Use document conversion tools to extract content from PDFs, Word documents, or PowerPoint files.
  • Reference artifacts in analysis or question-answering workflows.
  • Pass artifacts to indexing tools to add content to corpora.
  • Create new artifacts as outputs of tool operations.

Artifacts remain available throughout the session lifecycle, enabling multi-step workflows without re-uploading files.