Version: 2.0

Bring Your Own LLM

Organizations often need to integrate multiple Large Language Models (LLMs) from different providers to optimize cost, performance, or compliance. Vectara's Bring Your Own LLM (BYO-LLM) capability enables seamless integration of third-party LLMs into Vectara's AI stack, supporting OpenAI-compatible models, the OpenAI Responses API for reasoning models, and Google Cloud Vertex AI.

By configuring LLMs with the Create LLM API, you can enhance flexibility in how Vectara generates summaries, answers, and content, leveraging your preferred LLM infrastructure while retaining full compatibility with Vectara's powerful RAG workflows.

For example, models like GPT-5, Claude Sonnet, and Opus excel at generating code and technical content as part of your text responses. In your applications, you could use advanced models to generate code within Vectara responses, while leveraging multimodal models' image generation capabilities through separate API calls.

Define a custom LLM configuration

The integration relies on defining a custom LLM configuration with the Create LLM endpoint. Vectara supports three LLM types:

Supported LLM Types

| Type | Description | Use For |
|------|-------------|---------|
| `openai-compatible` | OpenAI-style APIs | OpenAI, Anthropic Claude, Azure OpenAI |
| `openai-responses` | OpenAI Responses API | Reasoning models (o1, o3) |
| `vertex-ai` | Google Cloud Vertex AI | Gemini models |

After you choose the type, fill in the remaining configuration fields:

Configuration Fields

| Field | Description |
|-------|-------------|
| `type` | One of the following: `openai-compatible`, `openai-responses`, or `vertex-ai` |
| `name` | User-defined label for the LLM (referenced in queries) |
| `description` | (Optional) Metadata or notes about the model |
| `model` | Specific model version (for example, `gpt-4`, `claude-3.5-sonnet`, `gemini-2.5-flash`) |
| `uri` | The API endpoint URL |
| `auth` | Authentication configuration (varies by type) |
| `headers` | (Optional) Additional HTTP headers for the API |
| `test_model_parameters` | (Optional) Test parameters to validate the configuration |

Add custom LLM examples

Here are some examples for Anthropic, OpenAI, and Google LLMs.

Add Anthropic Claude 3.7 Sonnet

Request Body

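The body below is an illustrative sketch built from the configuration fields above; it assumes Anthropic's OpenAI-compatible chat completions endpoint, a bearer-token `auth` block, and placeholder key values. Consult the Create LLM API reference for the exact schema.

```json
{
  "type": "openai-compatible",
  "name": "claude-3-7-sonnet",
  "description": "Anthropic Claude 3.7 Sonnet for generative summaries",
  "model": "claude-3-7-sonnet-latest",
  "uri": "https://api.anthropic.com/v1/chat/completions",
  "auth": {
    "type": "bearer",
    "token": "<ANTHROPIC_API_KEY>"
  },
  "test_model_parameters": {
    "max_tokens": 512
  }
}
```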

cURL Example

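A sketch of the corresponding cURL call. The `/v2/llms` path and the `x-api-key` header for Vectara authentication are assumptions based on the endpoint name used elsewhere on this page; substitute your own keys.

```shell
# Register the Anthropic model with Vectara's Create LLM endpoint.
# Field names in the payload mirror the configuration table above;
# $VECTARA_API_KEY and $ANTHROPIC_API_KEY are placeholder environment variables.
curl -X POST "https://api.vectara.io/v2/llms" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $VECTARA_API_KEY" \
  -d '{
    "type": "openai-compatible",
    "name": "claude-3-7-sonnet",
    "model": "claude-3-7-sonnet-latest",
    "uri": "https://api.anthropic.com/v1/chat/completions",
    "auth": {"type": "bearer", "token": "'"$ANTHROPIC_API_KEY"'"}
  }'
```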

Successful Response

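A successful creation echoes back the stored configuration. The field names and `id` format below are illustrative assumptions, but note the `enabled` flag, which is used later when verifying the configuration:

```json
{
  "id": "llm_claude-3-7-sonnet",
  "type": "openai-compatible",
  "name": "claude-3-7-sonnet",
  "description": "Anthropic Claude 3.7 Sonnet for generative summaries",
  "model": "claude-3-7-sonnet-latest",
  "uri": "https://api.anthropic.com/v1/chat/completions",
  "enabled": true
}
```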

Add OpenAI GPT-4o

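A sketch of the request body for an OpenAI model, assuming the standard chat completions endpoint and bearer authentication (the `gpt-4o-mini` model name is used here for illustration):

```json
{
  "type": "openai-compatible",
  "name": "gpt-4o-mini",
  "description": "OpenAI GPT-4o mini for cost-effective generation",
  "model": "gpt-4o-mini",
  "uri": "https://api.openai.com/v1/chat/completions",
  "auth": {
    "type": "bearer",
    "token": "<OPENAI_API_KEY>"
  }
}
```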

Add Google Gemini (Vertex AI)

Using API Key Authentication

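A sketch of a `vertex-ai` configuration using an API key. The `auth` shape, the regional Vertex AI URI, and the placeholder project ID are all assumptions; check the Create LLM API reference for the exact format.

```json
{
  "type": "vertex-ai",
  "name": "gemini-2-5-flash",
  "description": "Google Gemini 2.5 Flash on Vertex AI",
  "model": "gemini-2.5-flash",
  "uri": "https://us-central1-aiplatform.googleapis.com/v1/projects/<PROJECT_ID>/locations/us-central1/publishers/google/models/gemini-2.5-flash",
  "auth": {
    "type": "api_key",
    "api_key": "<GOOGLE_API_KEY>"
  }
}
```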

Using Service Account Authentication

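For service account authentication, the `auth` block would instead carry the service account credentials. The `auth` shape below is an assumption; the credential fields (`project_id`, `private_key`, `client_email`) follow the standard Google service account JSON key format.

```json
{
  "type": "vertex-ai",
  "name": "gemini-2-5-flash-sa",
  "description": "Google Gemini 2.5 Flash via service account",
  "model": "gemini-2.5-flash",
  "uri": "https://us-central1-aiplatform.googleapis.com/v1/projects/<PROJECT_ID>/locations/us-central1/publishers/google/models/gemini-2.5-flash",
  "auth": {
    "type": "service_account",
    "credentials": {
      "type": "service_account",
      "project_id": "<PROJECT_ID>",
      "private_key": "<PRIVATE_KEY>",
      "client_email": "<SERVICE_ACCOUNT_EMAIL>"
    }
  }
}
```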

Add OpenAI Reasoning Models (o1, o3)

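Reasoning models use the `openai-responses` type instead of `openai-compatible`. The sketch below assumes OpenAI's `/v1/responses` endpoint and bearer authentication:

```json
{
  "type": "openai-responses",
  "name": "openai-o1",
  "description": "OpenAI o1 reasoning model via the Responses API",
  "model": "o1",
  "uri": "https://api.openai.com/v1/responses",
  "auth": {
    "type": "bearer",
    "token": "<OPENAI_API_KEY>"
  }
}
```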

Verify your configuration

To confirm your model was added successfully:

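A sketch of the verification call, assuming a `GET` on the same `/v2/llms` endpoint lists registered models (header name is the same placeholder assumption as above):

```shell
# List all registered LLMs and inspect the JSON for your model's entry
curl -X GET "https://api.vectara.io/v2/llms" \
  -H "x-api-key: $VECTARA_API_KEY"
```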

Look for your model in the response JSON and verify it has "enabled": true.

Querying with a custom LLM

After you register a third-party LLM using the /v2/llms endpoint, you do not reference it directly by ID in your query. Instead, you associate the custom LLM with a generation preset, and then use that preset in your query with generation_preset_name.

However, when defining custom model_parameters, you can override the default preset and explicitly specify the registered model by name.

Example query

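A sketch of a query body (for example, `POST /v2/corpora/<corpus_key>/query`) that overrides the preset and names the registered model via `model_parameters`. The preset name, the `llm_name` field, and the parameter values shown are illustrative assumptions:

```json
{
  "query": "How do I configure a custom LLM?",
  "search": {
    "limit": 10
  },
  "generation": {
    "generation_preset_name": "<your-generation-preset>",
    "model_parameters": {
      "llm_name": "claude-3-7-sonnet",
      "temperature": 0.2,
      "max_tokens": 1024
    }
  }
}
```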