Skip to main content
Version: 2.0

Bring Your Own LLM

Organizations often need to integrate multiple Large Language Models (LLMs) from different providers to optimize cost, performance, or compliance. Vectara’s Bring Your Own LLM (BYO-LLM) capability allows seamless integration of third-party LLMs—including OpenAI, Anthropic, Google, and other OpenAI-compatible models—into Vectara's AI stack.

By configuring LLMs with the Create LLM API, you can enhance flexibility in how Vectara generates summaries, answers, and content, leveraging your preferred LLM infrastructure while retaining full compatibility with Vectara's powerful RAG workflows.

Define a custom LLM configuration

The integration relies on defining a custom LLM configuration with the Create LLM endpoint. The following table provides the custom LLM configuration fields:

FieldDescription
typeSet to openai-compatible to indicate compatibility with OpenAI-style APIs.
nameUser-defined label for the LLM configuration.
description(Optional) Metadata or notes about the model.
modelSpecific model version (gpt-4, claude-3-7-sonnet-20250219)
uriThe API endpoint used to send LLM requests. For example, https://api.anthropic.com/v1/chat/completions
authAuthentication object (bearer token or custom header auth).
test_model_parameters(Optional) Test parameters to validate the configuration (max_tokens)

Custom LLM examples

Add Anthropic Claude 3.7 Sonnet

Request Body

{
"type": "openai-compatible",
"name": "Claude 3.7 Sonnet",
"description": "Anthropic'\''s Claude 3.7 Sonnet model",
"model": "claude-3-7-sonnet-20250219",
"uri": "https://api.anthropic.com/v1/chat/completions",
"auth": {
"type": "header",
"header": "x-api-key",
"value": "sk-ant-......"
},
"test_model_parameters": {
"max_tokens": 256
}
}

cURL Example

curl -L -X POST 'https://api.vectara.io/v2/llms' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'x-api-key: YOUR-VECTARA-API-KEY' \
--data-raw '{
"type": "openai-compatible",
"name": "Claude 3.7 Sonnet",
"description": "Anthropic'\''s Claude 3.7 Sonnet model",
"model": "claude-3-7-sonnet-20250219",
"uri": "https://api.anthropic.com/v1/chat/completions",
"auth": {
"type": "header",
"header": "x-api-key",
"value": "sk-ant-..."
},
"test_model_parameters": {
"max_tokens": 256
}
}'

Successful Response

{
"id": "llm_520721844",
"name": "Claude 3.7 Sonnet",
"description": "Anthropic's Claude 3.7 Sonnet model",
"enabled": true
}

Add OpenAI GPT-4

curl -L -X POST 'https://api.vectara.io/v2/llms' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'x-api-key: zut_' \
--data-raw '{
"type": "openai-compatible",
"name": "GPT-4o Mini",
"description": "OpenAI'\''s 4o mini",
"model": "gpt-4o-mini",
"uri": "https://api.openai.com/v1/chat/completions",
"auth": {
"type": "header",
"header": "Authorization",
"value": "Bearer sk-..."
},
"test_model_parameters": {
"max_tokens": 256
}
}'

Add Google Gemini 2.0 Flash

curl -L -X POST 'https://api.vectara.io/v2/llms' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'x-api-key: zut_...' \
--data-raw '{
"type": "openai-compatible",
"name": "Google Gemini 2.0 Flash",
"description": "Google 2.0 Flash",
"model": "gemini-2.0-flash",
"uri": "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
"auth": {
"type": "header",
"header": "Authorization",
"value": "Bearer AI..."
},
"test_model_parameters": {
"max_tokens": 256
}
}'

Verify your configuration

To confirm your model was added successfully:

curl -L -X GET 'https://api.vectara.io/v2/llms' \
-H 'Accept: application/json' \
-H 'x-api-key: YOUR-VECTARA-API-KEY'

Look for your model in the response JSON and verify it has "enabled": true.

Querying with a custom LLM

After you register a third-party LLM using the /v2/llms endpoint, you do not reference it directly by ID in your query. Instead, you associate the custom LLM with a generation preset, and then use that preset in your query with generation_preset_name.

However, when defining custom model_parameters, you can override the default preset and explicitly specify the registered model by name.

Example query

curl -X POST 'https://api.vectara.io/v2/query' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'x-api-key: zut_...' \
-d '{
"query": "Here is my query",
"search": {
"corpora": [
{
"corpus_key": "corpus_1",
"lexical_interpolation": 0.025
}
],
"limit": 10,
"offset": 0
},
"generation": {
"generation_preset_name": "custom-llm",
"prompt_template": "[ {\"role\": \"user\", \"content\": \"custom instructions\"} ]",
"model_parameters": {
"llm_name": "claude-3-7-sonnet"
}
}
}'