Version: 2.0

Creates a model response for the given chat conversation

POST 

/v2/llms/chat/completions

Supported API key types: Query Service, Personal

The Chat Completions API provides an OpenAI-compatible interface for generating model responses in multi-turn chat conversations. This API enables you to integrate our language models directly into applications designed to work with the OpenAI Chat Completions format, making it easy to leverage Vectara capabilities with minimal changes to existing tools or code.

Use this API to enable interactive chat experiences that support context-aware responses, streaming output, and token usage tracking.

The request includes a series of chat messages and optional parameters that control the behavior and structure of the model response. The request body must include the messages parameter, an array of message objects (role, content) representing the full conversation so far.
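As a sketch, the request body can be assembled as a plain JSON object. Only messages is required; model, temperature, and stream are optional. The endpoint path comes from this page; the base URL and the authentication header name are assumptions and may differ in your deployment.

```python
import json

# Build the request body. Only "messages" is required; the other
# fields are optional parameters shown in the example request below.
payload = {
    "model": "chat-model-001",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
    "stream": False,
}

body = json.dumps(payload)

# To send it (base URL and "x-api-key" header are assumptions here):
# requests.post("https://<your-endpoint>/v2/llms/chat/completions",
#               headers={"x-api-key": "<YOUR_API_KEY>",
#                        "Content-Type": "application/json"},
#               data=body)
```

The messages array must carry the full conversation so far, including any earlier assistant turns, since the endpoint is stateless per request.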

Streaming responses

If the stream parameter is set to true, the response appears as a series of text/event-stream parts (also known as chunks). Each chunk includes a delta field showing the incremental message update.
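A minimal sketch of consuming such a stream, assuming the chunks follow the OpenAI-style text/event-stream shape (data: lines carrying JSON with choices[0].delta, terminated by a data: [DONE] sentinel — the exact chunk shape here is an assumption based on that format):

```python
import json

def accumulate_deltas(sse_lines):
    """Reassemble the assistant message from text/event-stream chunks.

    Each 'data:' line carries JSON whose choices[0].delta holds an
    incremental update; a 'data: [DONE]' sentinel ends the stream.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore comments and blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Simulated stream (chunk contents are illustrative, not captured output):
stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "The capital of France"}}]}',
    'data: {"choices": [{"delta": {"content": " is Paris."}}]}',
    'data: [DONE]',
]
```

Concatenating the delta content fields in order yields the same text a non-streaming request would return in a single message.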

Example request

This example sends a simple chat conversation to the API, asking the assistant for the capital of France. The request includes a system prompt, a user message, and a temperature setting for response variability.

{
  "model": "chat-model-001",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "temperature": 0.7,
  "stream": false
}

Example response

The response includes a generated reply from the assistant, along with token usage statistics. In this example, the model returns a direct answer to a user question.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712454830,
  "model": "chat-model-001",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 21,
    "completion_tokens": 9,
    "total_tokens": 30
  }
}
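A short sketch of pulling the reply and the token counts out of a response with this shape (all field names are taken from the example above):

```python
import json

# The example response from the documentation, verbatim.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712454830,
  "model": "chat-model-001",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 21,
    "completion_tokens": 9,
    "total_tokens": 30
  }
}
"""

response = json.loads(raw)

# The assistant reply lives under choices[0].message.content;
# finish_reason "stop" indicates the model completed naturally.
answer = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]

# Token accounting for billing/monitoring comes from the usage object.
total_tokens = response["usage"]["total_tokens"]
```

Checking finish_reason before using the content lets a client distinguish complete answers from truncated ones.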


Request

Responses

A chat completion