Creates a model response for the given chat conversation
POST /v2/llms/chat/completions
The Chat Completions API provides an OpenAI-compatible interface for generating model responses in multi-turn chat conversations. This API enables you to integrate our language models directly into applications designed to work with the OpenAI Chat Completions format, making it easy to leverage Vectara capabilities with minimal changes to existing tools or code.
Use this API to enable interactive chat experiences that support context-aware responses, streaming output, and token usage tracking.
The request includes a series of chat messages and optional parameters that control the behavior and structure of the model response. The request body must include the messages parameter, an array of message objects (role, content) representing the full conversation so far.
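As a minimal sketch of the request structure described here, the helper below assembles a request body and checks that every message carries both role and content. The function name build_chat_request is hypothetical, and treating model as a required argument is an assumption based on the example request rather than a documented rule.

```python
from typing import Any


def build_chat_request(model: str, messages: list[dict[str, str]], **options: Any) -> dict[str, Any]:
    """Assemble a chat completion request body (hypothetical helper)."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for i, message in enumerate(messages):
        # Each message object needs the two fields named in the description.
        if "role" not in message or "content" not in message:
            raise ValueError(f"message {i} needs both 'role' and 'content'")
    # Optional parameters such as temperature or stream are passed through unchanged.
    return {"model": model, "messages": list(messages), **options}
```

Because messages represents the full conversation so far, a follow-up turn would append the previous assistant reply and the new user message to the same list before calling the helper again.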
Streaming responses
If the stream parameter is set to true, the response is delivered as a series of text/event-stream parts (also known as chunks). Each chunk includes a delta field containing the incremental message update.
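One way to consume such a stream is to read it line by line and concatenate the delta text. The sketch below assumes the chunks follow the OpenAI wire format: each event is a "data: " line holding a JSON object whose text sits at choices[].delta.content, and the stream ends with a "data: [DONE]" line. Those details are not confirmed by this page, and the function name collect_stream_text is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the incremental delta content from text/event-stream lines."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # A chunk may carry only a role or a finish reason, with no text.
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

In an interactive application you would usually display each piece as it arrives instead of joining everything at the end; the accumulation here just keeps the example short.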
Example request
This example sends a simple chat conversation to the API, asking the assistant for the capital of France. The request includes a system prompt, a user message, and a temperature setting for response variability.
{
  "model": "chat-model-001",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "temperature": 0.7,
  "stream": false
}
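The example request could be sent with Python's standard library roughly as follows. This is a sketch under stated assumptions: the base URL https://api.vectara.io and the x-api-key header are not given on this page and should be replaced with the values for your account, and make_chat_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.vectara.io"  # assumed host, not stated on this page
ENDPOINT = "/v2/llms/chat/completions"


def make_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a chat completion."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": api_key,  # assumed authentication header
        },
        method="POST",
    )


example_body = {
    "model": "chat-model-001",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
    "stream": False,
}

# Sending requires a valid key and network access:
# with urllib.request.urlopen(make_chat_request("YOUR_API_KEY", example_body)) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes the method, URL, and payload easy to inspect before any network call is made.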
Example response
The response includes a generated reply from the assistant, along with token usage statistics. In this example, the model returns a direct answer to a user question.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712454830,
  "model": "chat-model-001",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 21,
    "completion_tokens": 9,
    "total_tokens": 30
  }
}
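To show how the fields in this response might be read, the sketch below pulls out the assistant reply, the finish reason, and the token total from the first choice. The ChatResult class and parse_chat_completion function are hypothetical names used for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    text: str
    finish_reason: Optional[str]
    total_tokens: int


def parse_chat_completion(raw: str) -> ChatResult:
    """Extract the reply text and token usage from a response body."""
    data = json.loads(raw)
    first = data["choices"][0]  # this sketch reads only the first choice
    usage = data.get("usage", {})
    return ChatResult(
        text=first["message"]["content"],
        finish_reason=first.get("finish_reason"),
        total_tokens=usage.get("total_tokens", 0),
    )
```

For the example response, the usage fields are consistent with each other: 21 prompt tokens plus 9 completion tokens gives the reported total of 30.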
Request
Responses
- 200: A chat completion.
- 400: Chat completion request was malformed.
- 403: Permissions do not allow creating a chat completion.
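A client might translate these status codes into errors along the following lines. This is an illustrative sketch: ChatCompletionError and check_status are hypothetical names, and any status other than the three listed here is simply reported as unexpected.

```python
class ChatCompletionError(Exception):
    """Raised when the API returns a non-success status (hypothetical class)."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(f"{status}: {reason}")
        self.status = status


# Meanings taken from the response list for this endpoint.
STATUS_MEANINGS = {
    400: "Chat completion request was malformed.",
    403: "Permissions do not allow creating a chat completion.",
}


def check_status(status: int) -> None:
    """Return quietly on 200; raise ChatCompletionError otherwise."""
    if status == 200:
        return
    reason = STATUS_MEANINGS.get(status, "Unexpected status code.")
    raise ChatCompletionError(status, reason)
```

A 400 usually means the request body should be corrected before retrying, while a 403 points to the permissions of the credentials in use.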