Use the OpenAI SDK with the Vectara Chat Completions API
This tutorial demonstrates how to use Vectara's Chat Completions API through OpenAI-compatible interfaces. Learn how to integrate Vectara's generative AI capabilities into your applications using either direct HTTP requests or the OpenAI Python SDK. This enables seamless migration from OpenAI or integration with OpenAI-compatible tools. By the end of this tutorial, you will be able to call Vectara's API both directly and through the OpenAI SDK.
This tutorial contains the following steps:
- Prerequisites and setup
- Step 1. Install the required packages
- Step 2. Implement the VectaraChat client
- Step 3. Enter your API key
- Step 4. Initialize the Vectara chat client
- Step 5. Perform tests
We recommend that you complete this tutorial in Google Colab.
Prerequisites and setup
- Python 3.8 or higher
- Basic understanding of REST APIs and HTTP requests
- A valid Vectara API key with access to the Chat Completions endpoint
Step 1. Install the required packages
Install the required Python packages. The requests library handles direct HTTP
calls, while openai provides the official OpenAI SDK for simplified
integration.
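The original install command is not preserved here, but based on the packages named above it would look like the following:

```shell
# Install the HTTP client for direct calls and the official OpenAI SDK
pip install requests openai
```

In Google Colab, prefix the command with `!` to run it in a notebook cell.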
Step 2. Implement the VectaraChat client
The following code contains the implementation of the VectaraChat client, which provides methods for interacting with Vectara's Chat Completions API.
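The original implementation is not preserved here, so the following is a minimal sketch of what such a client might look like. The base URL and endpoint path (`https://api.vectara.io/v2/chat/completions`) are assumptions; confirm the exact values in Vectara's API reference.

```python
import json

import requests


class VectaraChat:
    """Minimal client for Vectara's OpenAI-compatible Chat Completions API."""

    def __init__(self, api_key, base_url="https://api.vectara.io/v2",
                 use_bearer=True, verbose=False):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")
        self.verbose = verbose
        # Vectara accepts either a Bearer token or an x-api-key header.
        if use_bearer:
            self.headers = {"Authorization": f"Bearer {api_key}"}
        else:
            self.headers = {"x-api-key": api_key}
        self.headers["Content-Type"] = "application/json"

    def chat(self, messages, model="gpt-4o", stream=False, **params):
        """POST to /chat/completions; return parsed JSON or an iterator of chunks."""
        payload = {"model": model, "messages": messages, "stream": stream, **params}
        if self.verbose:
            print("Request:", json.dumps(payload, indent=2))
        resp = requests.post(f"{self.base_url}/chat/completions",
                             headers=self.headers, json=payload, stream=stream)
        resp.raise_for_status()
        if stream:
            return self._iter_stream(resp)
        return resp.json()

    def _iter_stream(self, resp):
        # Server-sent events: lines prefixed with "data: ", ending with "[DONE]".
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data == "[DONE]":
                break
            yield json.loads(data)
```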
Enable verbose=True during development to see detailed request/response
logging for debugging.
Step 3. Enter your API key
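The original snippet is not preserved; a common pattern, sketched below, reads the key from an environment variable and falls back to an interactive prompt so the key never appears in notebook output:

```python
import os
from getpass import getpass


def get_api_key(env_var="VECTARA_API_KEY"):
    """Return the Vectara API key, preferring an environment variable.

    getpass hides the typed key, keeping it out of cell output and history.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    return getpass("Enter your Vectara API key: ")
```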
Step 4. Initialize the Vectara chat client
Create the VectaraChat instance and choose between Bearer token authentication
(recommended) or x-api-key header authentication.
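The two authentication options described above differ only in the headers they send. As a sketch (the header names follow the description in this step; verify them against Vectara's API reference):

```python
def auth_headers(api_key, use_bearer=True):
    """Build Vectara request headers: Bearer token (recommended) or x-api-key."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json"}
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

Initializing the client then amounts to passing your key and choosing one of the two modes.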
Step 5. Perform tests
Now that you've set up the VectaraChat client and initialized it with your API key, let's test both implementation approaches. The following tests demonstrate four different scenarios: direct HTTP requests (streaming and non-streaming) and OpenAI SDK integration (streaming and non-streaming). Each test shows you how to make requests and handle responses in different ways.
Test 1: Direct API (non-streaming)
Let's test the direct API approach without streaming:
Direct HTTP Request
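The original call is not preserved, so here is a hedged sketch using `requests` directly. The endpoint URL and the `gpt-4o` model name are assumptions; substitute the values from your Vectara account.

```python
import requests

# Assumed endpoint; confirm the exact URL in Vectara's API reference.
VECTARA_CHAT_URL = "https://api.vectara.io/v2/chat/completions"


def build_payload(question, model="gpt-4o", stream=False):
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": stream,
    }


def direct_chat(api_key, question, model="gpt-4o"):
    """Send one non-streaming request over plain HTTP and return the answer text."""
    resp = requests.post(
        VECTARA_CHAT_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_payload(question, model=model),
        timeout=30,
    )
    resp.raise_for_status()
    # OpenAI-compatible responses put the text under choices[0].message.content.
    return resp.json()["choices"][0]["message"]["content"]
```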
OpenAI SDK Request
Test 2: Direct API (streaming)
Now let's test with streaming enabled:
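A streaming direct call reads the response as server-sent events, one `data:` line per chunk. The sketch below (endpoint URL and model name are assumptions) separates SSE parsing from the request so the two concerns stay testable:

```python
import json

import requests


def parse_sse_line(line):
    """Decode one server-sent-events line; return the chunk dict or None."""
    if not line or not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":        # Sentinel marking the end of the stream.
        return None
    return json.loads(data)


def stream_chat(api_key, question, model="gpt-4o",
                url="https://api.vectara.io/v2/chat/completions"):
    """Yield text deltas from a streaming chat completion as they arrive."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": True,
    }
    with requests.post(url, headers={"Authorization": f"Bearer {api_key}"},
                       json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            chunk = parse_sse_line(line)
            if chunk is None:
                continue
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                yield delta
```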
Test 3: OpenAI SDK (non-streaming)
Now let's test using the OpenAI SDK without streaming:
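Beyond the answer text, the SDK returns a typed response object carrying the model name and token usage. Assuming `client` is an `openai.OpenAI` instance pointed at Vectara's base URL (and that the `usage` field is populated by the server), a sketch might look like:

```python
def chat_once(client, question, model="gpt-4o"):
    """Run one non-streaming completion and return the full response object."""
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )


def summarize_response(completion):
    """Extract the commonly inspected fields from a ChatCompletion object."""
    return {
        "model": completion.model,
        "text": completion.choices[0].message.content,
        "total_tokens": completion.usage.total_tokens if completion.usage else None,
    }
```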
Test 4: OpenAI SDK (streaming)
Finally, let's test the OpenAI SDK with streaming:
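With `stream=True`, the SDK yields chunk objects whose deltas can be printed as they arrive. Assuming `client` is an `openai.OpenAI` instance configured for Vectara, a sketch:

```python
def stream_answer(client, question, model="gpt-4o"):
    """Print deltas as they arrive and return the assembled answer."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks (e.g. the final one) carry no content delta.
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```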
Advanced usage examples
Beyond the basic tests, explore these advanced usage patterns to build production-ready applications:
- Multi-turn conversations - Maintain context across multiple exchanges.
- Use different models - Switch between available LLM models.
- Customize generation parameters - Control output with temperature and token limits.
Multi-turn conversations
The previous tests showed single-question interactions. Real conversational applications need to maintain context across multiple exchanges. The Chat Completions API supports multi-turn conversations by including the conversation history in each request. Here's how to build a contextual conversation:
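The pattern is backend-agnostic: keep a growing message list and resend it with every turn. A minimal sketch (the `chat_fn` parameter is a stand-in for whichever call style you chose above; it takes the message list and returns the reply text):

```python
def make_conversation(chat_fn, system_prompt="You are a helpful assistant."):
    """Wrap any chat backend so each question carries the full history."""
    history = [{"role": "system", "content": system_prompt}]

    def ask(question):
        history.append({"role": "user", "content": question})
        answer = chat_fn(history)
        # Record the reply so the next turn sees it as context.
        history.append({"role": "assistant", "content": answer})
        return answer

    return ask
```

Each call to `ask` sends one more user turn plus everything said before it, which is what lets the model resolve follow-up references like "and what about that one?".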
Use different models
Vectara supports various LLM models. Let's try a different model:
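Switching models only changes the `model` field of the request; the specific model identifiers available depend on your Vectara account, so the names used here are placeholders. A sketch:

```python
def payload_for_model(question, model):
    """Same request shape as before; only the model field varies per call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def compare_models(chat_fn, question, models):
    """Ask the same question of several models and collect the answers by name."""
    return {model: chat_fn(payload_for_model(question, model)) for model in models}
```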
Customize generation parameters
You can customize generation parameters to control the output:
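Generation controls ride along in the same OpenAI-compatible request body. A sketch showing the two parameters named above (temperature and a token limit); the default values here are illustrative, not Vectara's defaults:

```python
def build_chat_payload(question, model="gpt-4o", temperature=0.2, max_tokens=150):
    """OpenAI-compatible request body with common generation controls."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Lower temperature -> more deterministic, focused output.
        "temperature": temperature,
        # Hard cap on the number of tokens generated in the reply.
        "max_tokens": max_tokens,
    }
```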
This tutorial demonstrated how to use the Vectara Chat Completions API, both directly and with the OpenAI SDK. You can use this API to add powerful generative AI capabilities to your applications with OpenAI-compatible interfaces.
For integration examples with external applications, see Use with External Applications.