Model Selection
Selecting the right model helps you meet the specific requirements of your use case. Vectara supports flexible model selection, including both Vectara-provided presets and the ability to bring your own models.
Generation presets
Generation presets provide curated model configurations optimized for specific use cases and simplify model selection. We recommend the following generation presets:
- mockingbird-2.0 - Vectara's RAG-optimized LLM with superior citation accuracy
- vectara-summary-ext-24-05-med-omni - GPT-4o for enhanced citations
- vectara-summary-table-query-ext-dec-2024-gpt-4o - Optimized for table data
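For example, a query selects a preset by name in its generation settings. A minimal sketch, assuming a corpus key of my-corpus and the v2 Query API request shape (see the Query API reference for the full schema):

```python
import requests

# Illustrative sketch: select a generation preset by name in a Vectara v2
# query. The corpus key and API key are placeholders; check the Query API
# reference for the full request and response schema.
response = requests.post(
    "https://api.vectara.io/v2/corpora/my-corpus/query",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "query": "How do I rotate an API key?",
        "search": {"limit": 10},
        "generation": {"generation_preset_name": "mockingbird-2.0"},
    },
)
print(response.json().get("summary"))
```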
Bring Your Own LLM
Organizations can integrate third-party LLMs using Vectara's Bring Your Own LLM capability.
Custom LLMs cannot be used directly in queries. They must be referenced through generation presets.
- Register your LLM using the Create LLM API endpoint (POST /v2/llms)
  - Supports OpenAI-compatible APIs (including Anthropic Claude)
  - Supports Vertex AI (Google Gemini models)
  - Supports OpenAI Responses API (reasoning models like o1, o3)
- Create or use a generation preset that references your LLM by name
- Use the preset in queries with generation_preset_name

Override option: You can override a preset's LLM using model_parameters.llm_name, as in the sketch below.
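A minimal sketch of the generation portion of such a query, with illustrative preset and custom LLM names:

```python
# Illustrative: start from a preset, then swap in a registered custom LLM.
# "my-claude-llm" is the name chosen when registering via POST /v2/llms.
generation = {
    "generation_preset_name": "vectara-summary-ext-24-05-med-omni",
    "model_parameters": {
        "llm_name": "my-claude-llm",  # overrides the preset's default LLM
    },
}
```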
Supported model types
Vectara native models
Mockingbird LLMs
- Specifically designed for Retrieval Augmented Generation
- Superior citation accuracy compared to general-purpose models
- Excellent multilingual performance
- Optimized for structured data generation
OpenAI models
Available via OpenAI-compatible API:
- GPT-4, GPT-4-turbo
- GPT-3.5-turbo for cost-effective applications
- Custom fine-tuned models via OpenAI interface
Vertex AI models
Google Cloud integration:
- Gemini 2.5-flash (cost-effective, fast)
- Gemini 2.5-pro (high performance)
- Gemini 2.0-experimental (latest features)
Claude models
Anthropic integration:
- Available via OpenAI-compatible interface
- Various Claude model variants supported
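To make the registration step concrete, here is a hedged sketch of a Create LLM request for a Claude model served through Anthropic's OpenAI-compatible endpoint. The payload field names and values are assumptions; verify them against the Create LLM API reference.

```python
import requests

# Hedged sketch of POST /v2/llms for an OpenAI-compatible Claude model.
# All field names and values below are illustrative assumptions; consult
# the Create LLM API reference for the actual schema.
requests.post(
    "https://api.vectara.io/v2/llms",
    headers={"x-api-key": "YOUR_API_KEY"},  # placeholder credentials
    json={
        "type": "openai-compatible",             # assumed type identifier
        "name": "my-claude-llm",                 # name referenced by presets and queries
        "model": "claude-sonnet-4",              # illustrative model id
        "uri": "https://api.anthropic.com/v1/",  # Anthropic's OpenAI-compatible endpoint
        "auth": {"type": "api_key", "key": "ANTHROPIC_API_KEY"},  # assumed auth shape
    },
)
```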
Use case recommendations
RAG applications
Recommended model: mockingbird-2.0
- Why: Designed specifically for RAG use cases
- Benefits: Enhanced citation accuracy, better context understanding
- Best For: Enterprise applications requiring high-quality summaries with reliable source attribution
General summarization
Recommended models: GPT-4 variants
- Why: Versatile performance across different content types
- Benefits: Strong reasoning capabilities, broad knowledge
- Best For: Applications requiring creative or analytical summaries
Cost-effective solutions
Recommended models: GPT-3.5-turbo or Gemini 2.5-flash
- Why: Lower cost per token while maintaining good quality
- Benefits: Faster response times, reduced operational costs
- Best For: High-volume applications with simpler summarization needs
Multilingual applications
Recommended model: mockingbird-2.0
- Why: Excellent multilingual performance
- Benefits: Consistent quality across languages
- Best For: Global applications serving diverse language communities
Technical documentation
Recommended model: GPT-4 with structured prompts
- Why: Strong performance on technical content
- Benefits: Better handling of code, APIs, and technical concepts
- Best For: Developer documentation and technical knowledge bases
Advanced model configuration options
Nuanced control
For applications requiring precise control over model behavior, set generation parameters directly on the query, as in the sketch below.
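A minimal sketch of the generation settings, using values from the recommended ranges in the list that follows (parameter support varies by model):

```python
# Illustrative generation settings tuned for factual summarization,
# following the parameter recommendations below.
generation = {
    "generation_preset_name": "mockingbird-2.0",
    "model_parameters": {
        "temperature": 0.2,        # 0.0-0.3 for factual content
        "max_tokens": 800,         # set by desired response length
        "frequency_penalty": 0.2,  # 0.1-0.5 to reduce repetition
        "presence_penalty": 0.1,   # 0.1-0.3 to encourage topic diversity
    },
}
```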
Parameter recommendations
- temperature: 0.0-0.3 for factual content, 0.4-0.7 for creative content
- max_tokens: Set based on desired response length (typically 200-2000)
- frequency_penalty: 0.1-0.5 to reduce repetition
- presence_penalty: 0.1-0.3 to encourage topic diversity
Custom prompt templates
Combine model selection with custom prompt templates for specialized applications (a sketch follows this list):
- Legal document analysis
- Financial report summarization
- Scientific literature review
- Customer support responses
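As a sketch, a custom template can ride alongside the model choice in the query's generation settings. The prompt_template field and the template variable here are assumptions drawn from Vectara's prompt customization docs; verify the exact syntax there.

```python
# Hedged sketch: pair a model with a domain-specific prompt template.
# The prompt_template value and the $vectaraQuery variable are illustrative;
# see Vectara's prompt customization documentation for the real syntax.
generation = {
    "generation_preset_name": "mockingbird-2.0",
    "prompt_template": (
        '[{"role": "system", "content": "You are a legal analyst. '
        'Summarize the retrieved clauses and cite each source."}, '
        '{"role": "user", "content": "$vectaraQuery"}]'
    ),
}
```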
Data-driven model selection
Rather than guessing which model works best for your use case, use the Vectara Open Eval Framework to systematically evaluate and optimize your model selection. A sketch of this evaluation loop follows the steps below.
Systematic evaluation process
- Create evaluation datasets representative of your use case
- Test multiple model configurations with different presets and parameters
- Measure performance using standardized metrics (UMBRELA, BERTScore, etc.)
- Compare results to identify optimal configurations
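The Open Eval Framework's own API should be taken from its documentation; as an illustration of the loop, the hedged sketch below compares two presets with BERTScore against hand-written reference answers. The query helper and evaluation data are placeholders.

```python
import requests
from bert_score import score  # pip install bert-score

API_KEY = "YOUR_API_KEY"   # placeholder
CORPUS_KEY = "my-corpus"   # placeholder

def run_query(question: str, preset: str) -> str:
    """Illustrative helper: query Vectara with a given generation preset
    and return the generated summary (request shape is a sketch)."""
    resp = requests.post(
        f"https://api.vectara.io/v2/corpora/{CORPUS_KEY}/query",
        headers={"x-api-key": API_KEY},
        json={
            "query": question,
            "search": {"limit": 10},
            "generation": {"generation_preset_name": preset},
        },
    )
    return resp.json().get("summary", "")

# Tiny evaluation set representative of the use case (illustrative data).
eval_set = [
    {"question": "What is the refund policy?",
     "reference": "Refunds are issued within 30 days of purchase."},
    {"question": "How do I reset my password?",
     "reference": "Open account settings and choose Reset password."},
]

for preset in ["mockingbird-2.0", "vectara-summary-ext-24-05-med-omni"]:
    candidates = [run_query(item["question"], preset) for item in eval_set]
    references = [item["reference"] for item in eval_set]
    _, _, f1 = score(candidates, references, lang="en")  # BERTScore P/R/F1
    print(f"{preset}: mean BERTScore F1 = {f1.mean().item():.3f}")
```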