Generation Presets
The Vectara SDK's Generation Presets module lets developers tune Retrieval Augmented Generation (RAG) for queries and chats using preconfigured LLM settings. Two presets are covered here: Mockingbird 2.0 and vectara-summary-ext-24-05-med-omni. This guide shows how to apply these presets to business needs such as personalized customer support and data-driven insights. You will learn:
- How to apply presets to tailor query and chat responses
- Techniques for customizing presets with filters and parameters
- Strategies for handling async generation with error management
For more details on generation presets, see the Generation Presets documentation.
This guide assumes you have a corpus called my-docs. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus.
Using Generation Presets
Available presets include:
- mockingbird-2.0: High accuracy and factuality for enterprise summaries.
- vectara-summary-ext-24-05-med-omni: GPT-4o-based for advanced summarization and conversational responses.
Example 1: Financial summary with Mockingbird 2.0
Generate a tailored financial summary using Mockingbird 2.0 with precise metadata filtering.
generation_preset_name="mockingbird-2.0"
- High accuracy, up-to-date factuality, and balanced tone in enterprise summaries.
corpora=[{"corpus_key": "finance_docs"}]
- Limits search to your company’s trusted finance corpus.
metadata_filter="doc_region = 'EU' AND doc_quarter = 'Q1-2024' AND doc_industry = 'banking'"
- Ensures only documents relevant to European banks in Q1 2024 are used, which is critical for regulatory and reporting accuracy.
max_used_search_results=10
- Caps the number of search results passed to the model at 10, giving broad context coverage while keeping the summary concise.
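The parameters above might be assembled into a request body along these lines. This is a minimal sketch written as a plain Python dict so it runs without the SDK installed. The helper name `build_query_body`, the query text, and the nesting (`metadata_filter` inside each corpus entry, preset settings under `generation`) are assumptions; check them against the Vectara API reference before use.

```python
from typing import Any, Dict


def build_query_body(
    query: str,
    corpus_key: str,
    metadata_filter: str,
    preset: str,
    max_used_search_results: int,
) -> Dict[str, Any]:
    """Assemble a query request body as a plain dict (assumed layout)."""
    return {
        "query": query,
        "search": {
            # Assumption: the filter is attached to the corpus it applies to.
            "corpora": [
                {"corpus_key": corpus_key, "metadata_filter": metadata_filter}
            ]
        },
        "generation": {
            "generation_preset_name": preset,
            "max_used_search_results": max_used_search_results,
        },
    }


finance_body = build_query_body(
    query="Summarize Q1 2024 results for European banks.",  # hypothetical query
    corpus_key="finance_docs",
    metadata_filter=(
        "doc_region = 'EU' AND doc_quarter = 'Q1-2024' "
        "AND doc_industry = 'banking'"
    ),
    preset="mockingbird-2.0",
    max_used_search_results=10,
)
```

Keeping the body in one helper makes it easy to swap the preset or filter per request without touching the rest of the call.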
Example 2: Support chat with GPT-4o
Deliver concise support responses with the vectara-summary-ext-24-05-med-omni preset. This preset uses a GPT-4o-based model, which suits responsive, conversational support and broad knowledge coverage.
generation_preset_name="vectara-summary-ext-24-05-med-omni"
- Advanced summarization, up-to-date knowledge, and conversational tone.
corpora=[{"corpus_key": "support_kb"}]
- Focuses the model only on vetted support knowledge base articles.
metadata_filter="doc_platform = 'mobile' AND doc_issue_type = 'auth_failure'"
- Surfaces only mobile-specific authentication problems, which keeps unrelated issues out of the response.
max_response_characters=500
- Enforces concise, agent-ready responses (ideal for chatbots or customer-facing UIs).
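A support chat request with these settings could be sketched as follows. As before, this builds a plain dict rather than calling the SDK. The helper name `build_chat_body`, the user message, the body layout, and the client-side check that `max_response_characters` is positive are all assumptions made for illustration.

```python
from typing import Any, Dict


def build_chat_body(
    query: str,
    corpus_key: str,
    metadata_filter: str,
    max_response_characters: int,
    preset: str = "vectara-summary-ext-24-05-med-omni",
) -> Dict[str, Any]:
    """Assemble a chat request body as a plain dict (assumed layout)."""
    # Assumption: reject non-positive limits locally before sending anything.
    if max_response_characters <= 0:
        raise ValueError("max_response_characters must be positive")
    return {
        "query": query,
        "search": {
            "corpora": [
                {"corpus_key": corpus_key, "metadata_filter": metadata_filter}
            ]
        },
        "generation": {
            "generation_preset_name": preset,
            "max_response_characters": max_response_characters,
        },
    }


support_body = build_chat_body(
    query="I can't log in on the mobile app.",  # hypothetical user message
    corpus_key="support_kb",
    metadata_filter="doc_platform = 'mobile' AND doc_issue_type = 'auth_failure'",
    max_response_characters=500,
)
```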
Error Handling
- 400 Bad Request: Invalid parameters (max_tokens < 0).
  - Resolution: Validate all parameters against their constraints.
- 403 Forbidden: Insufficient permissions.
  - Resolution: Use a Query or Index API Key with appropriate access.
- 408 Request Timeout: Request exceeds timeout limit.
  - Resolution: Increase request_timeout or optimize the query.
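The status codes and resolutions listed above can be mirrored in a small lookup, paired with a local check that catches the `max_tokens < 0` case before a request is sent. This is a sketch only: the helper names are hypothetical, and how the SDK actually surfaces HTTP status codes (exception type, attribute names) is not shown here and should be taken from the SDK documentation.

```python
from typing import Any, Dict

# Resolutions copied from the error list in this guide.
RESOLUTIONS: Dict[int, str] = {
    400: "Validate all parameters against their constraints.",
    403: "Use a Query or Index API Key with appropriate access.",
    408: "Increase request_timeout or optimize the query.",
}


def resolution_for(status_code: int) -> str:
    """Return the suggested resolution for a status code, if one is listed."""
    return RESOLUTIONS.get(
        status_code, "Unlisted status code; consult the API reference."
    )


def validate_model_parameters(params: Dict[str, Any]) -> None:
    """Raise ValueError locally for the max_tokens < 0 case (a 400 on the server)."""
    max_tokens = params.get("max_tokens")
    if max_tokens is not None and max_tokens < 0:
        raise ValueError("max_tokens must not be negative")


validate_model_parameters({"max_tokens": 256})  # passes silently
hint = resolution_for(408)
```

Validating locally avoids a round trip for errors that can be detected from the parameters alone.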
- Use generation_preset_name in client.corpora.query or client.chats.create to apply presets like mockingbird-2.0 or vectara-summary-ext-24-05-med-omni.
- Customize with model_parameters to override preset settings (e.g., temperature, max_tokens).
- Explore the Vectara Prompt Engine for prompt template details.
- Contact feedback@vectara.com or use the Vectara Console to create new presets.
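The model_parameters override mentioned in the list above might be layered onto a preset like this. It is a sketch under assumptions: the helper `with_model_parameters` is hypothetical, the override values are arbitrary, and placing `model_parameters` alongside `generation_preset_name` in the generation settings reflects an assumed layout rather than a confirmed one.

```python
from typing import Any, Dict


def with_model_parameters(
    generation: Dict[str, Any], **overrides: Any
) -> Dict[str, Any]:
    """Return a copy of the generation settings with model_parameters merged in."""
    merged = dict(generation)  # shallow copy so the input is left untouched
    merged["model_parameters"] = {
        **generation.get("model_parameters", {}),
        **overrides,
    }
    return merged


base_generation = {"generation_preset_name": "mockingbird-2.0"}

# Illustrative values only; pick limits that suit your use case.
tuned_generation = with_model_parameters(
    base_generation, temperature=0.2, max_tokens=512
)
```

Returning a new dict keeps the base preset settings reusable across requests with different overrides.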