Configuring query parameters
Configuring your query parameters enables you to get the most relevant and accurate results. This section covers the key configuration parameters that control search behavior, result retrieval, reranking, context handling, and AI-generated responses.
Corpora search configuration
The search object controls which corpora to search and how to filter and
retrieve results:
- corpus_key (required): Unique identifier for the corpus to search.
- metadata_filter: SQL-like filter to narrow results (doc.year = '2024').
- lexical_interpolation: Balance between semantic (0.0) and keyword (1.0) search. Default: 0.025.
- limit: Maximum results to retrieve before reranking. Default: 10.
- offset: Number of results to skip for pagination.
- semantics: Query interpretation mode ("query", "response", or "default").
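As an illustration, a search configuration using these parameters might look like the following sketch (the corpus key my-corpus and the filter value are placeholders):

```json
{
  "search": {
    "corpora": [
      {
        "corpus_key": "my-corpus",
        "metadata_filter": "doc.year = '2024'",
        "lexical_interpolation": 0.025,
        "semantics": "default"
      }
    ],
    "limit": 10,
    "offset": 0
  }
}
```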
Context configuration
The context_configuration object controls how much surrounding text is
included with each search result:
- sentences_before/sentences_after: Number of sentences to include before/after matching text.
- characters_before/characters_after: Alternative character-based boundaries for precise control.
- start_tag/end_tag: HTML tags for highlighting matching text in results.
You can use either the sentence-based parameters or the character-based parameters, but not both.
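For example, a context configuration that returns two sentences on either side of each match and wraps matching text in em tags (the tag choice is illustrative) could look like:

```json
{
  "search": {
    "context_configuration": {
      "sentences_before": 2,
      "sentences_after": 2,
      "start_tag": "<em>",
      "end_tag": "</em>"
    }
  }
}
```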
Reranker configuration
Rerankers improve result quality by reordering search results to place the most relevant content first:
- type: Reranker type:
  - customer_reranker: Default multilingual reranker (recommended).
  - mmr: Maximal Marginal Relevance to reduce redundancy.
  - none: Disables reranking (not recommended).
- reranker_name: Specific reranker model (Rerank_Multilingual_v1).
- limit: Maximum results after reranking.
- cutoff: Minimum relevance score (0.0-1.0) for result inclusion. Typically 0.3-0.7.
- include_context: Use surrounding context text for more accurate scoring.
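A reranker configuration using the multilingual reranker might look like this sketch (the limit and cutoff values are illustrative):

```json
{
  "search": {
    "reranker": {
      "type": "customer_reranker",
      "reranker_name": "Rerank_Multilingual_v1",
      "limit": 25,
      "cutoff": 0.5,
      "include_context": true
    }
  }
}
```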
Generation configuration
The generation object controls how the agent creates natural language
responses. Excluding this generation field disables summarization.
- enabled: Enable or disable generative summarization.
- generation_preset_name: Pre-configured prompt and model bundle (mockingbird-2.0).
- max_used_search_results: Number of top results to send to the LLM. Default: 5.
- max_response_characters: Soft limit for response length.
- response_language: Response language code (auto, eng, spa, etc.).
- citations: Citation formatting.
  - style: Citation format (numeric, html, markdown, or none).
  - url_pattern: URL template using metadata variables (https://docs.example.com/{doc.id}).
  - text_pattern: Display text template ([{doc.title}]).
- prompt_template: Override default prompt using Apache Velocity syntax.
- model_parameters: LLM settings (temperature, max_tokens, etc.).
- enable_factual_consistency_score: Validate factual consistency of responses.
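For instance, a generation configuration with numeric citations and the factual consistency score enabled could look like the following sketch:

```json
{
  "generation": {
    "generation_preset_name": "mockingbird-2.0",
    "max_used_search_results": 5,
    "response_language": "eng",
    "citations": {
      "style": "numeric"
    },
    "enable_factual_consistency_score": true
  }
}
```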
The generation_preset_name is specified in the generation object of a query.
Currently available prompts
Vectara provides several official prompts (generation presets) to our users
that you specify in the generation_preset_name within the generation object.
We recommend the following generation presets:
- mockingbird-2.0: Vectara's cutting-edge LLM for Retrieval Augmented Generation. See Mockingbird LLM for more details.
- vectara-summary-ext-24-05-med-omni (gpt-4o, for citations)
- vectara-summary-ext-24-05-large (gpt-4-turbo, for citations)
- vectara-summary-ext-24-05-med (gpt-4, for citations)
- vectara-summary-ext-24-05-sml (gpt-3.5-turbo, for citations)
Use the following prompt if you have tables:
vectara-summary-table-query-ext-dec-2024-gpt-4o
Legacy prompts
These prompts will soon be deprecated:
- vectara-summary-ext-v1.2.0
- vectara-summary-ext-v1.3.0
Default max_used_search_results limit
The default limit of max_used_search_results is 25 search results. Setting
the value closer to the limit generates a more comprehensive summary, while
a lower value trades some comprehensiveness for faster response time.
Sending too many results can prevent the prompt from generating a response. If that happens, try reducing this number.
max_used_search_results example
This generation preset example attempts to balance creating a good quality
summary with a reasonably fast response by setting max_used_search_results to
50.
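A sketch of such a request (the preset name repeats the recommended default from above):

```json
{
  "generation": {
    "generation_preset_name": "mockingbird-2.0",
    "max_used_search_results": 50
  }
}
```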
Advanced Summarization Customization Options
Our users also have access to more powerful summarization capabilities, a toolkit for tailoring summaries to specific application and user needs.
Use generation_preset_name and prompt_template to override the default prompt with a
custom prompt. For example, your use case might require a chatbot to be more
human-like, so you create a custom response format that behaves more playfully
in a conversation or summary.
In generation, max_response_characters lets you control the length of the summary, but
note that it is not a hard limit like with the max_tokens parameter. The
model_parameters object provides even more fine-grained controls for the summarizer
model:
- llm_name: The name of the LLM model to use for summarization, such as gpt-4. If specified, it overrides the model behind generation_preset_name.
- max_tokens: A hard limit on the number of tokens in a response. This value supersedes the max_response_characters parameter in the generation object.
- temperature: Controls creativity, from 0.0 (no creativity at all) up to the maximum value of 1.0 (more creative liberties).
- frequency_penalty: Granular control that decreases the likelihood of repeating words. Values range from 0.0 to 1.0.
- presence_penalty: Controls whether you want the summary to include new topics. Values also range from 0.0 to 1.0.
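Putting these together, a model_parameters block might look like this sketch (all values are illustrative):

```json
{
  "generation": {
    "model_parameters": {
      "llm_name": "gpt-4",
      "max_tokens": 1024,
      "temperature": 0.2,
      "frequency_penalty": 0.2,
      "presence_penalty": 0.1
    }
  }
}
```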
By leveraging these advanced capabilities, application builders can fine-tune the behavior and output style of the generation preset to align with their unique application requirements.
Response languages
The response_language field in the generation object controls the language of
summarization requests. You can ask Vectara to attempt to guess the language
of the query and respond in that guessed language by setting response_language
to auto. However, this guessing is not perfect: many languages share borrowed
words and phrases, which can make guessing the language difficult or even
impossible. For that reason, we recommend sending the user's preferred
language when you know it.
One possible way to do this is to ask the user to configure their preferred language, or to use the localization of your application to determine the best language to send to Vectara. Alternatively, if your application is web-based, you can consider using the Navigator.language and Navigator.languages APIs.
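As a minimal sketch (assuming an environment where navigator may or may not exist), you could derive a response_language value from the browser locale and fall back to auto:

```javascript
// Read the user's preferred locale from the browser, e.g. "en-US".
const browserLang =
  (typeof navigator !== "undefined" && navigator.language) || "";

// Map a BCP 47 tag like "en-US" to a bare ISO 639-1 code like "en",
// and fall back to "auto" so the service guesses from the query text.
const responseLanguage = browserLang ? browserLang.split("-")[0] : "auto";

console.log(responseLanguage);
```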
For the most up-to-date list of languages supported by Vectara's models, see https://github.com/vectara/protos/blob/main/common.proto#L10.
Both ISO 639-1 and ISO 639-3 language codes are supported in this API.
Some features, such as factual_consistency_score, may not work in all
languages. The Factual Consistency Score is supported in English, German, and French (eng, deu, fra).
Citations
Citations provide important source attribution in query results, enabling users to verify information and trace content back to its original sources. This transparency is essential for building trust in AI-generated content and supporting fact-checking workflows.
When Vectara generates summaries or retrieval results, it automatically includes citations that reference the specific sources used. These citations create a direct link between the generated content and the underlying documents, ensuring traceability and accountability.
Citations appear in the format [number] within summary text, where:
- Numbers start from 1.
- Each number corresponds to a result in the search_results array.
- Numbers increment sequentially for each unique source referenced.
Citation formatting options
Vectara supports multiple citation formats to suit different application
needs. You can control how citations appear in summaries using the
citations_options parameter in your query. We support the following formats:
Numeric (default)
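A minimal sketch of the default numeric configuration:

```json
{
  "generation": {
    "citations": {
      "style": "numeric"
    }
  }
}
```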
Citations appear as numbers in square brackets: [1], [2], [3]
This is the default citation format. If you don't specify citations_options, Vectara will use numeric citations automatically.
None
Disables citations entirely, producing clean text without source references.
Use this option when you want clean summary text without any source attribution. This is useful for conversational interfaces where citations might disrupt the flow.
HTML
Formats citations as HTML links for web applications.
Perfect for web applications where you want clickable citations. You can style these links with CSS and add JavaScript handlers to show source details on click.
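For instance, an HTML citation configuration might look like the following sketch (the URL template and display text are illustrative):

```json
{
  "generation": {
    "citations": {
      "style": "html",
      "url_pattern": "https://docs.example.com/{doc.id}",
      "text_pattern": "{doc.title}"
    }
  }
}
```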
Markdown
Formats citations as Markdown links, useful for Markdown-based applications.
Ideal for documentation systems, chat applications, or any interface that renders Markdown. The links can be processed by your Markdown renderer.
Code Output Support
Markdown citation style enables formatting of code blocks and technical content in responses, making it easier to display code snippets, YAML configurations, or structured technical output.
Advanced citation options
You can further customize citation behavior with additional parameters:
These advanced options allow you to customize how citations are formatted
beyond the standard styles. Use url_pattern and text_pattern to create
custom citation formats that match your application's needs.
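A sketch combining these options with the Markdown style (the patterns repeat the illustrative templates from the generation parameter list above):

```json
{
  "generation": {
    "citations": {
      "style": "markdown",
      "url_pattern": "https://docs.example.com/{doc.id}",
      "text_pattern": "[{doc.title}]"
    }
  }
}
```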
Enable citations in queries
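A full query requesting Markdown citations might look like the following sketch (the corpus key and query text are placeholders):

```json
{
  "query": "How do I configure query parameters?",
  "search": {
    "corpora": [
      { "corpus_key": "my-corpus" }
    ],
    "limit": 10
  },
  "generation": {
    "generation_preset_name": "mockingbird-2.0",
    "citations": {
      "style": "markdown",
      "url_pattern": "https://docs.example.com/{doc.id}",
      "text_pattern": "[{doc.title}]"
    }
  }
}
```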
This example explicitly sets the citation style to Markdown.