Queries
The Query APIs are the primary interface for searching your data and generating AI-powered summaries. After indexing documents into corpora, you use these endpoints to retrieve relevant information and optionally generate grounded summaries using Retrieval Augmented Generation (RAG).
Vectara provides three query types to match your search needs:
Single Corpus Query
Quick search with minimal parameters:
- Simple search-only use cases
- Minimal configuration needed
- Performance-critical lightweight queries
Parameters: q (query), limit, offset
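As a minimal sketch, the three parameters above can be encoded into a request URL as follows. The base URL, the `/corpora/{corpus_key}/query` path, the default values, and the helper name `simple_query_url` are assumptions for illustration; check the API reference for the actual endpoint.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and path layout; verify against the API reference.
BASE_URL = "https://api.vectara.io/v2"


def simple_query_url(corpus_key: str, q: str, limit: int = 10, offset: int = 0) -> str:
    """Build a URL carrying the q, limit, and offset parameters (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    # urlencode percent-encodes the query text and joins the pairs with '&'.
    params = urlencode({"q": q, "limit": limit, "offset": offset})
    return f"{BASE_URL}/corpora/{quote(corpus_key, safe='')}/query?{params}"


print(simple_query_url("my-corpus", "what is RAG?", limit=5))
# → https://api.vectara.io/v2/corpora/my-corpus/query?q=what+is+RAG%3F&limit=5&offset=0
```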
Advanced Single Corpus Query
Full-featured search and RAG for a single corpus:
- RAG over a single corpus
- Table summarization required
- Advanced search customization
- Metadata filtering and reranking
Multiple Corpora Query
Search across one or more corpora with full RAG capabilities:
- Query multiple data sources simultaneously
- Need AI-generated summaries with citations
- Require advanced customization (streaming, custom prompts, factual scoring)
- Building conversational AI applications
Configuring your query parameters enables you to get the most relevant and accurate results. This section covers the key configuration parameters that control search behavior, result retrieval, reranking, context handling, and AI-generated responses.
Corpora search configuration
The search object controls which corpora to search and how to filter and
retrieve results:
- corpus_key (required): Unique identifier for the corpus to search.
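- metadata_filter: SQL-like filter expression to narrow results (for example, doc.year = '2024').
- lexical_interpolation: Balance between semantic (0.0) and keyword (1.0) search. Default: 0.025.
- limit: Maximum number of results to retrieve before reranking. Default: 10.
- offset: Number of results to skip for pagination.
- semantics: Query interpretation mode ("query", "response", or "default").
In the request body, corpus_key, metadata_filter, and lexical_interpolation are set on each entry of the corpora array, while limit, offset, and semantics are set on the search object itself: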
{
"search": {
"corpora": [{
"corpus_key": "my-corpus",
"metadata_filter": "doc.year = '2025' AND doc.category = 'technology'",
"lexical_interpolation": 0.005
}],
"limit": 10,
"offset": 0,
"semantics": "default"
}
}
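The search object can also be assembled in code. The sketch below mirrors the fields described in this section and adds basic range checks; the helper names `build_search` and `page_of` are hypothetical, and the checks reflect the ranges stated here rather than server-side validation.

```python
from typing import Any, Dict, Optional

SEMANTICS_MODES = {"default", "query", "response"}


def build_search(corpus_key: str,
                 metadata_filter: Optional[str] = None,
                 lexical_interpolation: float = 0.025,
                 limit: int = 10,
                 offset: int = 0,
                 semantics: str = "default") -> Dict[str, Any]:
    """Assemble a search object for one corpus (hypothetical helper)."""
    if not 0.0 <= lexical_interpolation <= 1.0:
        raise ValueError("lexical_interpolation must be between 0.0 and 1.0")
    if semantics not in SEMANTICS_MODES:
        raise ValueError(f"semantics must be one of {sorted(SEMANTICS_MODES)}")
    corpus: Dict[str, Any] = {
        "corpus_key": corpus_key,
        "lexical_interpolation": lexical_interpolation,
    }
    if metadata_filter:
        corpus["metadata_filter"] = metadata_filter
    # Per-corpus settings go inside "corpora"; paging and semantics sit beside it.
    return {"corpora": [corpus], "limit": limit, "offset": offset, "semantics": semantics}


def page_of(search: Dict[str, Any], page_number: int) -> Dict[str, Any]:
    """Return a copy of the search object with offset set for a zero-based page."""
    return {**search, "offset": page_number * search["limit"]}


search = build_search("my-corpus", "doc.year = '2025'", lexical_interpolation=0.005)
print(page_of(search, 2)["offset"])  # → 20
```

Keeping limit fixed and deriving offset from the page number is one simple way to paginate; other schemes work equally well.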
Context configuration
The context_configuration object controls how much surrounding text is
included with each search result:
- sentences_before/sentences_after: Number of sentences to include before/after matching text.
- characters_before/characters_after: Alternative character-based boundaries for precise control.
- start_tag/end_tag: HTML tags for highlighting matching text in results.
Use either sentences_before/sentences_after or characters_before/characters_after in a single request, not both.
{
"search": {
"context_configuration": {
"sentences_before": 2,
"sentences_after": 2,
"start_tag": "<mark>",
"end_tag": "</mark>"
}
}
}
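Because sentence-based and character-based boundaries are mutually exclusive, it can help to enforce that rule before sending a request. The following is a minimal sketch; the helper name `build_context_configuration` is hypothetical, and the check is a client-side convenience, not a description of how the API validates input.

```python
from typing import Any, Dict, Optional


def build_context_configuration(sentences_before: Optional[int] = None,
                                sentences_after: Optional[int] = None,
                                characters_before: Optional[int] = None,
                                characters_after: Optional[int] = None,
                                start_tag: Optional[str] = None,
                                end_tag: Optional[str] = None) -> Dict[str, Any]:
    """Build a context_configuration object, rejecting mixed boundary types."""
    uses_sentences = sentences_before is not None or sentences_after is not None
    uses_characters = characters_before is not None or characters_after is not None
    if uses_sentences and uses_characters:
        raise ValueError("use sentence-based or character-based boundaries, not both")
    candidate = {
        "sentences_before": sentences_before,
        "sentences_after": sentences_after,
        "characters_before": characters_before,
        "characters_after": characters_after,
        "start_tag": start_tag,
        "end_tag": end_tag,
    }
    # Drop unset fields so the object only carries what the caller chose.
    return {key: value for key, value in candidate.items() if value is not None}


config = build_context_configuration(sentences_before=2, sentences_after=2,
                                     start_tag="<mark>", end_tag="</mark>")
print(config)
```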
Reranker configuration
Rerankers improve result quality by reordering search results to place the most relevant content first:
- type: Reranker type.
  - customer_reranker: Default multilingual reranker (recommended).
  - mmr: Maximal Marginal Relevance, which reduces redundancy.
  - none: Disables reranking (not recommended).
- reranker_name: Specific reranker model (for example, Rerank_Multilingual_v1).
- limit: Maximum number of results after reranking.
- cutoff: Minimum relevance score (0.0-1.0) for result inclusion. Typically 0.3-0.7.
- include_context: Use surrounding context text for more accurate scoring.
{
"search": {
"reranker": {
"type": "customer_reranker",
"reranker_name": "Rerank_Multilingual_v1",
"limit": 10,
"cutoff": 0.5,
"include_context": true
}
}
}
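To see how cutoff and limit shape the final result list, the sketch below applies both to a small set of locally defined scores. It is a purely illustrative model, not the reranker itself: the function name `apply_cutoff_and_limit`, the sample documents, and the scores are made up, and treating the cutoff as inclusive is an assumption.

```python
from typing import List, Tuple

ScoredResult = Tuple[str, float]


def apply_cutoff_and_limit(scored: List[ScoredResult], cutoff: float, limit: int) -> List[ScoredResult]:
    """Keep results scoring at least `cutoff`, best first, capped at `limit` items."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    kept = [item for item in ranked if item[1] >= cutoff]
    return kept[:limit]


# Hypothetical reranker scores for four documents.
results = [("doc-a", 0.91), ("doc-b", 0.42), ("doc-c", 0.67), ("doc-d", 0.18)]
print(apply_cutoff_and_limit(results, cutoff=0.5, limit=10))
# → [('doc-a', 0.91), ('doc-c', 0.67)]
```

A higher cutoff trades recall for precision: with cutoff 0.5, two of the four sample results are dropped even though limit would have allowed ten.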
Generation configuration
The generation object controls how the agent creates natural language
responses:
- enabled: Enable or disable generative summarization.
- generation_preset_name: Pre-configured prompt and model bundle (for example, mockingbird-2.0).
- max_used_search_results: Number of top results to send to the LLM. Default: 5.
- max_response_characters: Soft limit for response length.
- response_language: Response language code (auto, eng, spa, etc.).
- citations: Citation formatting.
  - style: Citation format (numeric, html, markdown, or none).
  - url_pattern: URL template using metadata variables (for example, https://docs.example.com/{doc.id}).
  - text_pattern: Display text template (for example, [{doc.title}]).
- prompt_template: Override the default prompt using Apache Velocity syntax.
- model_parameters: LLM settings (temperature, max_tokens, etc.).
- enable_factual_consistency_score: Validate factual consistency of responses.
{
"generation": {
"generation_preset_name": "mockingbird-2.0",
"max_used_search_results": 10,
"max_response_characters": 500,
"response_language": "eng",
"citations": {
"style": "numeric"
},
"enable_factual_consistency_score": true,
"model_parameters": {
"temperature": 0.7,
"max_tokens": 300
}
}
}
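Putting the pieces together, a request body combines the query text with the search and generation objects. The sketch below is one way to assemble it under stated assumptions: the top-level "query" field name and the helper names `build_generation` and `build_query_body` are not taken from this page, so confirm them against the API reference before use.

```python
import json
from typing import Any, Dict, Optional

CITATION_STYLES = {"numeric", "html", "markdown", "none"}


def build_generation(preset: str = "mockingbird-2.0",
                     max_used_search_results: int = 5,
                     response_language: str = "auto",
                     citation_style: str = "numeric",
                     url_pattern: Optional[str] = None,
                     text_pattern: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a generation object with citation settings (hypothetical helper)."""
    if citation_style not in CITATION_STYLES:
        raise ValueError(f"citation_style must be one of {sorted(CITATION_STYLES)}")
    citations: Dict[str, Any] = {"style": citation_style}
    if url_pattern:
        citations["url_pattern"] = url_pattern
    if text_pattern:
        citations["text_pattern"] = text_pattern
    return {
        "generation_preset_name": preset,
        "max_used_search_results": max_used_search_results,
        "response_language": response_language,
        "citations": citations,
    }


def build_query_body(query: str, search: Dict[str, Any],
                     generation: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Combine query text, search, and optional generation into one body.

    The "query" key is an assumed field name for the query text.
    """
    body: Dict[str, Any] = {"query": query, "search": search}
    if generation is not None:
        body["generation"] = generation
    return body


search = {"corpora": [{"corpus_key": "my-corpus"}], "limit": 10}
generation = build_generation(citation_style="markdown",
                              url_pattern="https://docs.example.com/{doc.id}",
                              text_pattern="[{doc.title}]")
print(json.dumps(build_query_body("What changed in 2025?", search, generation), indent=2))
```

Omitting the generation argument yields a search-only body, which matches the case where no summary is needed.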
Simple Single Corpus Query
Search a single corpus with a straightforward query request, specifying the corpus key and query parameters.
Advanced Single Corpus Query
Perform an advanced query on a specific corpus to find relevant results, highlight relevant snippets, and use Retrieval Augmented Generation.
Multiple Corpora Query
Perform a multipurpose query to retrieve relevant information from one or more corpora and generate a response using Retrieval Augmented Generation (RAG).