
Query API Definition

The Query API lets you run a query and define its parameters: the query text, pagination details, metadata filters, and other search settings that let application builders tailor queries to specific use cases.

After you index data into one or more corpora, you can run queries and display the results. This page provides a detailed reference for how to run queries and also describes some of Vectara's capabilities in metadata filtering, reranking, Retrieval Augmented Generation (RAG), and hybrid search.


Check out our interactive API Playground that lets you experiment with this REST endpoint to send queries.

Query Request Body and Response

The Query request body specifies different parameters that ask questions about the data within corpora. The Query request requires the following parameters:

  • query - Contains your question and number of results to return.
  • corpusKey - Specifies which corpora to run the query against.

The query response message encapsulates a single query result, which is a subdocument provided at indexing time. The text is the subdocument text, and the score indicates how well the text answers the query (higher scores are better).

The metadata list holds any subdocument-level metadata that was stored with the item at indexing time. The corpus_key indicates which corpus the result came from: recall that a single query can execute against multiple corpora.

Finally, the document_index points at a specific document within the enclosing response set's document array. This is useful for retrieving the document id and document-level metadata.

Query Definition

A single query consists of a query, which is specified in plain text. For example, "Where can I buy the latest iPhone?". Optionally, the query context provides additional information that the system may use to refine the results. For example, "The Apple store near my house is closed due to Covid."

The start field controls the starting position within the list of results, while num_results dictates how many results are returned. Thus, setting start=5 and num_results=20 would return twenty results beginning at position five. These fields are mainly used to provide pagination.
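As a rough illustration of the pagination arithmetic, the sketch below converts a page number into the start and num_results values described above. The helper names page_window and slice_like_query are hypothetical, and the zero-based treatment of start is an assumption rather than something this page confirms.

```python
def page_window(page: int, page_size: int) -> dict:
    """Return the start/num_results pair for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"start": page * page_size, "num_results": page_size}


def slice_like_query(ranked: list, start: int, num_results: int) -> list:
    """Mimic how start and num_results select a window of a ranked list.

    Assumption: start is treated as a zero-based offset into the results.
    """
    return ranked[start:start + num_results]


if __name__ == "__main__":
    # Third page of ten results each.
    print(page_window(2, 10))
    # start=5, num_results=20 yields twenty items beginning at offset five.
    print(len(slice_like_query(list(range(100)), 5, 20)))
```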

Context Configuration

The contextConfig object lets you specify how each document part appears in the summary of a query. This controls the amount of surrounding context that is included with each matching document part, also known as a snippet. Configuring this additional context affects result quality for summarization by enhancing relevance and reducing ambiguity. Use characters_before and characters_after to specify the number of characters to include before and after the matching document part. This is useful when you want to provide a fixed-length context around the matching text.

These character properties are mutually exclusive with sentences_before and sentences_after which specify the number of sentences to include before and after the matching document part. This is useful when you want to provide context based on complete sentences rather than a fixed number of characters.

Use start_tag and end_tag to wrap the matching document part. These tags serve as delimiters to indicate where the snippet begins and ends within the surrounding context. For example, you can use <b> as the start_tag and </b> as the end_tag to wrap the snippet with bold tags. Experiment and iterate to find the optimal context configuration for your specific use case.
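The mutual exclusivity between the character and sentence settings can be checked on the client before a request is sent. The following is a minimal sketch that uses the field names from the text above. The helper build_context_config is hypothetical, and the exact casing and nesting the endpoint expects should be confirmed against the API Playground.

```python
from typing import Optional


def build_context_config(
    characters_before: Optional[int] = None,
    characters_after: Optional[int] = None,
    sentences_before: Optional[int] = None,
    sentences_after: Optional[int] = None,
    start_tag: str = "",
    end_tag: str = "",
) -> dict:
    """Build a context configuration dict, rejecting mixed units."""
    uses_chars = characters_before is not None or characters_after is not None
    uses_sentences = sentences_before is not None or sentences_after is not None
    if uses_chars and uses_sentences:
        raise ValueError("character and sentence settings are mutually exclusive")

    config = {"start_tag": start_tag, "end_tag": end_tag}
    if uses_chars:
        # A side that was not given is treated as zero characters of context.
        config["characters_before"] = characters_before or 0
        config["characters_after"] = characters_after or 0
    if uses_sentences:
        config["sentences_before"] = sentences_before or 0
        config["sentences_after"] = sentences_after or 0
    return config


if __name__ == "__main__":
    print(build_context_config(sentences_before=2, sentences_after=2,
                               start_tag="<b>", end_tag="</b>"))
```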

Corpus Key Definition

The corpusKey specifies the ID of the corpus being searched. While it's most often the case that a query is run against a single corpus, it's sometimes useful to run against several in parallel.

The metadata_filter allows specifying a predicate expression that restricts the search to a part of the corpus. The filter is written in a simplified SQL dialect and can reference metadata that was marked as filterable during corpus creation.


See the Filter Expressions Overview for a description of their syntax, and Corpus Administration to learn how referenceable metadata is specified during corpus creation.

By default, Vectara only uses its neural/semantic retrieval model, and does not attempt to use keyword matching. To enable hybrid search with a mix of both keyword and neural results, edit the lambda value.

If the corpus specifies custom dimensions (Scale only), weights can be assigned to each dimension as well.

Finally, it's possible to override the semantic interpretation of the query string. Usually, the default settings for the corpus are sufficient. In more advanced scenarios, it's desirable to force it to be treated as a query, or, more rarely, as a response.

Reranking Configuration

The rerankingConfig object enables reranking of results, to further increase relevance in certain scenarios. Scale users can modify the rerankerId of this object. When using mmrConfig, specify a diversityBias value between 0.0 and 1.0. For details about our English cross-attentional (Scale only) and Maximal Marginal Relevance (MMR) rerankers, see Reranking.
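To build intuition for what a diversity bias does, here is a sketch of the generic, textbook form of greedy Maximal Marginal Relevance. It is not Vectara's implementation: the function mmr_rerank, the precomputed similarity matrix, and the exact scoring formula are assumptions chosen for illustration.

```python
def mmr_rerank(relevance, similarity, diversity_bias, k=None):
    """Greedy textbook MMR over precomputed scores.

    relevance[i]     : relevance of result i to the query
    similarity[i][j] : similarity between results i and j
    diversity_bias   : 0.0 keeps pure relevance order; values toward 1.0
                       penalize results similar to ones already selected
    Returns the indices of the selected results in their new order.
    """
    if not 0.0 <= diversity_bias <= 1.0:
        raise ValueError("diversity_bias must be between 0.0 and 1.0")

    n = len(relevance)
    k = n if k is None else min(k, n)
    selected = []
    remaining = list(range(n))

    def mmr_score(i):
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max((similarity[i][j] for j in selected), default=0.0)
        return (1.0 - diversity_bias) * relevance[i] - diversity_bias * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    relevance = [0.9, 0.85, 0.5]
    # Results 0 and 1 are near duplicates; result 2 is different from both.
    similarity = [
        [1.0, 0.95, 0.1],
        [0.95, 1.0, 0.1],
        [0.1, 0.1, 1.0],
    ]
    print(mmr_rerank(relevance, similarity, 0.0))
    print(mmr_rerank(relevance, similarity, 0.5))
```

In this toy input, a bias of 0.0 leaves the relevance order untouched, while a bias of 0.5 moves the dissimilar third result ahead of the near duplicate.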

Query Summarization Request - Retrieval Augmented Generation

To use Retrieval Augmented Generation (RAG), which Vectara also refers to as "Grounded Generation" -- our groundbreaking way of producing generative summaries on top of your own data -- you can submit a SummarizationRequest alongside your query. This produces a summary that attempts to answer the end-user's question, citing the results as references. For more information, read about Retrieval Augmented Generation.

The summary object enables you to tailor the results of the query summarization. Growth users can specify the maxSummarizedResults and responseLang.

Factual Consistency Score

The Factual Consistency Score, based on a more advanced version of Hughes Hallucination Evaluation Model (HHEM), enables you to evaluate the likelihood of an AI-generated summary being factually correct based on search results. This calibrated score can range from 0.0 to 1.0. A higher score indicates a greater probability of being factually accurate, while a lower score indicates a greater probability of hallucinations.

In your summarization request, set the factual_consistency_score field to true. The Factual Consistency Score returns a calibrated value in the factual_consistency field of the summary message. The score field contains the value between 0.0 and 1.0.

For example, a score of 0.95 suggests a 95% likelihood that the summary is free of hallucinations and aligns with the original content. A lower score of 0.40 indicates only a 40% likelihood, so the summary is probably much less factually accurate. We suggest starting with a cutoff of 0.5 as an initial guideline for separating good summaries from bad ones.
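A client might apply the suggested cutoff as in the sketch below. The nesting summary["factual_consistency"]["score"] is inferred from the field names in the text above and is an assumption, and is_probably_grounded is a hypothetical helper.

```python
def is_probably_grounded(summary: dict, cutoff: float = 0.5) -> bool:
    """Return True when the factual consistency score meets the cutoff.

    Assumption: the score is found at summary["factual_consistency"]["score"].
    """
    score = summary["factual_consistency"]["score"]
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is expected to lie between 0.0 and 1.0")
    return score >= cutoff


if __name__ == "__main__":
    strong = {"text": "...", "factual_consistency": {"score": 0.95}}
    weak = {"text": "...", "factual_consistency": {"score": 0.40}}
    print(is_probably_grounded(strong))
    print(is_probably_grounded(weak))
```

The 0.5 default mirrors the starting guideline above. A stricter application could raise the cutoff and fall back to showing only the raw search results when a summary fails the check.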

Citation Format in Summary

When generating a summary, Vectara enables Scale users to set the style of the citationParams object to one of the following formats:

  • NUMERIC (default) - Citations appear as numbers [1], [2], [N], and so on.
  • NONE - No citations appear in the summary.
  • HTML - Citations appear as a URL: <a href="https://my.doc/foo">[N]</a>
  • MARKDOWN - Citations appear in Markdown: [N](https://my.doc/foo)

If set to HTML or MARKDOWN, you must customize the citation using both the urlPattern and textPattern fields to enable dynamic citation generation. Both of these parameters can access all part-level and document-level metadata fields.

For example, the urlPattern field can specify {} and {} metadata as https://mypdf.doc/foo/{}#page={}. The textPattern field specifies the document and part metadata name in curly braces. For example, use {doc.title} and the final result appears as Title.

To use citations, you must specify one of the following summarizers in summarizerPromptName:

  • vectara-summary-ext-24-05-sml - (gpt-3.5-turbo)
  • vectara-summary-ext-24-05-med - (gpt-4.0)
  • vectara-summary-ext-24-05-large - (gpt-4.0-turbo)

For more information, see the documentation about selecting summarizers.

Default Citation Behavior

  • If textPattern is not specified, it defaults to the numerical position of the result ([1], [2], [N]).
  • The urlPattern does not have a default, so this field must be explicitly defined.
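To make the MARKDOWN style and the textPattern default concrete, the sketch below imitates on the client side how such a citation could be assembled. It illustrates the described behavior and is not Vectara's implementation. The function render_markdown_citation, the flat metadata dictionary keyed by names such as doc.title, and the already-resolved URL are all assumptions.

```python
import re


def render_markdown_citation(position, url, text_pattern=None, metadata=None):
    """Render one citation as a Markdown link.

    position     : 1-based position of the result in the result list
    url          : the citation URL, assumed to be already resolved
    text_pattern : optional pattern such as "as seen in {doc.title}"
    metadata     : mapping of metadata names to values (assumed shape)
    """
    metadata = metadata or {}
    if text_pattern is None:
        # Default described above: the numerical position of the result.
        text = str(position)
    else:
        # Replace each {name} with its metadata value; unknown names stay as-is.
        text = re.sub(
            r"\{([^{}]+)\}",
            lambda m: str(metadata.get(m.group(1), m.group(0))),
            text_pattern,
        )
    return f"[{text}]({url})"


if __name__ == "__main__":
    print(render_markdown_citation(1, "https://my.doc/foo"))
    print(render_markdown_citation(
        2, "https://my.doc/foo", "as seen in {doc.title}", {"doc.title": "Title"}))
```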

Citation Example

In this example, you want Vectara to say as seen in [Document-Title] with a link to the specific page:

"citationParams": {
  "style": "MARKDOWN",
  "urlPattern": "{}#page={}",
  "textPattern": "as seen in {doc.title}"
}

The response will look something like this:

In the Metropolitan Transportation Authority (MTA) rules, it is prohibited to 
destroy, mark, soil, paint, draw, inscribe, or place graffiti on any facility
or conveyance of the authority [as seen in Rules of Conduct and

Advanced Summarization Customization Options

Scale users have access to more advanced summarization capabilities, which provide a toolkit for tailoring summarizations to specific application and user needs.

The summarizerPromptName allows you to specify one of our available summarizers. Use promptText to override the default prompt text with a custom prompt. For example, if your use case requires a chatbot to be more human-like, you can create a custom response format that behaves more playfully in a conversation or summary.

The debug option lets you view detailed logs to help in troubleshooting and optimization. The responseChars lets you control the length of the summary, but note that it is not a hard limit like with the maxTokens parameter. The modelParams object provides even more fine-grained controls for the summarizer model:

  • maxTokens specifies a hard limit on the number of characters in a response. This value supersedes the responseChars parameter in the summary object.
  • temperature controls how creative the summarization is, from not creative at all at 0.0 to taking more creative liberties as you approach the maximum value of 1.0.
  • frequencyPenalty provides even more granular control to help ensure that the summarization decreases the likelihood of repeating words. The values range from 0.0 to 1.0.
  • presencePenalty provides more control over whether you want the summary to include new topics. The values also range from 0.0 to 1.0.
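The ranges listed above can be checked before a request is sent. This is a minimal client-side sketch: validate_model_params is a hypothetical helper, and the key maxTokens follows the spelling used in the paragraph that introduces modelParams.

```python
def validate_model_params(params: dict) -> dict:
    """Check modelParams values against the ranges documented above."""
    for name in ("temperature", "frequencyPenalty", "presencePenalty"):
        if name in params and not 0.0 <= params[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    if "maxTokens" in params:
        limit = params["maxTokens"]
        # Assumption: the hard limit is expressed as a positive integer.
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError("maxTokens must be a positive integer")
    return params


if __name__ == "__main__":
    print(validate_model_params({
        "maxTokens": 300,
        "temperature": 0.2,
        "frequencyPenalty": 0.5,
        "presencePenalty": 0.0,
    }))
```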

By leveraging these advanced capabilities, you can fine-tune the behavior and output style of the summarizer to align with your unique application requirements.

Chat Conversation Located within the Summary

If you enabled chat on the corpus, the summary object contains a conversation from Vectara Chat which includes a conversationId. You enable Vectara Chat by setting the store value to true.

The Vectara Chat APIs have more details about conversations.

REST Example

Query API Endpoint Address

Vectara exposes a REST endpoint at the following URL to search content from a corpus:

The API Playground shows the full Query REST definition.

gRPC Example

You can find the full Query gRPC definition at serving.proto.

Query Service and Request

The definition shows details about the query service. The system accepts a query and returns a response, which contains a list of results. For efficiency, one or more queries can be batched into a single request. The query field contains the search terms that the system needs to match against the data, and ContextConfig specifies the amount of text or number of sentences before and after the result snippet.

Corpus Key

The corpus_key allows the query to be executed across multiple corpora. The CorpusKey identifies a specific corpus or corpora to include in the query. Specifying the customer_id is optional, since it defaults to the customer attached to the gRPC request.

Summarization Request Example

The full Query definition provides the detailed summary request. When Vectara responds with the list of results that are the best semantic match for the user's query, it also produces a summary of those results with its sources cited. For more details on Retrieval Augmented Generation, have a look at the chatbots and grounded generation overview.

The summary text summarizes the results that are relevant to the search and cites those results as sources. Vectara cites them in [number] format. For example, if the first result is used in the summary, it is cited as [1].
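Because citations use the [number] format, a client can map them back to entries in the result list. The sketch below assumes that citation numbers are 1-based positions in that list. The helper cited_results is hypothetical.

```python
import re


def cited_results(summary_text: str, results: list) -> list:
    """Return (citation number, result) pairs in order of first appearance."""
    seen = []
    for match in re.finditer(r"\[(\d+)\]", summary_text):
        number = int(match.group(1))
        # Skip numbers that do not correspond to a returned result.
        if 1 <= number <= len(results) and number not in seen:
            seen.append(number)
    return [(number, results[number - 1]) for number in seen]


if __name__ == "__main__":
    results = ["first result", "second result", "third result"]
    summary = "Graffiti is prohibited [1]. Fines may apply [3][1]."
    print(cited_results(summary, results))
```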


The response set groups a list of responses, sorted in order of score, together with a list of statuses and enclosing documents. Since it's possible for several results to come from the same document, the length of the document list may be less than the length of the response list.
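Since each result carries a document_index into the enclosing document list, a client typically joins the two. The sketch below shows one way to do that. The dictionary keys response, document, id, and metadata are assumed from the field descriptions on this page and may differ from the actual wire format.

```python
def attach_documents(response_set: dict) -> list:
    """Join each result with the document it points to via document_index."""
    documents = response_set["document"]
    joined = []
    for result in response_set["response"]:
        doc = documents[result["document_index"]]
        joined.append({
            "text": result["text"],
            "score": result["score"],
            "document_id": doc["id"],
            "document_metadata": doc.get("metadata", []),
        })
    return joined


if __name__ == "__main__":
    sample = {
        # Two results come from the same document, so the document list is shorter.
        "response": [
            {"text": "snippet a", "score": 0.91, "document_index": 0},
            {"text": "snippet b", "score": 0.84, "document_index": 1},
            {"text": "snippet c", "score": 0.79, "document_index": 0},
        ],
        "document": [
            {"id": "doc-rules", "metadata": [{"name": "title", "value": "Rules"}]},
            {"id": "doc-faq"},
        ],
    }
    for row in attach_documents(sample):
        print(row["document_id"], row["score"], row["text"])
```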


Attribute represents a named piece of metadata. Both the name and its value are string typed.

message Attribute {
  string name = 5;
  string value = 10;
}

Batch Query and Response

The batch query request and response messages simply aggregate several individual queries and response sets, respectively. The response sets will match the queries in both number and order. For example, the third response set in the batch response will correspond with the third query in the batch request.

message BatchQueryRequest {
  repeated QueryRequest query = 5;
}

message BatchQueryResponse {
  repeated ResponseSet response_set = 5;

  repeated Status status = 1000;
}
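Because response sets match the queries in both number and order, pairing them on the client is a positional zip. The sketch below uses plain Python lists as stand-ins for the repeated fields. The helper pair_batch is hypothetical.

```python
def pair_batch(queries: list, response_sets: list) -> list:
    """Pair each query with its response set by position."""
    if len(queries) != len(response_sets):
        raise ValueError("expected one response set per query")
    return list(zip(queries, response_sets))


if __name__ == "__main__":
    queries = ["q1", "q2", "q3"]
    response_sets = ["set for q1", "set for q2", "set for q3"]
    # The third response set corresponds to the third query.
    print(pair_batch(queries, response_sets)[2])
```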

Advanced Scenarios

Search Multiple Corpora

There are situations where searching multiple corpora simultaneously can be beneficial. To do this effectively, you need two things:

  1. Proper Permissions: Setting up an API Key that grants access to all corpora that you intend to search.
  2. Query Body Adjustment: Specific modifications to the query body as outlined below.

The necessary modification is that corpusKey takes an array of objects, one per corpus.

Search a Single Corpus Example

If you're currently searching one corpus, the corpusKey looks as follows:

"corpusKey": [
  {
    "customerId": 1234,
    "corpusId": 5678,
    "semantics": 0,
    "metadataFilter": "",
    "dim": []
  }
]

Search Multiple Corpora Example

As long as your API key has permissions to each of these corpora, you can search multiple corpora at once as follows:

"corpusKey": [
  {
    "customerId": 1234,
    "corpusId": 5678,
    "semantics": 0,
    "metadataFilter": "",
    "dim": []
  },
  {
    "customerId": 1234,
    "corpusId": 9876,
    "semantics": 0,
    "metadataFilter": "",
    "dim": []
  }
]

In this example, the query returns results across the queried corpora. The corpusKey is returned in the response for each document if you need to use it in your application.
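When the list of corpora is dynamic, the corpusKey array can be generated rather than written by hand. The sketch below reproduces the object shape from the examples above. The helper build_corpus_keys is hypothetical, and applying the same metadataFilter to every corpus is a simplifying assumption.

```python
import json


def build_corpus_keys(customer_id: int, corpus_ids: list, metadata_filter: str = "") -> list:
    """Build one corpusKey object per corpus ID, using the shape shown above."""
    return [
        {
            "customerId": customer_id,
            "corpusId": corpus_id,
            "semantics": 0,
            "metadataFilter": metadata_filter,
            "dim": [],
        }
        for corpus_id in corpus_ids
    ]


if __name__ == "__main__":
    body_fragment = {"corpusKey": build_corpus_keys(1234, [5678, 9876])}
    print(json.dumps(body_fragment, indent=2))
```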