Query API Definition
Organizations often struggle to extract relevant information from large datasets and get meaningful answers to complex queries. The Vectara Query API offers a powerful and flexible solution for performing sophisticated searches across one or more corpora, significantly enhancing the accuracy and relevance of search results.
The Query API lets you run a query and define its parameters: the query text, pagination details, metadata filters, and other search settings that enable application builders to tailor queries to specific use cases.
After you index data into one or more corpora, you can run queries and display the results. This page provides a detailed reference for how to run queries and also describes some of Vectara's capabilities in metadata filtering, reranking, Retrieval Augmented Generation (RAG), and hybrid search.
Query Types
The Vectara REST API 2.0 has different types of queries for you to use depending on your search needs. Depending on the type, queries enable you to define parameters that control the behavior of the query and summarization:
- Search Parameters: Filter data by metadata, apply lexical weighting, add additional context about the data, and rerank the results
- Summarization Parameters: Choose the model and prompt, response settings such as length and factual scoring, and more nuanced model parameters
- Stream Response: Optionally have the summarized response stream in real time.
The exact request format depends on the specific query type that you want to use.
Check out our interactive API Reference, where you can experiment with these query types. You can perform a simple single corpus query, an advanced single corpus query, and a multiple corpora query.
Multiple Corpora Query
The /v2/query endpoint allows you to perform Retrieval Augmented Generation
(RAG) across one or more corpora in your account. You send a POST request with
a body that specifies the following:
- query - Contains your query text
- stream_response - Indicates whether to stream the response in real time (true) or to send a complete summary at the end of processing the request (false)
- search - Specifies the search parameters
- generation - Specifies the summarization parameters and generation_preset_name. Table summarization has a specific preset: vectara-summary-table-query-ext-dec-2024-gpt-3-5 or vectara-summary-table-query-ext-dec-2024-gpt-4o.
Excluding this generation field disables summarization. The generation preset
contains the name, description, llm_name, prompt_template, and other fields
that make up the preset.
This query type is useful when you want to query all your data sources at once.
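As a rough illustration, the request body described above could be assembled as follows. The field names query, stream_response, search, and generation come from this page; the corpora list inside search, the corpus keys, and the helper name build_query_body are assumptions made for this sketch, so check the API Reference for the exact schema.

```python
import json


def build_query_body(query, corpus_keys, preset_name=None, stream=False):
    """Assemble a JSON-serializable body for POST /v2/query (illustrative only)."""
    body = {
        "query": query,
        "stream_response": stream,
        # Assumption: corpora are listed under search.corpora by corpus_key.
        "search": {"corpora": [{"corpus_key": key} for key in corpus_keys]},
    }
    # Leaving out the generation object disables summarization.
    if preset_name is not None:
        body["generation"] = {"generation_preset_name": preset_name}
    return body


body = build_query_body(
    "Where can I buy the latest iPhone?",
    ["products_1", "support_2"],  # hypothetical corpus keys
    preset_name="mockingbird-1.0-2024-07-16",
)
print(json.dumps(body, indent=2))
```

The sketch only builds the payload; sending it with an HTTP client and an API key is left out on purpose.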
Simple Single Corpus Query
Send a simplified GET request to /v2/corpora/:corpus_key/query for querying
a single corpus that specifies the following:
- q: Contains the query string
- limit: Specifies the maximum number of results
- offset: Specifies the starting position in results
This query type provides a lightweight way to search a single corpus.
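A minimal sketch of building that GET URL with the Python standard library is shown below. The q, limit, and offset parameters and the path come from this page; the corpus key, the default values, and the helper name simple_query_url are placeholders chosen for the example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.vectara.io"  # host taken from the endpoint address on this page


def simple_query_url(corpus_key, q, limit=10, offset=0):
    """Build the GET URL for /v2/corpora/:corpus_key/query."""
    # Percent-encode the corpus key so it is safe to place in the path.
    path = f"/v2/corpora/{quote(corpus_key, safe='')}/query"
    # urlencode escapes the query string (spaces become '+').
    params = urlencode({"q": q, "limit": limit, "offset": offset})
    return f"{BASE_URL}{path}?{params}"


url = simple_query_url("my_corpus", "latest iPhone", limit=5)  # hypothetical corpus key
print(url)
# https://api.vectara.io/v2/corpora/my_corpus/query?q=latest+iPhone&limit=5&offset=0
```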
Advanced Single Corpus Query
Send a POST request to /v2/corpora/:corpus_key/query to query a specific
corpus with more advanced capabilities. This query type allows for detailed
customization, including the ability to summarize table data when querying
tables.
The request body is similar to the Query Corpora type and specifies the same parameters:
- query - Contains your query text
- stream_response - Indicates whether to stream the response in real time or to send a complete summary at the end of processing the request
- search - Specifies the search parameters
- generation - Specifies the summarization parameters and generation_preset_name.
Excluding this generation field disables summarization. The generation preset
contains the name, description, llm_name, prompt_template, and other fields
that make up the preset.
This advanced type provides additional search filtering and customization options compared to the simple GET method.
The query response message encapsulates a single query result, which is a
subdocument provided at indexing time. The text is the subdocument text, and
the score indicates how well the text answers the query (higher scores are
better).
The metadata list holds any subdocument-level metadata that was stored with
the item at indexing time. The corpus_key indicates which corpus the result
came from, because a single query can execute against multiple corpora. Most
queries run against a single corpus, but it's sometimes useful to run against
several in parallel.
Finally, the document_index points at a specific document within the
enclosing response set's document array. This is useful for retrieving the
document ID and document-level metadata.
Corpus Key Definition
If you want to query a specific corpus or corpora, include the unique
corpus_key in the path of the request, such as /v2/corpora/:corpus_key.
When creating a new corpus, you have the flexibility to specify a custom
corpus_key that follows a naming convention of your choice. This allows you to
assign easily identifiable keys to your corpora, making it easier to manage and
reference them in your application.
As part of the migration from API 1.0 to 2.0, all existing corpora have been
assigned a new corpus_key based on their original name and corpus_id. The
corpus_key is created by combining the name of the corpus (with underscores
replacing spaces) and the original numeric ID.
Query Definition
A single query consists of query text, which is specified in plain text. For example: "Where can I buy the latest iPhone?" Optionally, the query context provides additional information that the system may use to refine the results. For example: "The Apple store near my house is closed due to Covid."
Within the search object, add custom_dimensions weights (Pro or Enterprise)
and a metadata_filter, and set the lexical_interpolation (formerly lambda in
the REST API v1.0). Setting lexical_interpolation to 0 disables exact and
Boolean text matching, while a value of 1 disables neural retrieval. Users
often see best results by setting this value somewhere between 0.01 and 0.1,
and we typically recommend that users start experimenting with a value of
0.025.
The semantics parameter indicates whether to consider a query against this
corpus as a query or a response. The offset field controls the starting
position within the list of results, while limit dictates how many results
are returned. Thus, setting offset=5 and limit=20 would return twenty
results beginning at position five. These fields are mainly used to provide
pagination.
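To make the pagination arithmetic concrete, here is a small sketch that turns a page number into an offset and limit pair. It assumes that offset counts the results to skip and that pages are numbered from zero; page_window is a hypothetical helper, not part of the API.

```python
def page_window(page, page_size):
    """Return the offset/limit pair for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    # Skip all results that belong to earlier pages.
    return {"offset": page * page_size, "limit": page_size}


# Third page (page index 2) with 20 results per page.
window = page_window(2, 20)
print(window)  # {'offset': 40, 'limit': 20}
```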
The context_configuration object lets you specify whether you want a specific
number of characters or sentences before or after the matching document part.
Finally, the reranking configuration enables reranking of results, to further increase relevance in certain scenarios. For details about our Multilingual, Maximal Marginal Relevance (MMR), and User Defined Function rerankers, see Rerank Search Results.
Query Request and Response
The corpus_key specifies the ID of the corpus being searched. The
metadata_filter allows specifying a predicate expression that restricts
the search to a part of the corpus. The filter is written in a simplified SQL
dialect and can reference metadata that was marked as filterable during corpus
creation.
See the Filter Expressions Overview for a description of their syntax, and Corpus Administration to learn how referenceable metadata is specified during corpus creation.
By default, Vectara only uses its neural/semantic retrieval model
and does not attempt to use keyword matching. To enable hybrid search with a
mix of both keyword and neural results, edit the lexical_interpolation value.
If the corpus specifies custom dimensions (Pro or Enterprise), weights can be assigned to each dimension as well.
Finally, it's possible to override the semantic interpretation of the query string. Usually, the default settings for the corpus are sufficient. In more advanced scenarios, it's desirable to force it to be treated as a query, or, more rarely, as a response.
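The search settings discussed above can be combined roughly as in the sketch below. The keys lexical_interpolation, metadata_filter, offset, and limit are named on this page; the filter expression, the doc.lang metadata field, the default limit, and the helper name build_search are illustrative assumptions, and the exact placement of these keys may differ by query type.

```python
def build_search(lexical_interpolation=0.025, metadata_filter=None, offset=0, limit=10):
    """Build a search object that mixes neural and keyword retrieval."""
    # 0 disables keyword matching and 1 disables neural retrieval.
    if not 0.0 <= lexical_interpolation <= 1.0:
        raise ValueError("lexical_interpolation must be between 0 and 1")
    search = {
        "lexical_interpolation": lexical_interpolation,
        "offset": offset,
        "limit": limit,
    }
    # Only include a filter when one is supplied.
    if metadata_filter:
        search["metadata_filter"] = metadata_filter
    return search


# Hypothetical filter on a metadata field that was marked as filterable.
search = build_search(metadata_filter="doc.lang = 'eng'")
print(search)
```

Starting from the recommended 0.025 and adjusting within the 0.01 to 0.1 range is one reasonable way to experiment with hybrid search.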
Reranking Configuration
The reranker object enables the reranking of query results, to further
increase relevance in certain scenarios. The cutoff property lets you
specify a minimum score threshold for search results to be included after
reranking. The limit property of this reranker object allows you to have
more granular control over the number of results returned after reranking. For
more details about these properties, see Rerank Search Results:
- Specify the type as customer_reranker and reranker_name as Rerank_Multilingual_v1 to use the Multilingual Reranker v1, also known as Slingshot.
- Specify the type as mmr to use the Maximal Marginal Relevance (MMR) Reranker. This reranker lets you specify a diversity_bias value between 0.0 and 1.0.
- Specify the type as userfn to use the User Defined Function Reranker.
- Specify the type as chain to use the Chain Reranker.
- To use Knee Reranking, configure the chain reranker to first use the Vectara Multilingual Reranker (Slingshot). Then specify the type as userfn and user_function as knee() to enable Knee Reranking in the chain.
- If you do not want to use a reranker, set the type to none.
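The reranker options listed above might be expressed as the following configuration objects. The type, reranker_name, user_function, and diversity_bias names come from this page; the rerankers key that holds the chain stages and the helper function names are assumptions for this sketch.

```python
def mmr_reranker(diversity_bias):
    """Configure the MMR reranker with a diversity bias between 0.0 and 1.0."""
    if not 0.0 <= diversity_bias <= 1.0:
        raise ValueError("diversity_bias must be between 0.0 and 1.0")
    return {"type": "mmr", "diversity_bias": diversity_bias}


def knee_chain_reranker():
    """Chain the Multilingual Reranker (Slingshot) with the knee() user function."""
    return {
        "type": "chain",
        # Assumption: the chain lists its stages, in order, under "rerankers".
        "rerankers": [
            {"type": "customer_reranker", "reranker_name": "Rerank_Multilingual_v1"},
            {"type": "userfn", "user_function": "knee()"},
        ],
    }


chain = knee_chain_reranker()
print([stage["type"] for stage in chain["rerankers"]])
```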
Query Summarization Request - Retrieval Augmented Generation
To use Retrieval Augmented Generation (RAG), which Vectara also refers to as
"Grounded Generation", our groundbreaking way of producing generative
summaries on top of your own data, you can submit a generation object with
your query. The generated summary attempts to answer the end user's question,
citing the results as references. For more information, read about Retrieval
Augmented Generation.
The generation object enables you to tailor the results of the query
summarization. Growth users can specify the max_used_search_results,
response_language, and enable_factual_consistency_score.
{
  "generation": {
    "generation_preset_name": "vectara-summary-ext-v1.2.0",
    "max_used_search_results": 5,
    "prompt_template": "[\n {\"role\": \"system\", \"content\": \"You are a helpful search assistant.\"},\n #foreach ($qResult in $vectaraQueryResults)\n {\"role\": \"user\", \"content\": \"Given the $vectaraIdxWord[$foreach.index] search result.\"},\n {\"role\": \"assistant\", \"content\": \"${qResult.getText()}\" },\n #end\n {\"role\": \"user\", \"content\": \"Generate a summary for the query '${vectaraQuery}' based on the above results.\"}\n]\n",
    "max_response_characters": 300,
    "response_language": "eng",
    "model_parameters": {
      "max_tokens": 0,
      "temperature": 0,
      "frequency_penalty": 0,
      "presence_penalty": 0
    }
  }
}
Users also have access to advanced summarization customization options.
Generation Presets
The generation_preset_name field in the generation object specifies the prompt
template to use. Generation presets bundle several properties that configure
generation for the request, providing more flexibility in how parameters are
set. The preset includes the prompt_template, the LLM, and other settings
like max_tokens and temperature.
"generation_preset_name": "vectara-summary-ext-v1.3.0"
To view available generation presets, use the List Generation Presets API.
The generation_preset_name field replaces the prompt_name field that was
previously in the generation object. The prompt_name field is now deprecated
but still supported for backward compatibility.
Mockingbird: Enhanced RAG Performance
For users seeking superior RAG performance, Vectara offers Mockingbird, our advanced LLM specifically designed for RAG tasks.
To use Mockingbird for your RAG tasks, specify mockingbird-1.0-2024-07-16 in
the generation_preset_name field of the generation object, as in this example:
{
"generation": {
"generation_preset_name": "mockingbird-1.0-2024-07-16",
"max_used_search_results": 5,
"response_language": "eng",
"enable_factual_consistency_score": true
}
}
Mockingbird is particularly beneficial for enterprise applications requiring high-quality summaries and structured outputs. For more details on Mockingbird's capabilities and performance, see the Mockingbird LLM section.
Factual Consistency Score
The Factual Consistency Score, based on a more advanced version of the
Hughes Hallucination Evaluation Model (HHEM),
enables you to evaluate the likelihood of an AI-generated summary being
factually correct based on search results. This calibrated score can
range from 0.0 to 1.0. A higher score indicates a greater probability of
being factually accurate, while a lower score indicates a greater probability
of hallucinations. The score supports English, German, and French (eng, deu,
fra) as the response_language.
In your summarization request, set the enable_factual_consistency_score
field to true.
The Factual Consistency Score is returned as a calibrated value between 0.0
and 1.0 in the factual_consistency_score field of the summary message.
For example, a score of 0.95 suggests a 95% likelihood that the summary is
free of hallucinations and aligns with the original content. A lower
score of 0.40 indicates only a 40% likelihood, so the summary is probably much
less factually accurate. We suggest starting with a cutoff of 0.5 as an initial
guideline for separating good summaries from bad ones.
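One way to apply such a cutoff on the client side is sketched below. It assumes the summary message has already been parsed into a dictionary that carries the factual_consistency_score field; the sample dictionaries and the helper name is_probably_grounded are made up for illustration.

```python
def is_probably_grounded(summary, cutoff=0.5):
    """Return True when the factual consistency score meets the cutoff."""
    score = summary.get("factual_consistency_score")
    # Treat a missing score as a failure so unscored summaries get reviewed.
    if score is None:
        return False
    return score >= cutoff


# Hypothetical parsed summary messages.
good = {"factual_consistency_score": 0.95}
weak = {"factual_consistency_score": 0.40}
print(is_probably_grounded(good), is_probably_grounded(weak))  # True False
```

Treating a missing score as a failure is a conservative design choice for this sketch; an application could equally decide to pass such summaries through.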
Citation Format in Summary
When generating a summary, Vectara lets you set the style of the citations
object to one of the following formats:
- numeric (default) - Citations appear as numbers: [1], [2], [N], and so on.
- none - No citations appear in the summary.
- html - Citations appear as a URL: <a href="https://my.doc/foo">[N]</a>
- markdown - Citations appear in Markdown: [N](https://my.doc/foo)
If the style is set to html or markdown, you must customize the citation using
both the url_pattern and text_pattern fields to enable dynamic citation
generation. Both of these parameters can access all part-level and
document-level metadata fields.
For example, the url_pattern field can specify {doc.id} and {part.page}
metadata as https://mypdf.doc/foo/{doc.id}#page={part.page}.
The text_pattern field specifies the document and part metadata name in curly
braces. For example, use {doc.title} and the final result appears as Title.
To use citations, you must specify one of the following summarizers
in generation_preset_name:
- mockingbird-1.0-2024-07-16 (Vectara's Mockingbird LLM)
- vectara-summary-ext-24-05-sml (gpt-3.5-turbo)
- vectara-summary-ext-24-05-med-omni (gpt-4o)
- vectara-summary-ext-24-05-med (gpt-4.0)
- vectara-summary-ext-24-05-large (gpt-4.0-turbo)
For more information, see the documentation about selecting summarizers.
Default Citation Behavior
- If text_pattern is not specified, it defaults to the numerical position of the result ([1], [2], [N]).
- The url_pattern does not have a default, so this field must be explicitly defined.
Citation Example
In this example, you want Vectara to say "as seen in [Document-Title]" with a
link to the specific page:
{
  "citations": {
    "style": "markdown",
    "url_pattern": "{doc.id}#page={part.page}",
    "text_pattern": "as seen in {doc.title}"
  }
}
The response will look something like this:
In the Metropolitan Transportation Authority (MTA) rules, it is prohibited to
destroy, mark, soil, paint, draw, inscribe, or place graffiti on any facility
or conveyance of the authority [as seen in Rules of Conduct and
Fines](https://new.mta.info/document/36821#page=3).
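To show how the patterns and metadata combine, the sketch below imitates the substitution locally using the {doc.*} and {part.*} placeholders described earlier. It is not Vectara's implementation: the regular expression, the helper names, and the sample metadata dictionaries are assumptions chosen so that the output resembles the response above.

```python
import re

# Matches placeholders such as {doc.id} or {part.page}.
_PLACEHOLDER = re.compile(r"\{(doc|part)\.([A-Za-z0-9_]+)\}")


def expand_pattern(pattern, doc, part):
    """Replace {doc.<field>} and {part.<field>} with metadata values."""
    sources = {"doc": doc, "part": part}

    def substitute(match):
        scope, field = match.groups()
        # A missing metadata field raises KeyError instead of being silently dropped.
        return str(sources[scope][field])

    return _PLACEHOLDER.sub(substitute, pattern)


def markdown_citation(url_pattern, text_pattern, doc, part):
    """Render one Markdown-style citation from the two patterns."""
    text = expand_pattern(text_pattern, doc, part)
    url = expand_pattern(url_pattern, doc, part)
    return f"[{text}]({url})"


# Hypothetical document-level and part-level metadata.
doc = {"id": "https://new.mta.info/document/36821", "title": "Rules of Conduct and Fines"}
part = {"page": 3}
citation = markdown_citation("{doc.id}#page={part.page}", "as seen in {doc.title}", doc, part)
print(citation)
# [as seen in Rules of Conduct and Fines](https://new.mta.info/document/36821#page=3)
```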
Disable query summarization
To disable summarization, exclude the generation object from a query.
Advanced Summarization Customization Options
Vectara also provides more powerful summarization capabilities for tailoring summarizations to specific application and user needs. If you need to change generation beyond what the preset specifies, you can override most parameters in a query. For example:
{
  "generation": {
    "generation_preset_name": "vectara-summary-ext-v1.3.0",
    "model_parameters": {
      "max_tokens": 300,
      "temperature": 0.7
    }
  }
}
Overriding parameters in a query lets you use a preset as a starting point while you tailor specific aspects of the generation.
The generation_preset_name allows you to specify one of our available
summarizers. Use generation_preset_name and prompt_template to override the
default prompt with a custom prompt. For example, your use case might require
a chatbot to be more humanlike, so you create a custom response format that
behaves more playfully in a conversation or summary.
The max_response_characters parameter lets you control the length of the
summary, but note that it is not a hard limit like the max_tokens parameter.
The model_parameters object provides even more fine-grained controls for the
summarizer model:
- max_tokens specifies a hard limit on the number of tokens in a response. This value supersedes the max_response_characters parameter.
- temperature indicates whether you want the summarization to not be creative at all (0.0) or to take more creative liberties as you approach the maximum value of 1.0.
- frequency_penalty provides even more granular control to help ensure that the summarization decreases the likelihood of repeating words. The values range from 0.0 to 1.0.
- presence_penalty provides more control over whether you want the summary to include new topics. The values also range from 0.0 to 1.0.
By leveraging these advanced capabilities, application builders can fine-tune the behavior and output style of the summarizer to align with their unique application requirements.
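A hedged sketch of starting from a preset and overriding selected model parameters follows. It assumes the overrides are nested under model_parameters, as in the earlier generation example, and enforces the 0.0 to 1.0 ranges given above; build_generation is a hypothetical helper.

```python
def build_generation(preset_name, max_tokens=None, temperature=None,
                     frequency_penalty=None, presence_penalty=None):
    """Start from a preset and override only the parameters that are supplied."""
    ranged = {
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
    model_parameters = {}
    for name, value in ranged.items():
        if value is None:
            continue  # keep the preset's own value
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        model_parameters[name] = value
    if max_tokens is not None:
        model_parameters["max_tokens"] = max_tokens

    generation = {"generation_preset_name": preset_name}
    if model_parameters:
        generation["model_parameters"] = model_parameters
    return generation


generation = build_generation("vectara-summary-ext-v1.3.0", max_tokens=300, temperature=0.7)
print(generation)
```

Parameters that are not passed are simply omitted, so the preset's own values continue to apply.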
REST 2.0 URL
Query API Endpoint Address
Vectara exposes a REST endpoint at the following URL to search content from a corpus: https://api.vectara.io/v2/query
The API Reference shows the full Query REST definition.