Advanced Single Corpus Query
POST /v2/corpora/:corpus_key/query
Perform an advanced query on a specific corpus to find relevant results, highlight relevant snippets, and use Retrieval Augmented Generation:
- Specify the unique corpus_key identifying the corpus to query. The corpus_key is created in the Vectara Console UI or the Create Corpus API definition. When creating a new corpus, you have the option to assign a custom corpus_key following your preferred naming convention. This key serves as a unique identifier for the corpus, allowing it to be referenced in search requests. For more information, see Corpus Key Definition.
- Customize your search by specifying the query text (query), pagination details (offset and limit), and metadata filters (metadata_filter) to tailor your search results. Learn more
- Leverage advanced search capabilities like reranking (reranker) and Retrieval Augmented Generation (RAG) (generation) for enhanced query performance. Generation is opt-in by setting the generation property; if you exclude the property or set it to null, the response does not include generation. Learn more
- Use hybrid search to achieve optimal results by setting different values for lexical_interpolation (for example, 0.025). Learn more
- Specify Vectara's RAG-focused LLM (Mockingbird) for the generation_preset_name. Learn more
- Use advanced summarization options that utilize detailed summarization parameters such as max_response_characters, temperature, and frequency_penalty for generating precise and relevant summaries. Learn more
For more detailed information, see the Query API guide. An example request body that combines these options appears below.
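To make these options concrete, here is a minimal sketch of a request body that combines them. The metadata field name (doc.lang) is a hypothetical placeholder; substitute a filter attribute defined on your own corpus.
{
  "query": "What is Retrieval Augmented Generation?",
  "search": {
    "metadata_filter": "doc.lang = 'eng'",
    "lexical_interpolation": 0.025,
    "limit": 10
  },
  "generation": {
    "max_used_search_results": 5
  },
  "stream_response": false
}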
Request
Path Parameters
Possible values: <= 50 characters, value must match regular expression [a-zA-Z0-9_\=\-]+$
The unique key identifying the corpus to query.
Header Parameters
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of seconds, or it times out.
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of milliseconds, or it times out.
- application/json
Body
- CustomerSpecificReranker
- UserFunctionReranker
- MMRReranker
- ChainReranker
- NoneReranker
The search query string, which is the question the user is asking.
search object
Search parameters to retrieve knowledge for the query.
custom_dimensions object
The custom dimensions as additional weights.
The filter string used to narrow the search based on metadata attributes. The query against this corpus will be confined to document parts that match the metadata_filter. Only metadata fields set as filter_attributes on the corpus can be filtered. Filter syntax is similar to a SQL WHERE clause. See the metadata filters documentation for more information.
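For example, assuming the corpus defines filter attributes named doc.lang and part.page (hypothetical names for illustration), a filter could look like:
{
  "metadata_filter": "doc.lang = 'eng' AND part.page > 3"
}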
Possible values: <= 1
How much to weigh lexical scores compared to the embedding score. 0 means lexical search is not used at all, and 1 means only lexical search is used.
Possible values: [default, query, response]
Default value: default
Indicates whether to consider a query against this corpus as a query or a response.
Specifies how many results to skip in the result set. This is useful for pagination.
Possible values: >= 1
Default value: 10
The maximum number of results returned.
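Together, offset and limit support pagination. As a sketch, retrieving the second page of 10 results would use:
{
  "offset": 10,
  "limit": 10
}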
context_configuration object
Configuration on the presentation of each document part in the result set.
The number of characters shown before the matching document part. This is useful to show the context of the document part in the wider document. Ignored if sentences_before is set. Vectara captures the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.
The number of characters shown after the matching document part. This is useful to show the context of the document part in the wider document. Ignored if sentences_after is set. Vectara captures the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.
The number of sentences that are shown before the matching document part. This is useful to show the context of the document part in the wider document.
The number of sentences that are shown after the matching document part. This is useful to show the context of the document part in the wider document.
The tag that wraps the document part at the start. This is often used to provide a start HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and understand where the context before ends and the document part begins.
The tag that wraps the document part at the end. This is often used to provide a closing HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and understand where the document part ends and the context after begins.
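A sketch of a context_configuration that returns two sentences of context on each side of the match and wraps the matching part in tags. The start_tag and end_tag field names are assumed here for the wrapping tags described above, and the tag strings are arbitrary examples:
{
  "context_configuration": {
    "sentences_before": 2,
    "sentences_after": 2,
    "start_tag": "<em>",
    "end_tag": "</em>"
  }
}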
reranker object
Rerank results of the search. Rerankers are very powerful tools to improve the order of search results. By default, the search will use the most powerful reranker available to the customer's plan. To disable reranking, set the reranker type to none.
Default value: customer_reranker
When the type is customer_reranker, you can specify the reranker_name of a reranker; reranker_id is deprecated. The retrieval engine will then rerank results using that reranker.
Possible values: Value must match regular expression rnk_(?!272725718)\d+
The ID of the reranker. The multilingual reranker available to Scale customers is rnk_272725719. Do not specify the MMR reranker ID here; instead, use the MMR reranker object type. Deprecated: use reranker_name instead.
The name of the reranker. Do not specify the MMR reranker name here. Instead, use the MMR reranker object type.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
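A sketch of a customer_reranker configuration with a limit and cutoff. The reranker_name value is a placeholder, not a confirmed reranker name:
{
  "reranker": {
    "type": "customer_reranker",
    "reranker_name": "<your-reranker-name>",
    "limit": 25,
    "cutoff": 0.3
  }
}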
Default value: userfn
When the type is userfn, you can define custom reranking functions using document-level metadata, part-level metadata, or scores generated from the request-level metadata.
The user-defined function.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
Default value: mmr
When the type is mmr, you can specify the diversity_bias, and the retrieval engine will use the MMR reranker.
The diversity bias. Higher values indicate more diversity.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
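A sketch of an MMR reranker that trades some relevance for diversity; the diversity_bias and limit values are illustrative:
{
  "reranker": {
    "type": "mmr",
    "diversity_bias": 0.3,
    "limit": 20
  }
}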
Default value: chain
When the type is chain, you can chain rerankers together.
Possible values: <= 50
Specify an array of rerankers to apply to search results consecutively.
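A sketch of a chain that applies a customer_reranker followed by an MMR reranker. The array field name rerankers is assumed from the description above, and the reranker_name is a placeholder:
{
  "reranker": {
    "type": "chain",
    "rerankers": [
      { "type": "customer_reranker", "reranker_name": "<your-reranker-name>" },
      { "type": "mmr", "diversity_bias": 0.3, "limit": 10 }
    ]
  }
}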
Default value: none
When the type is none, no reranking will be done.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
generation object
The parameters to control generation.
Possible values: non-empty
The preset values to use to feed the query results and other context to the model.
A generation_preset is an object with a bundle of properties that specifies:
- The prompt_template that is rendered and then sent to the LLM.
- The LLM used.
- model_parameters such as temperature.
All of these properties except the model can be overridden by setting them in this object. Even when a prompt_template is set, the generation_preset_name is used to set the model used. If generation_preset_name is not set, the Vectara platform will use the default model and prompt.
Possible values: non-empty
Use generation_preset_name instead of prompt_name.
Default value: 5
The maximum number of search results to be available to the prompt.
Vectara manages both system and user roles and prompts for the generative LLM out of the box by default. However, Scale customers can override the prompt_template via this variable. The prompt_template is in the form of an Apache Velocity template. For more details on how to configure the prompt_template, see the long-form documentation. See pricing for more details on becoming a Scale customer.
This property is deprecated in favor of clearer naming. Use prompt_template. This property will be ignored if prompt_template is set.
Controls the length of the generated output. This is a rough estimate and not a hard limit: the end output can be longer or shorter than this value. This is generally implemented by including the max_response_characters in the prompt, and the LLM's instruction-following capability dictates how closely the generated output is limited. This is currently a Scale-only feature. See pricing for more details on becoming a Scale customer.
Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]
Default value: auto
Languages that the Vectara platform supports.
model_parameters object
The parameters for the model. These are currently a Scale-only feature. See pricing for more details on becoming a Scale customer. WARNING: This is an experimental feature and may break at any point with virtually no notice. It is meant for experimentation to converge on optimal parameters that can then be set in the prompt definitions.
Possible values: >= 1
The maximum number of tokens to be returned by the model.
The sampling temperature to use. Higher values make the output more random, while lower values make it more focused and deterministic.
Higher values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Higher values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
citations object
Style the generator should use when making citations.
Possible values: [none, numeric, html, markdown]
The citation style to be used in the summary. Can be one of:
- numeric - Citations formatted as simple numerals: [1], [2] ...
- none - Citations removed from the text.
- html - Citations formatted as a URL like <a href="url_pattern">text_pattern</a>.
- markdown - Citations formatted as [text_pattern](url_pattern).
The URL pattern if the citation_style is set to html or markdown. The pattern can access metadata attributes in the document or part, for example, https://my.doc/foo/{doc.id}/{part.id}. The default url_pattern is an empty string.
The text pattern if the citation_style is set to html or markdown. This pattern sets the href for HTML or the text within [] in markdown, and defaults to N, the index of the result, if it is not set. The default citation style looks like [N](<url_pattern>) for markdown. You can use metadata attributes in the text_pattern. For example, the pattern {doc.title} with citation style markdown would result in final citation output like [Title](<url_pattern>) when the document's metadata includes {"title":"Title"}.
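Putting the generation options together, here is a sketch of a generation object that requests Mockingbird, bounds the summary length, tunes model parameters, and formats citations as markdown. The preset name string and the citations style field name are assumptions; take the exact values from your account's available presets and the schema above:
{
  "generation": {
    "generation_preset_name": "mockingbird-1.0-2024-07-16",
    "max_used_search_results": 5,
    "max_response_characters": 300,
    "model_parameters": {
      "temperature": 0.2,
      "frequency_penalty": 0.1
    },
    "citations": {
      "style": "markdown",
      "url_pattern": "https://my.doc/foo/{doc.id}/{part.id}",
      "text_pattern": "{doc.title}"
    }
  }
}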
Default value: true
Enable returning the factual consistency score with query results.
Default value: false
Indicates whether the response should be streamed or not.
Responses
- 200
- 400
- 403
- 404
A response to a query.
- application/json
- text/event-stream
- Schema
- Example (from schema)
Schema
The summary of the search results.
Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]
Default value: auto
Languages that the Vectara platform supports.
search_results object[]
The ranked search results.
The document part altered by the context configuration that matches the query.
The score of the individual result.
part_metadata object
The metadata for the document part.
The metadata for the document part.
document_metadata object
The metadata for the document that contains the document part.
The metadata for the document that contains the document part.
The ID of the document that contains the document part.
A query request can search over multiple corpora at a time. This property is set to the index in the list of corpora in the original search request that this search result originated from.
If the query request is only over one corpus, this property is 0.
The probability that the summary is factually consistent with the results.
The rendered prompt sent to the LLM. Useful when creating custom prompt_text templates. Only available to Scale customers.
{
"summary": "string",
"response_language": "auto",
"search_results": [
{
"text": "string",
"score": 0,
"part_metadata": {},
"document_metadata": {},
"document_id": "string",
"request_corpora_index": 0
}
],
"factual_consistency_score": 0,
"rendered_prompt": "string"
}
- Schema
- Example (from schema)
Schema
- StreamSearchResponse
- StreamGenerationChunk
- StreamGenerationEnd
- StreamResponseEnd
- FactualConsistencyScore
- GenerationInfo
- StreamError
Default value: search_results
When the streaming event has the search results, the type will be search_results.
search_results object[]
The ranked search results.
The document part altered by the context configuration that matches the query.
The score of the individual result.
part_metadata object
The metadata for the document part.
The metadata for the document part.
document_metadata object
The metadata for the document that contains the document part.
The metadata for the document that contains the document part.
The ID of the document that contains the document part.
A query request can search over multiple corpora at a time. This property is set to the index in the list of corpora in the original search request that this search result originated from.
If the query request is only over one corpus, this property is 0.
Default value: generation_chunk
When the streaming event contains the next chunk of generator output, the type will be generation_chunk.
Part of the message from the generator. All summary chunks must be appended together in order to get the full summary.
Default value: generation_end
The end of generation is denoted with an object with the type generation_end.
Default value: end
The end of the stream is denoted with an object with the type end.
Default value: factual_consistency_score
When the streaming event contains the factual consistency score, the type will be factual_consistency_score.
The probability that the summary is factually consistent with the results.
Default value: generation_info
When the streaming event contains the generation information, the type will be generation_info.
The rendered prompt sent to the LLM. Useful when creating custom prompt_text templates. Only available to Scale customers.
If you are on the Scale plan, you can view the actual query made to the backend that was rephrased by the LLM from the input query.
Default value: error
If the stream errors, an event with type error will be sent.
The error messages.
{}
Query request was malformed.
- application/json
- Schema
- Example (from schema)
Schema
field_errors object
The errors that relate to specific fields in the request.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"field_errors": {},
"messages": [
"string"
],
"request_id": "string"
}
Permissions do not allow querying the corpus.
- application/json
- Schema
- Example (from schema)
Schema
The messages describing why the error occurred.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"messages": [
"Internal server error."
],
"request_id": "string"
}
Corpus not found.
- application/json
- Schema
- Example (from schema)
Schema
The ID that cannot be found.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"id": "string",
"messages": [
"string"
],
"request_id": "string"
}