Get a query history
GET /v2/queries/:query_id
Retrieve the detailed history of a previously executed query.
Request
Path Parameters
The ID of the query history
Header Parameters
Possible values: >= 1
The API will make a best effort to complete the request in the specified seconds or time out.
Possible values: >= 1
The API will make a best effort to complete the request in the specified milliseconds or time out.
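As a minimal sketch of how a client might prepare this request, the helper below builds (but does not send) the GET call. The base URL, the `x-api-key` header, and the `Request-Timeout` header name are assumptions for illustration; this section does not state them.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: the base URL is not given in this section.
BASE_URL = "https://api.vectara.io"


def build_query_history_request(
    query_id: str, api_key: str, timeout_seconds: Optional[int] = None
) -> Request:
    """Build (but do not send) a GET request for one query history."""
    url = f"{BASE_URL}/v2/queries/{quote(query_id, safe='')}"
    # Assumption: the header names below are illustrative.
    headers = {"Accept": "application/json", "x-api-key": api_key}
    if timeout_seconds is not None:
        if timeout_seconds < 1:
            raise ValueError("timeout_seconds must be >= 1")
        headers["Request-Timeout"] = str(timeout_seconds)
    return Request(url, headers=headers, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call.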
Responses
- 200
- 403
- 404
200: The query history.
application/json
Schema
Reranker types:
- CustomerSpecificReranker
- UserFunctionReranker
- MMRReranker
- ChainReranker
- NoneReranker
Steps a reranker performs when a limit is set:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Steps a reranker performs when a cutoff is set:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
A generation_preset specifies:
- The prompt_template that is rendered and then sent to the LLM.
- The LLM used.
- model_parameters such as temperature.
Citation styles:
- numeric: Citations formatted as simple numerals: [1], [2] ...
- none: Citations removed from text.
- html: Citation formatted as a URL like <a href="url_pattern">text_pattern</a>.
- markdown: Formatted as [text_pattern](url_pattern).
Span types:
- RephraseSpan
- SearchSpan
- RerankSpan
- GenerationSpan
- FactualConsistencyScoreSpan
The ID of the query history.
query object
Query one or more corpora.
The search query string, which is the question the user is asking.
search object
Search parameters to retrieve knowledge for the query.
corpora object[]
Possible values: >= 1
The corpora that you want to search.
custom_dimensions object
The custom dimensions as additional weights.
The filter string used to narrow the search based on metadata attributes. The query against this corpus will be confined to document parts that match the metadata_filter. Only metadata fields set as filter_attributes on the corpus can be filtered. Filter syntax is similar to a SQL WHERE clause. See metadata filters documentation for more information.
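Because the filter syntax resembles a SQL WHERE clause, string values containing a single quote need care. The sketch below assumes that quotes are escaped by doubling them, as in SQL; the sample value `doc.title = 'Charlotte''s Web'` in the example response is consistent with that reading. The helper names are hypothetical.

```python
def sql_quote(value: str) -> str:
    """Wrap a string in single quotes, doubling embedded quotes as in SQL."""
    return "'" + value.replace("'", "''") + "'"


def equals_filter(field: str, value: str) -> str:
    """Build a simple equality filter such as doc.title = '...'."""
    return f"{field} = {sql_quote(value)}"


print(equals_filter("doc.title", "Charlotte's Web"))
# prints: doc.title = 'Charlotte''s Web'
```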
Possible values: <= 1
How much to weigh lexical scores compared to the embedding score. 0 means lexical search is not used at all, and 1 means only lexical search is used.
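The exact scoring formula is not given here. One reading that matches the two endpoints described above (0 means no lexical contribution, 1 means lexical only) is a linear blend, sketched below purely as an assumption. The function and parameter names are hypothetical.

```python
def blended_score(embedding_score: float, lexical_score: float, interpolation: float) -> float:
    """Linear blend of two scores. Assumed formula, not a documented one."""
    if not 0.0 <= interpolation <= 1.0:
        raise ValueError("interpolation must be between 0 and 1")
    return (1.0 - interpolation) * embedding_score + interpolation * lexical_score
```

With `interpolation=0.0` the result equals the embedding score, and with `interpolation=1.0` it equals the lexical score.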
Possible values: [default, query, response]
Default value: default
Indicates whether to consider a query against this corpus as a query or a response.
Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$
A user-provided key for a corpus.
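The two constraints above can be checked client-side before a request is made. This is a small sketch; the function name is hypothetical, and the pattern is the one quoted above.

```python
import re

# Pattern copied from the constraint above.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_\=\-]+$")


def is_valid_corpus_key(key: str) -> bool:
    """Return True if the key is at most 50 characters and matches the pattern."""
    return len(key) <= 50 and CORPUS_KEY_PATTERN.fullmatch(key) is not None
```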
Specifies how many results to skip from the start of the result set. This is useful for pagination.
Possible values: >= 1
Default value: 10
The maximum number of results returned.
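For pagination, the two values are typically derived from a page number and a page size. The sketch below assumes zero-based page numbers, which is a choice of this example rather than something the API prescribes. The dictionary keys follow the `offset` and `limit` fields shown in the example response.

```python
def page_window(page: int, page_size: int = 10) -> dict:
    """Return the offset and limit for a zero-based page number."""
    if page < 0:
        raise ValueError("page must be >= 0")
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    return {"offset": page * page_size, "limit": page_size}
```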
context_configuration object
Configuration on the presentation of each document part in the result set.
The number of characters that are shown before the matching document part. This is useful to show the context of the document part in the wider document. Ignored if sentences_before is set. Vectara will capture the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.
The number of characters that are shown after the matching document part. This is useful to show the context of the document part in the wider document. Ignored if sentences_after is set. Vectara will capture the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.
The number of sentences that are shown before the matching document part. This is useful to show the context of the document part in the wider document.
The number of sentences that are shown after the matching document part. This is useful to show the context of the document part in the wider document.
The tag that wraps the document part at the start. This is often used to provide a start HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and understand where the context before ends and the document part begins.
The tag that wraps the document part at the end. This is often used to provide an end HTML/XML tag or some other delimiter you can use in an application to understand where to stop highlighting in your UI and understand where the document part ends and the context after begins.
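An application can use the two tags to separate the matching document part from its surrounding context. The helper below is an illustrative sketch, with a hypothetical name, that assumes each tag appears at most once in the text.

```python
from typing import Tuple


def split_highlight(text: str, start_tag: str, end_tag: str) -> Tuple[str, str, str]:
    """Return (context_before, document_part, context_after)."""
    before, found_start, rest = text.partition(start_tag)
    if not found_start:
        # No start tag: treat the whole text as the document part.
        return "", text, ""
    part, _, after = rest.partition(end_tag)
    return before, part, after
```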
reranker object
Rerank results of the search. Rerankers are very powerful tools to improve the order of search results. By default the search will use the most powerful reranker available to the customer's plan. To disable reranking, set the reranker type to "none".
Default value: customer_reranker
When the type is customer_reranker, you can specify the reranker_name of a reranker. reranker_id is deprecated. The retrieval engine will then rerank results using that reranker.
Possible values: Value must match regular expression rnk_(?!272725718)\d+
The ID of the reranker. The multilingual reranker that may be specified is rnk_272725719.
Do not specify the MMR reranker ID here, and instead, use the MMR reranker object type.
Deprecated: Use reranker_name instead.
The name of the reranker. Do not specify the MMR reranker name here. Instead, use the MMR reranker object type.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it reranks all input results according to its algorithm, sorts the reranked results by their new scores, and returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it reranks all input results according to its algorithm, removes any results with scores below the specified threshold, and returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
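The interaction of limit and cutoff within one reranking stage can be sketched as follows. This is an illustration of the order described above (rescore, apply cutoff, sort, apply limit), not the platform's implementation. The `score` and `original_score` keys and the `score_fn` callback are assumptions for the example.

```python
from typing import Callable, Dict, List, Optional


def rerank_stage(
    results: List[Dict],
    score_fn: Callable[[Dict], float],
    limit: Optional[int] = None,
    cutoff: Optional[float] = None,
) -> List[Dict]:
    """Rescore results, then apply the cutoff first and the limit second."""
    rescored = [
        {**r, "original_score": r["score"], "score": score_fn(r)} for r in results
    ]
    if cutoff is not None:
        # Drop results whose new score falls below the threshold.
        rescored = [r for r in rescored if r["score"] >= cutoff]
    rescored.sort(key=lambda r: r["score"], reverse=True)
    if limit is not None:
        # Keep only the top N results.
        rescored = rescored[:limit]
    return rescored
```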
Default value: userfn
When the type is userfn, you can define custom reranking functions using document-level metadata, part-level metadata, or scores generated from the request-level metadata.
The user defined function.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it reranks all input results according to its algorithm, sorts the reranked results by their new scores, and returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it reranks all input results according to its algorithm, removes any results with scores below the specified threshold, and returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
Default value: mmr
When the type is mmr, you can specify the diversity_bias, and the retrieval engine will use the MMR reranker.
The diversity bias. Higher values indicate more diversity.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it reranks all input results according to its algorithm, sorts the reranked results by their new scores, and returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it reranks all input results according to its algorithm, removes any results with scores below the specified threshold, and returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
Default value: chain
When the type is chain, you can chain rerankers together.
Possible values: <= 50
Specify an array of rerankers to apply to search results consecutively.
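Applying rerankers consecutively means the output of one stage becomes the input of the next. The sketch below models each stage as a plain function; the two sample stages (`top_n`, `min_score`) are hypothetical stand-ins for real rerankers, and the 50-stage check mirrors the array size constraint above.

```python
from typing import Callable, Dict, List

Result = Dict[str, float]
Stage = Callable[[List[Result]], List[Result]]


def top_n(n: int) -> Stage:
    """Stage that keeps the n highest-scoring results."""
    def stage(results: List[Result]) -> List[Result]:
        return sorted(results, key=lambda r: r["score"], reverse=True)[:n]
    return stage


def min_score(threshold: float) -> Stage:
    """Stage that drops results scoring below the threshold."""
    def stage(results: List[Result]) -> List[Result]:
        return [r for r in results if r["score"] >= threshold]
    return stage


def run_chain(results: List[Result], stages: List[Stage]) -> List[Result]:
    """Feed the output of each stage into the next one."""
    if len(stages) > 50:
        raise ValueError("a chain may contain at most 50 rerankers")
    for stage in stages:
        results = stage(results)
    return results
```

Each stage can only shrink or reorder the list it receives, so later stages never see results that an earlier stage removed.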
Default value: none
When the type is none, no reranking will be done.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it reranks all input results according to its algorithm, sorts the reranked results by their new scores, and returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
generation object
The parameters to control generation.
Possible values: non-empty
The preset values to use to feed the query results and other context to the model.
A generation_preset is an object with a bundle of properties that specifies the prompt_template that is rendered and then sent to the LLM, the LLM used, and model_parameters such as temperature.
All of these properties except the model can be overridden by setting them in this object. Even when a prompt_template is set, the generation_preset_name is used to set the model used.
If generation_preset_name is not set, the Vectara platform will use the default model and prompt.
Possible values: non-empty
Use generation_preset_name instead of prompt_name.
Default value: 5
The maximum number of search results to be available to the prompt.
Vectara manages both system and user roles and prompts for the generative LLM out of the box by default. However, users can override the prompt_template via this variable. The prompt_template is in the form of an Apache Velocity template. For more details on how to configure the prompt_template, see the long-form documentation.
This property is deprecated in favor of clearer naming. Use prompt_template instead. This property will be ignored if prompt_template is set.
Controls the length of the generated output. This is a rough estimate and not a hard limit: the end output can be longer or shorter than this value. This is generally implemented by including the max_response_characters in the prompt, and the LLM's instruction following capability dictates how closely the generated output is limited.
Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]
Default value: auto
Languages that the Vectara platform supports.
model_parameters object
The parameters for the model. WARNING: This is an experimental feature and may break at any point with virtually no notice. It is meant for experimentation to converge on optimal parameters that can then be set in the prompt definitions.
Possible values: >= 1
The maximum number of tokens to be returned by the model.
The sampling temperature to use. Higher values make the output more random, while lower values make it more focused and deterministic.
Higher values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Higher values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
citations object
Style the generator should use when making citations.
Possible values: [none, numeric, html, markdown]
The citation style to be used in summary. Can be one of:
- numeric: Citations formatted as simple numerals: [1], [2] ...
- none: Citations removed from text.
- html: Citation formatted as a URL like <a href="url_pattern">text_pattern</a>.
- markdown: Formatted as [text_pattern](url_pattern).
The URL pattern if the citation_style is set to html or markdown. The pattern can access metadata attributes in the document or part, e.g. https://my.doc/foo/{doc.id}/{part.id}. The default url_pattern is an empty string.
The text pattern if the citation_style is set to html or markdown. This pattern sets the link text for HTML or the text within [] in markdown, and defaults to N, the index of the result, if it is not set. The default citation style looks like [N](<url_pattern>) for markdown. You can use metadata attributes in the text_pattern. For example, the pattern {doc.title} with citation style markdown would result in final citation output like [Title](<url_pattern>) when the document's metadata includes {"title":"Title"}.
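To make the four styles and the two patterns concrete, the sketch below renders a single citation client-side. It is an illustration of the documented output shapes, not the platform's renderer; the function names and the placeholder-substitution rule (replace `{doc.x}` and `{part.x}` with metadata values) are assumptions.

```python
import re
from typing import Dict, Optional


def fill_pattern(pattern: str, doc: Dict, part: Dict) -> str:
    """Replace {doc.x} and {part.x} placeholders with metadata values."""
    sources = {"doc": doc, "part": part}

    def repl(match: "re.Match") -> str:
        scope, key = match.group(1), match.group(2)
        # Leave the placeholder untouched when the attribute is missing.
        return str(sources[scope].get(key, match.group(0)))

    return re.sub(r"\{(doc|part)\.(\w+)\}", repl, pattern)


def format_citation(
    style: str,
    index: int,
    url_pattern: str = "",
    text_pattern: str = "",
    doc: Optional[Dict] = None,
    part: Optional[Dict] = None,
) -> str:
    """Render one citation in the given style."""
    doc, part = doc or {}, part or {}
    if style == "none":
        return ""
    if style == "numeric":
        return f"[{index}]"
    url = fill_pattern(url_pattern, doc, part)
    # The text defaults to the result index when no text_pattern is set.
    text = fill_pattern(text_pattern, doc, part) if text_pattern else str(index)
    if style == "html":
        return f'<a href="{url}">{text}</a>'
    if style == "markdown":
        return f"[{text}]({url})"
    raise ValueError(f"unknown citation style: {style}")
```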
Default value: true
Enable returning the factual consistency score with query results.
Default value: false
Indicates whether the response should be streamed or not.
Default value: false
Indicates whether to save the query in the query history.
The ID of the chat the query is a part of.
Time taken to complete the query, measured in milliseconds.
ISO date time indicating when the query was first received.
spans object[]
Parts of the query pipeline. Each span explains what happened during that stage of the query pipeline.
This value is always rephrase.
Time taken in milliseconds.
When the span started.
Query made to the corpora.
This value is always search.
Time taken in milliseconds.
Indicates when the span started.
search_results object[]
The search results before reranking.
The document part altered by the context configuration that matches the query.
The score of the individual result.
part_metadata object
The metadata for the document part.
document_metadata object
The metadata for the document that contains the document part.
The ID of the document that contains the document part.
table object
The table that the document part is from.
The unique ID of the table within the document.
The title of the table.
data object
The data of the table.
The headers of the table.
The rows in the data.
The description of the table.
A query request can search over multiple corpora at a time. This property is set to the index in the list of corpora in the original search request that this search result originated from.
If the query request is only over one corpus, this property is 0.
This value is always rerank.
Time taken in milliseconds.
When the span started.
reranked_search_results object[]
The new search results after reranking.
The document part altered by the context configuration that matches the query.
The score of the individual result.
The original score of the individual result before reranking.
This value is always generation.
Time taken in milliseconds.
When the span started.
The text sent as a prompt to the LLM.
The text generated from the LLM.
This value is always fcs.
Time taken in milliseconds.
When the span started.
The probability that the summary is factually consistent with the results.
Example (from schema)
{
  "id": "string",
  "query": {
    "query": "Am I allowed to bring pets to work?",
    "search": {
      "corpora": [
        {
          "custom_dimensions": {},
          "metadata_filter": "doc.title = 'Charlotte''s Web'",
          "lexical_interpolation": 0.025,
          "semantics": "default",
          "corpus_key": "my-corpus"
        }
      ],
      "offset": 0,
      "limit": 10,
      "context_configuration": {
        "characters_before": 30,
        "characters_after": 30,
        "sentences_before": 3,
        "sentences_after": 3,
        "start_tag": "<em>",
        "end_tag": "</em>"
      },
      "reranker": {
        "type": "customer_reranker",
        "reranker_name": "Rerank_Multilingual_v1",
        "limit": 0,
        "cutoff": 0
      }
    },
    "generation": {
      "generation_preset_name": "vectara-summary-ext-v1.2.0",
      "max_used_search_results": 5,
      "prompt_template": "[\n {\"role\": \"system\", \"content\": \"You are a helpful search assistant.\"},\n #foreach ($qResult in $vectaraQueryResults)\n {\"role\": \"user\", \"content\": \"Given the $vectaraIdxWord[$foreach.index] search result.\"},\n {\"role\": \"assistant\", \"content\": \"${qResult.getText()}\" },\n #end\n {\"role\": \"user\", \"content\": \"Generate a summary for the query '${vectaraQuery}' based on the above results.\"}\n]\n",
      "max_response_characters": 300,
      "response_language": "auto",
      "model_parameters": {
        "max_tokens": 0,
        "temperature": 0,
        "frequency_penalty": 0,
        "presence_penalty": 0
      },
      "citations": {
        "style": "none",
        "url_pattern": "https://vectara.com/documents/{doc.id}",
        "text_pattern": "{doc.title}"
      },
      "enable_factual_consistency_score": true
    },
    "stream_response": false,
    "save_history": false
  },
  "chat_id": "string",
  "latency_millis": 0,
  "started_at": "2025-01-14T03:49:21.236Z",
  "spans": [
    {},
    {},
    {},
    {},
    {}
  ]
}
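Since each span reports its own timing, a parsed response can be summarized per pipeline stage. The sketch below assumes each span carries a `type` field (rephrase, search, rerank, generation, or fcs) and a `latency_millis` field; the span field names are not shown in the example above, so treat both as assumptions.

```python
from typing import Dict


def latency_by_span(history: Dict) -> Dict[str, int]:
    """Sum the reported latency in milliseconds for each span type."""
    totals: Dict[str, int] = {}
    for span in history.get("spans", []):
        span_type = span.get("type", "unknown")
        totals[span_type] = totals.get(span_type, 0) + span.get("latency_millis", 0)
    return totals
```

Comparing these totals with the top-level `latency_millis` gives a rough view of where a query spent its time.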
403: Permissions do not allow retrieving the query history.
application/json
Schema
The messages describing why the error occurred.
The ID of the request that can be used to help Vectara support debug what went wrong.
Example (from schema)
{
  "messages": [
    "Internal server error."
  ],
  "request_id": "string"
}
404: Query history not found.
application/json
Schema
The ID cannot be found.
ID of the request that can be used to help Vectara support debug what went wrong.
Example (from schema)
{
  "id": "string",
  "messages": [
    "string"
  ],
  "request_id": "string"
}
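A client may want to turn the 403 and 404 bodies into a single log line that includes the request ID for support. The helper below is a sketch with a hypothetical name; it relies only on the `messages` and `request_id` fields shown in the error examples and falls back to the status descriptions from this page.

```python
from typing import Dict

# Fallback descriptions taken from the response summaries on this page.
DEFAULT_REASONS = {
    403: "Permissions do not allow retrieving the query history.",
    404: "Query history not found.",
}


def describe_error(status: int, body: Dict) -> str:
    """Format an error response as one line suitable for logging."""
    messages = body.get("messages") or [DEFAULT_REASONS.get(status, "Unexpected error.")]
    joined = "; ".join(messages)
    request_id = body.get("request_id", "unknown")
    return f"HTTP {status}: {joined} (request_id={request_id})"
```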