Create a new turn in the chat
POST /v2/chats/:chat_id/turns
Create a new turn in the chat. Each conversation has a series of turn
objects, which are the sequence of message and response pairs that make up the dialog.
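As a sketch, creating a turn is a POST with the chat message in the request body. The base URL, the x-api-key auth header, and the body shape here are assumptions for illustration; adjust them to your account's configuration.

```python
import json
import urllib.request

API_BASE = "https://api.vectara.io"  # assumed base URL

def build_create_turn_request(chat_id: str, api_key: str, query: str):
    """Build (url, headers, body) for POST /v2/chats/:chat_id/turns."""
    url = f"{API_BASE}/v2/chats/{chat_id}/turns"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,  # assumed auth header
    }
    body = json.dumps({"query": query}).encode("utf-8")
    return url, headers, body

# Usage (network call left commented out):
url, headers, body = build_create_turn_request("cht_123", "my-key", "What is RAG?")
# req = urllib.request.Request(url, data=body, headers=headers, method="POST")
# with urllib.request.urlopen(req) as resp:
#     turn = json.load(resp)
```

Note that the chat ID must match the `cht_.+$` pattern described below.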
Request
Path Parameters
Possible values: Value must match regular expression cht_.+$
The ID of the chat.
Header Parameters
Possible values: >= 1
The API will make a best effort to complete the request in the specified number of seconds, or time out.
Possible values: >= 1
The API will make a best effort to complete the request in the specified number of milliseconds, or time out.
- application/json
Body
- CustomerSpecificReranker
- UserFunctionReranker
- MMRReranker
- ChainReranker
- NoneReranker
The chat message or question.
search object
Search parameters to retrieve knowledge for the query.
corpora object[]
Possible values: >= 1
The corpora that you want to search.
custom_dimensions object
The custom dimensions as additional weights.
The filter string used to narrow the search based on metadata attributes. The query against this corpus is confined to document parts that match the metadata_filter. Only metadata fields set as filter_attributes on the corpus can be filtered. The filter syntax is similar to a SQL WHERE clause. See the metadata filters documentation for more information.
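For illustration, a corpora entry with a metadata_filter might look like the sketch below. The corpus key and the doc.lang / part.is_title fields are hypothetical filter_attributes, not values from this document.

```python
# Hypothetical search-body fragment: confine the query to English documents
# whose matching parts are not titles. Filter syntax resembles a SQL WHERE clause.
corpora_entry = {
    "corpus_key": "my-corpus",  # hypothetical corpus key
    "metadata_filter": "doc.lang = 'eng' AND part.is_title = false",
    "lexical_interpolation": 0.025,  # mostly semantic, a little lexical
}
```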
Possible values: <= 1
How much to weigh lexical scores compared to the embedding score. 0 means lexical search is not used at all, and 1 means only lexical search is used.
Possible values: [default, query, response]
Default value: default
Indicates whether to consider a query against this corpus as a query or a response.
Possible values: <= 50 characters, value must match regular expression [a-zA-Z0-9_\=\-]+$
A user-provided key for a corpus.
Specifies the number of results to skip. This is useful for pagination.
Possible values: >= 1
Default value: 10
The maximum number of results returned.
context_configuration object
Configuration on the presentation of each document part in the result set.
The number of characters shown before the matching document part. This is useful to show the context of the document part in the wider document. Ignored if sentences_before is set. Vectara will capture the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.
The number of characters shown after the matching document part. This is useful to show the context of the document part in the wider document. Ignored if sentences_after is set. Vectara will capture the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.
The number of sentences that are shown before the matching document part. This is useful to show the context of the document part in the wider document.
The number of sentences that are shown after the matching document part. This is useful to show the context of the document part in the wider document.
The tag that wraps the document part at the start. This is often used to provide a start HTML/XML tag or some other delimiter an application can use to add highlighting in your UI and to understand where the context before ends and the document part begins.
The tag that wraps the document part at the end. This is often used to provide an end HTML/XML tag or some other delimiter an application can use to add highlighting in your UI and to understand where the document part ends and the context after begins.
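A context_configuration that asks for two sentences of context on each side and wraps the matching part in delimiters might look like this sketch; the field names follow the descriptions above, and the `<em>` tags are arbitrary choices.

```python
# Sketch of a context configuration: two sentences of surrounding context,
# with the matching document part wrapped in <em>...</em> for UI highlighting.
context_configuration = {
    "sentences_before": 2,  # sentences shown before the matching part
    "sentences_after": 2,   # sentences shown after the matching part
    "start_tag": "<em>",    # marks where the document part begins
    "end_tag": "</em>",     # marks where the document part ends
}
```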
reranker object
Rerank results of the search. Rerankers are very powerful tools to improve the order of search results. By default, the search uses the most powerful reranker available to the customer's plan. To disable reranking, set the reranker type to "none".
Default value: customer_reranker
When the type is customer_reranker, you can specify the reranker_name of a reranker; reranker_id is deprecated. The retrieval engine will then rerank results using that reranker.
Possible values: Value must match regular expression rnk_(?!272725718)\d+
The ID of the reranker. The multilingual reranker that may be used by Scale customers is rnk_272725719.
Do not specify the MMR reranker ID here, and instead, use the MMR reranker object type.
Deprecated: Use reranker_name instead.
The name of the reranker. Do not specify the MMR reranker name here. Instead, use the MMR reranker object type.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
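The cutoff-then-limit ordering of a reranking stage can be sketched as a small helper. The reranking step itself is elided here; scores are assumed to already be the reranked scores.

```python
def apply_cutoff_and_limit(results, cutoff=None, limit=None):
    """Apply a score cutoff first, then a limit, as one reranking stage does."""
    if cutoff is not None:
        # Remove results scoring below the threshold.
        results = [r for r in results if r["score"] >= cutoff]
    # Return remaining results sorted by score, best first.
    results = sorted(results, key=lambda r: r["score"], reverse=True)
    if limit is not None:
        # Keep only the top N results.
        results = results[:limit]
    return results

stage = apply_cutoff_and_limit(
    [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.2}, {"id": "c", "score": 0.7}],
    cutoff=0.5,
    limit=1,
)
# stage == [{"id": "a", "score": 0.9}]: the cutoff drops "b", the limit keeps the top 1.
```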
Default value: userfn
When the type is userfn, you can define custom reranking functions using document-level metadata, part-level metadata, or scores generated from the request-level metadata.
The user defined function.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
Default value: mmr
When the type is mmr, you can specify the diversity_bias, and the retrieval engine will use the MMR reranker.
The diversity bias. Higher values indicate more diversity.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:
- Reranks all input results according to its algorithm.
- Applies the cutoff, removing any results with scores below the specified threshold.
- Returns the remaining results, sorted by their new scores.
Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
Default value: chain
When the type is chain, you can chain rerankers together.
Possible values: <= 50
Specify an array of rerankers to apply to search results consecutively.
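A chain that first applies a named reranker and then MMR might be expressed like the sketch below. The rerankers field name follows the description above, and the reranker_name value is a placeholder, not a value confirmed by this document.

```python
# Sketch of a chain reranker: a named reranker first, then MMR for diversity.
# Each stage has its own limit, so results can shrink at each step.
chain_reranker = {
    "type": "chain",
    "rerankers": [
        {
            "type": "customer_reranker",
            "reranker_name": "my-reranker",  # placeholder name
            "limit": 50,                     # keep top 50 after this stage
        },
        {
            "type": "mmr",
            "diversity_bias": 0.3,  # higher values favor more diverse results
            "limit": 10,            # keep top 10 after this stage
        },
    ],
}
```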
Default value: none
When the type is none, no reranking will be done.
Possible values: >= 1
Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:
- Reranks all input results according to its algorithm.
- Sorts the reranked results based on their new scores.
- Returns the top N results, where N is the value specified by this limit.
Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.
generation object
The parameters to control generation.
Possible values: non-empty
The preset values to use to feed the query results and other context to the model.
A generation_preset is an object with a bundle of properties that specifies:
- The prompt_template that is rendered and then sent to the LLM.
- The LLM used.
- model_parameters such as temperature.
All of these properties except the model can be overridden by setting them in this object. Even when a prompt_template is set, the generation_preset_name is used to set the model used.
If generation_preset_name is not set, the Vectara platform will use the default model and prompt.
Possible values: non-empty
Use generation_preset_name instead of prompt_name.
Default value: 5
The maximum number of search results to be available to the prompt.
Vectara manages both system and user roles and prompts for the generative LLM out of the box by default. However, Scale customers can override the prompt_template via this variable. The prompt_template is in the form of an Apache Velocity template. For more details on how to configure the prompt_template, see the long-form documentation. See pricing for more details on becoming a Scale customer.
This property is deprecated in favor of clearer naming. Use prompt_template instead. This property is ignored if prompt_template is set.
Controls the length of the generated output. This is a rough estimate and not a hard limit: the end output can be longer or shorter than this value. This is generally implemented by including the max_response_characters in the prompt, and the LLM's instruction-following capability dictates how closely the generated output is limited. This is currently a Scale-only feature. See pricing for more details on becoming a Scale customer.
Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]
Default value: auto
Languages that the Vectara platform supports.
model_parameters object
The parameters for the model. These are currently a Scale-only feature. See pricing for more details on becoming a Scale customer. WARNING: This is an experimental feature, and breakable at any point with virtually no notice. It is meant for experimentation to converge on optimal parameters that can then be set in the prompt definitions.
Possible values: >= 1
The maximum number of tokens to be returned by the model.
The sampling temperature to use. Higher values make the output more random, while lower values make it more focused and deterministic.
Higher values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Higher values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
citations object
Style the generator should use when making citations.
Possible values: [none, numeric, html, markdown]
The citation style to be used in the summary. Can be one of:
- numeric: Citations formatted as simple numerals: [1], [2] ...
- none: Citations removed from the text.
- html: Citations formatted as a URL: <a href="url_pattern">text_pattern</a>
- markdown: Citations formatted as [text_pattern](url_pattern).
The URL pattern if the citation_style is set to html or markdown. The pattern can access metadata attributes in the document or part, e.g. https://my.doc/foo/{doc.id}/{part.id}. The default url_pattern is an empty string.
The text pattern if the citation_style is set to html or markdown. This pattern sets the anchor text for HTML or the text within [] in Markdown, and defaults to N, the index of the result, if it is not set. The default citation style looks like [N](<url_pattern>) for Markdown. You can use metadata attributes in the text_pattern. For example, the pattern {doc.title} with citation style markdown would result in final citation output like [Title](<url_pattern>) when the document's metadata includes {"title": "Title"}.
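To make the pattern semantics concrete, here is a sketch of how a Markdown citation could be rendered from a text_pattern and url_pattern. The substitution logic is an assumption for illustration, not Vectara's actual implementation.

```python
import re

def render_markdown_citation(text_pattern, url_pattern, doc_meta, part_meta, index):
    """Substitute {doc.key} / {part.key} placeholders and emit [text](url)."""
    def substitute(pattern):
        def lookup(match):
            scope, key = match.group(1), match.group(2)
            source = doc_meta if scope == "doc" else part_meta
            return str(source.get(key, ""))
        return re.sub(r"\{(doc|part)\.(\w+)\}", lookup, pattern)

    # The text defaults to the result index N when no text_pattern is set.
    text = substitute(text_pattern) if text_pattern else str(index)
    return f"[{text}]({substitute(url_pattern)})"

citation = render_markdown_citation(
    "{doc.title}", "https://my.doc/foo/{doc.id}/{part.id}",
    doc_meta={"title": "Title", "id": "d1"}, part_meta={"id": "p2"}, index=1,
)
# citation == "[Title](https://my.doc/foo/d1/p2)"
```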
Default value: true
Enable returning the factual consistency score with query results.
chat object
Parameters to control chat behavior.
Default value: true
Indicates whether to store chat messages and response messages.
Default value: false
Indicates whether the response should be streamed or not.
Responses
- 200
- 400
- 403
- 404
A response to a chat request.
- application/json
- text/event-stream
Schema
If the chat response was stored, the ID of the chat.
If the chat response was stored, the ID of the turn.
The message from the chat model for the chat message.
Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]
Default value: auto
Languages that the Vectara platform supports.
search_results object[]
The ranked search results that the chat model used.
The document part altered by the context configuration that matches the query.
The score of the individual result.
part_metadata object
The metadata for the document part.
document_metadata object
The metadata for the document that contains the document part.
The ID of the document that contains the document part.
A query request can search over multiple corpora at a time. This property is set to the index in the list of corpora in the original search request that this search result originated from.
If the query request is only over one corpus, this property is 0.
The probability that the summary is factually consistent with the results.
The rendered prompt sent to the LLM. Useful when creating custom prompt_text templates. Only available to Scale customers.
If you are on the Scale plan, you can view the actual query made to the backend that was rephrased by the LLM from the input query.
{
"chat_id": "string",
"turn_id": "string",
"answer": "string",
"response_language": "auto",
"search_results": [
{
"text": "string",
"score": 0,
"part_metadata": {},
"document_metadata": {},
"document_id": "string",
"request_corpora_index": 0
}
],
"factual_consistency_score": 0,
"rendered_prompt": "string",
"rephrased_query": "string"
}
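Given a response shaped like the example above, pulling out the answer and the best search result is straightforward. The dict below simply mirrors that example with illustrative values.

```python
# A response shaped like the schema example above, with illustrative values.
response = {
    "chat_id": "cht_abc",
    "turn_id": "trn_def",
    "answer": "The summary text.",
    "factual_consistency_score": 0.82,
    "search_results": [
        {"text": "matching part", "score": 0.91, "document_id": "doc-1",
         "request_corpora_index": 0},
    ],
}

answer = response["answer"]
# Results are ranked, but max() by score is a safe way to pick the best one.
best = max(response["search_results"], key=lambda r: r["score"])
# best["document_id"] == "doc-1"
```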
Schema
- StreamSearchResponse
- ChatInfoResponse
- StreamGenerationChunk
- StreamGenerationEnd
- FactualConsistencyScore
- StreamResponseEnd
- GenerationInfo
- StreamError
Default value: search_results
When the streaming event has the search results, the type will be search_results.
search_results object[]
The ranked search results.
The document part altered by the context configuration that matches the query.
The score of the individual result.
part_metadata object
The metadata for the document part.
document_metadata object
The metadata for the document that contains the document part.
The ID of the document that contains the document part.
A query request can search over multiple corpora at a time. This property is set to the index in the list of corpora in the original search request that this search result originated from.
If the query request is only over one corpus, this property is 0.
Default value: chat_info
This will be chat_info when the stream event contains information about how the chat is stored.
Possible values: Value must match regular expression cht_.+$
ID of the chat.
Possible values: Value must match regular expression trn_.+$
ID of the turn.
Default value: generation_chunk
When the streaming event contains the next chunk of generator output, the type will be generation_chunk.
Part of the message from the generator. All summary chunks must be appended together in order to get the full summary.
Default value: generation_end
The end of generation will be denoted with an object with the type generation_end.
Default value: factual_consistency_score
When the streaming event contains the factual consistency score, the type will be factual_consistency_score.
The probability that the summary is factually consistent with the results.
Default value: end
The end of the stream will be denoted with an object with the type end.
Default value: generation_info
When the streaming event contains the generation information, the type will be generation_info.
The rendered prompt sent to the LLM. Useful when creating custom prompt_text templates. Only available to Scale customers.
If you are on the Scale plan, you can view the actual query made to the backend that was rephrased by the LLM from the input query.
Default value: error
If the stream errors, an event with type error will be sent.
The error messages.
{}
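When stream_response is true, events arrive as decoded JSON objects, each carrying a type field as described by the stream schema above. As a sketch, chunks of type generation_chunk are appended in order to recover the full answer; the "chunk" field name used here is an assumption, and the SSE wire framing itself is elided.

```python
def collect_answer(events):
    """Accumulate generation_chunk events into the full generated answer.

    `events` is an iterable of already-decoded JSON objects, each with a
    "type" field as in the stream schema above.
    """
    parts = []
    for event in events:
        if event["type"] == "generation_chunk":
            # Field name "chunk" for the partial text is an assumption.
            parts.append(event.get("chunk", ""))
        elif event["type"] == "end":
            break  # the stream is finished
    return "".join(parts)

# Synthetic event sequence mirroring the schema variants above:
events = [
    {"type": "search_results", "search_results": []},
    {"type": "generation_chunk", "chunk": "Hello, "},
    {"type": "generation_chunk", "chunk": "world."},
    {"type": "generation_end"},
    {"type": "factual_consistency_score", "score": 0.8},
    {"type": "end"},
]
# collect_answer(events) == "Hello, world."
```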
Turn creation request was malformed.
- application/json
Schema
field_errors object
The errors that relate to specific fields in the request.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"field_errors": {},
"messages": [
"string"
],
"request_id": "string"
}
Permissions do not allow creating a turn in the chat.
- application/json
Schema
The messages describing why the error occurred.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"messages": [
"Internal server error."
],
"request_id": "string"
}
Corpus or chat not found.
- application/json
Schema
The ID that cannot be found.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"id": "string",
"messages": [
"string"
],
"request_id": "string"
}