Version: 2.0

Query Corpora

POST /v2/query

Perform a multi-purpose query that can retrieve relevant information from one or more corpora and generate a response using RAG.

Generation is opt-in: set the generation property to enable it. If the property is omitted or set to null, the response does not include generated text.

For more detailed information, see the API guide.
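To make the opt-in behaviour concrete, here is a minimal sketch that builds two request bodies, one for search only and one that also requests generation. The helper name `build_query_body` and the corpus key `my_corpus` are hypothetical; only the field names are taken from this reference.

```python
import json

def build_query_body(query, corpus_key, generate=False):
    """Build a /v2/query request body as a plain dict (hypothetical helper)."""
    body = {
        "query": query,
        "search": {"corpora": [{"corpus_key": corpus_key}]},
    }
    if generate:
        # Including the generation property opts in to a generated response.
        # 5 is the documented default for max_used_search_results.
        body["generation"] = {"max_used_search_results": 5}
    return body

search_only = build_query_body("What is RAG?", "my_corpus")
with_generation = build_query_body("What is RAG?", "my_corpus", generate=True)
print(json.dumps(with_generation, indent=2))
```

Leaving `generation` out entirely, as in `search_only`, corresponds to the "no generation" case described above.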

Request

Body

    query string required

    The query to receive an answer on.

    search object

    Search parameters to retrieve knowledge for the query.

    corpora object[]

    Possible values: >= 1

    The corpora that you want to search.

  • Array [
  • custom_dimensions object

    The custom dimensions as additional weights.

    property name* double
    metadata_filter string

    The filter string used to narrow the search by metadata attributes. The query against this corpus is confined to document parts that match the metadata_filter. Only metadata set as filter_attributes on the corpus can be filtered. The filter syntax is similar to a SQL WHERE clause. See the metadata filters documentation for more information.

    lexical_interpolation float

    Possible values: <= 1

    How much to weigh lexical scores compared to the embedding score. 0 means lexical search is not used at all, and 1 means only lexical search is used.

    semantics SearchSemantics

    Possible values: [default, query, response]

    Default value: default

    Indicates whether to consider a query against this corpus as a query or a response.

    corpus_key CorpusKey required

    Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$

    A user-provided key for a corpus.

  • ]
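A single element of the corpora array could be assembled and checked client-side roughly as follows. This is a sketch, not part of the API: `make_corpus_entry` is a hypothetical helper, the filter attribute `doc.lang` is an assumed example of a filter attribute, and the lower bound of 0 for lexical_interpolation is inferred from its description rather than from the listed constraint. The sketch also uses a full match for corpus_key, which is a slightly stricter reading of the documented pattern.

```python
import re

# Character set and length limit taken from the corpus_key description above.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_\=\-]+")
MAX_CORPUS_KEY_LENGTH = 50
SEMANTICS_VALUES = ("default", "query", "response")

def make_corpus_entry(corpus_key, metadata_filter=None,
                      lexical_interpolation=None, semantics="default"):
    """Return one element of search.corpora (hypothetical helper)."""
    if (len(corpus_key) > MAX_CORPUS_KEY_LENGTH
            or not CORPUS_KEY_PATTERN.fullmatch(corpus_key)):
        raise ValueError(f"invalid corpus_key: {corpus_key!r}")
    if semantics not in SEMANTICS_VALUES:
        raise ValueError(f"invalid semantics: {semantics!r}")
    entry = {"corpus_key": corpus_key, "semantics": semantics}
    if metadata_filter is not None:
        entry["metadata_filter"] = metadata_filter
    if lexical_interpolation is not None:
        # 0 = no lexical search, 1 = lexical search only.
        if not 0 <= lexical_interpolation <= 1:
            raise ValueError("lexical_interpolation must be between 0 and 1")
        entry["lexical_interpolation"] = lexical_interpolation
    return entry

entry = make_corpus_entry(
    "my_corpus",
    metadata_filter="doc.lang = 'eng'",  # assumed filter attribute
    lexical_interpolation=0.1,
)
print(entry)
```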
  • offset int32

    Specifies how many results to skip from the start of the result set. This is useful for pagination.

    limit int32

    Possible values: >= 1

    Default value: 10

    The maximum number of results returned.
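For pagination, offset and limit can be derived from a page number. The sketch below assumes zero-based pages; `page_params` is a hypothetical helper, and its default page size mirrors the documented default limit of 10.

```python
def page_params(page, page_size=10):
    """Translate a zero-based page number into offset/limit search fields."""
    if page < 0:
        raise ValueError("page must be >= 0")
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    return {"offset": page * page_size, "limit": page_size}

# Third page of ten results each: skip the first 20 results.
third_page = page_params(2)
print(third_page)  # {'offset': 20, 'limit': 10}
```

These two keys would be merged into the search object alongside corpora.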

    context_configuration object

    Configuration on the presentation of each document part in the result set.

    characters_before int32

    The number of characters before the matching document part that are shown. This is useful to show the context of the document part in the wider document. Ignored if sentences_before is set. Vectara captures the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.

    characters_after int32

    The number of characters after the matching document part that are shown. This is useful to show the context of the document part in the wider document. Ignored if sentences_after is set. Vectara captures the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.

    sentences_before int32

    The number of sentences before the matching document part that are shown. This is useful to show the context of the document part in the wider document.

    sentences_after int32

    The number of sentences after the matching document part that are shown. This is useful to show the context of the document part in the wider document.

    start_tag string

    The tag that wraps the document part at the start. This is often used to provide a start HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and understand where the context before ends and the document part begins.

    end_tag string

    The tag that wraps the document part at the end. This is often used to provide an end HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and understand where the document part ends and the context after begins.
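As an illustration of how start_tag and end_tag might be consumed by an application, the sketch below defines a context_configuration and splits a result's text into the context before, the matched document part, and the context after. The sample text and the `split_context` helper are hypothetical; the actual text returned depends on your corpus.

```python
context_configuration = {
    "sentences_before": 1,
    "sentences_after": 1,
    "start_tag": "<em>",
    "end_tag": "</em>",
}

def split_context(text, start_tag, end_tag):
    """Split result text into (context_before, document_part, context_after)."""
    before, found_start, rest = text.partition(start_tag)
    if not found_start:
        # No tags present: treat the whole text as the document part.
        return "", text, ""
    part, _found_end, after = rest.partition(end_tag)
    return before, part, after

# Hypothetical result text, shaped the way the tags above would wrap it.
sample_text = "Intro sentence. <em>The matching part.</em> Closing sentence."
before, part, after = split_context(
    sample_text,
    context_configuration["start_tag"],
    context_configuration["end_tag"],
)
print(repr(part))  # 'The matching part.'
```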

    reranker object

    Rerank results of the search. Rerankers are very powerful tools to better order search results. By default the search will use the most powerful reranker available to the customer's plan. To disable reranking set the reranker type to "none".

    oneOf
    type string

    Default value: customer_reranker

    When type is customer_reranker, you can specify the reranker_id of a reranker. The retrieval engine will then rerank results using that reranker.

    reranker_id string required

    Possible values: Value must match regular expression rnk_(?!272725718)\d+

    The ID of the reranker. The reranker currently available to Scale customers is rnk_272725719. Do not specify the MMR reranker ID here; use the MMR reranker object type instead.
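A client could check a reranker ID against the documented pattern before sending the request. The following sketch uses hypothetical helper names; the pattern and the "none" type come from the descriptions above.

```python
import re

# Pattern copied from the reranker_id description; the negative lookahead
# rejects IDs that start with the excluded number 272725718.
RERANKER_ID_PATTERN = re.compile(r"rnk_(?!272725718)\d+")

def customer_reranker(reranker_id):
    """Build a reranker object of type customer_reranker (hypothetical helper)."""
    if not RERANKER_ID_PATTERN.fullmatch(reranker_id):
        raise ValueError(f"invalid reranker_id: {reranker_id!r}")
    return {"type": "customer_reranker", "reranker_id": reranker_id}

# Reranking is disabled by setting the type to "none".
NO_RERANKER = {"type": "none"}

reranker = customer_reranker("rnk_272725719")
print(reranker)
```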

    generation object

    The parameters to control generation.

    prompt_name string

    Possible values: non-empty

    The prompt to use to feed the query results and other context to the model.

    A prompt is an object with a bundle of properties that specifies:

    • The prompt_text that is rendered then sent to the LLM.
    • The LLM used.
    • model_parameters such as temperature.

    All of these properties except the model can be overridden by setting them in this object. Even when a prompt_text is set, the prompt_name is used to set the model used.

    If prompt_name is not set the Vectara platform will use the default model and prompt.

    max_used_search_results int32

    Default value: 5

    The maximum number of search results to be available to the prompt.

    prompt_text string

    Vectara manages both system and user roles and prompts for the generative LLM out of the box by default. However, Scale customers can override the prompt_text via this variable. The prompt_text is in the form of an Apache Velocity template. For more details on how to configure the prompt_text, see the long-form documentation at https://docs.vectara.com/docs/prompts/vectara-prompt-engine. See https://vectara.com/pricing/ for more details on becoming a Scale customer.

    max_response_characters int32

    Controls the length of the generated output. This is a rough estimate and not a hard limit: the end output can be longer or shorter than this value. This is generally implemented by including the max_response_characters in the prompt, and the LLM's instruction following capability dictates how closely the generated output is limited.

    This is currently a Scale-only feature. See https://vectara.com/pricing/ for more details on becoming a Scale customer.

    response_language Language

    Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]

    Default value: auto

    Languages that the Vectara platform supports.

    model_parameters object

    The parameters for the model. These are currently a Scale-only feature. See https://vectara.com/pricing/ for more details on becoming a Scale customer. WARNING: This is an experimental feature, and breakable at any point with virtually no notice. It is meant for experimentation to converge on optimal parameters that can then be set in the prompt definitions.

    max_tokens int32

    Possible values: >= 1

    The maximum number of tokens to be returned by the model.

    temperature float

    The sampling temperature to use. Higher values make the output more random, while lower values make it more focused and deterministic.

    frequency_penalty float

    Higher values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty float

    Higher values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
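Putting several of the generation fields together, a request might carry a generation object like the one built below. This is a sketch under assumptions: `make_generation` is a hypothetical helper, the numeric values are arbitrary examples, and prompt_name is left unset so that the platform default applies. Note that model_parameters and max_response_characters are described above as Scale-only.

```python
def make_generation(max_used_search_results=5, response_language="auto",
                    max_response_characters=None, model_parameters=None):
    """Assemble a generation object for the request body (hypothetical helper)."""
    generation = {
        "max_used_search_results": max_used_search_results,
        "response_language": response_language,
    }
    if max_response_characters is not None:
        # A rough target, not a hard limit, per the field description.
        generation["max_response_characters"] = max_response_characters
    if model_parameters:
        if model_parameters.get("max_tokens", 1) < 1:
            raise ValueError("max_tokens must be >= 1")
        generation["model_parameters"] = dict(model_parameters)
    return generation

generation = make_generation(
    max_response_characters=300,
    model_parameters={"max_tokens": 256, "temperature": 0.0},
)
print(generation)
```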

    citations object

    Style the generator should use when making citations.

    style string

    Possible values: [none, numeric, html, markdown]

    The citation style to be used in summary. Can be one of:

    • numeric - Citations formatted as simple numerals: [1], [2] ...
    • none - Citations removed from text.
    • html - Citations formatted as a link like <a href="url_pattern">text_pattern</a>.
    • markdown - Citations formatted as [text_pattern](url_pattern).

    url_pattern string

    The URL pattern used when the citation style is set to html or markdown. The pattern can access metadata attributes in the document or part, for example https://my.doc/foo/{doc.id}/{part.id}

    The default url_pattern is an empty string.

    text_pattern string

    The text pattern used when the citation style is set to html or markdown. This pattern sets the link text for html or the text within [] in markdown. If it is not set, it defaults to N, the index of the result.

    The default citation style looks like [N](<url_pattern>) for markdown.

    You can use metadata attributes in the text_pattern. For example, the pattern {doc.title} with citation style markdown would result in final citation output like [Title](<url_pattern>) when the document's metadata includes {"title":"Title"}.
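To make the pattern semantics concrete, the sketch below simulates locally how a url_pattern and text_pattern could expand into a markdown citation. The platform performs this substitution itself; the `expand` and `markdown_citation` helpers, the sample metadata, and the choice to render unknown attributes as empty strings are all assumptions made for illustration.

```python
import re

# The citations object as it might appear inside generation.
citations = {
    "style": "markdown",
    "url_pattern": "https://my.doc/foo/{doc.id}/{part.id}",
    "text_pattern": "{doc.title}",
}

PLACEHOLDER = re.compile(r"\{(doc|part)\.(\w+)\}")

def expand(pattern, doc, part):
    """Replace {doc.x} and {part.x} placeholders with metadata values."""
    def substitute(match):
        scope, key = match.group(1), match.group(2)
        source = doc if scope == "doc" else part
        return str(source.get(key, ""))  # assumption: missing keys become ""
    return PLACEHOLDER.sub(substitute, pattern)

def markdown_citation(config, doc, part):
    """Render one citation as [text](url) from the configured patterns."""
    text = expand(config["text_pattern"], doc, part)
    url = expand(config["url_pattern"], doc, part)
    return f"[{text}]({url})"

doc_metadata = {"id": "d1", "title": "Title"}
part_metadata = {"id": "p7"}
citation = markdown_citation(citations, doc_metadata, part_metadata)
print(citation)  # [Title](https://my.doc/foo/d1/p7)
```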

    enable_factual_consistency_score boolean

    Default value: true

    Enable returning the factual consistency score with query results.

    stream_response boolean

    Default value: false

    Indicates whether the response should be streamed or not.
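The request fields above can be combined into a single POST request. The sketch below only constructs the request object and does not send it. The base URL `https://api.vectara.io` and the `x-api-key` header are assumptions not stated on this page, and `YOUR_API_KEY` and `my_corpus` are placeholders; check your account's authentication documentation before relying on them.

```python
import json
import urllib.request

API_URL = "https://api.vectara.io/v2/query"  # assumed base URL + documented path

def build_request(body, api_key):
    """Create (but do not send) a POST request for the query endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "x-api-key": api_key,  # assumed authentication header
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")

body = {
    "query": "How do I reset my password?",
    "search": {"corpora": [{"corpus_key": "my_corpus"}], "limit": 10},
    "generation": {"max_used_search_results": 5},
    "stream_response": False,
}
request = build_request(body, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it, one would call urllib.request.urlopen(request).
```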

Responses

A response to a query.

Schema
    summary string

    The summary of the search results.

    response_language Language

    Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]

    Default value: auto

    Languages that the Vectara platform supports.

    search_results object[]

    The ranked search results.

  • Array [
  • text string

    The document part altered by the context configuration that matches the query.

    score double

    The score of the individual result.

    part_metadata object

    The metadata for the document part.

    property name* any

    The metadata for the document part.

    document_metadata object

    The metadata for the document that contains the document part.

    property name* any

    The metadata for the document that contains the document part.

    document_id string

    The ID of the document that contains the document part.

    request_corpora_index int32

    A query request can search over multiple corpora at a time. This property is set to the index in the list of corpora in the original search request that this search result originated from.

    If the query request is only over one corpus, this property is 0.

  • ]
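When a request searches several corpora, request_corpora_index can be used to map each result back to the corpus it came from. The sketch below uses a hypothetical `label_results` helper and made-up request and response data shaped like the schemas on this page.

```python
def label_results(request_body, response_body):
    """Pair each search result with the corpus_key it originated from."""
    corpora = request_body["search"]["corpora"]
    labeled = []
    for result in response_body.get("search_results", []):
        # The index points into the corpora list of the original request.
        index = result.get("request_corpora_index", 0)
        labeled.append(
            (corpora[index]["corpus_key"], result["document_id"], result["score"])
        )
    return labeled

# Made-up example data for illustration only.
request_body = {
    "query": "warranty period",
    "search": {"corpora": [{"corpus_key": "manuals"}, {"corpus_key": "faq"}]},
}
response_body = {
    "search_results": [
        {"text": "...", "score": 0.91, "document_id": "doc-7", "request_corpora_index": 1},
        {"text": "...", "score": 0.84, "document_id": "doc-2", "request_corpora_index": 0},
    ]
}
for corpus_key, document_id, score in label_results(request_body, response_body):
    print(corpus_key, document_id, score)
```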
  • factual_consistency_score float

    The probability that the summary is factually consistent with the results.

    rendered_prompt string

    The rendered prompt sent to the LLM. Useful when creating custom prompt_text templates. Only available to Scale customers.
