Version: 2.0

Advanced Single Corpus Query

POST /v2/corpora/:corpus_key/query

Query a specific corpus to find relevant results, highlight relevant snippets, and use Retrieval Augmented Generation (RAG):

  • Specify the unique corpus_key identifying the corpus to query. The corpus_key is created in the Vectara Console UI or through the Create Corpus API. When creating a new corpus, you can assign a custom corpus_key that follows your preferred naming convention. This key serves as the unique identifier for the corpus, allowing it to be referenced in search requests. For more information, see Corpus Key Definition.
  • Customize your search by specifying the query text (query), pagination details (offset and limit), and metadata filters (metadata_filter) to tailor your search results. Learn more.
  • Leverage advanced search capabilities such as reranking (reranker) and Retrieval Augmented Generation (RAG) (generation) for enhanced query performance. Generation is opt-in: enable it by setting the generation property. If the property is omitted or set to null, the response does not include generation. Learn more.
  • Use hybrid search to achieve optimal results by setting different values for lexical_interpolation (for example, 0.025). Learn more.
  • Specify Vectara's RAG-focused LLM (Mockingbird) as the generation_preset_name. Learn more.
  • Use advanced summarization options such as max_response_characters, temperature, and frequency_penalty to generate precise and relevant summaries. Learn more.

For more detailed information, see the Query API guide.
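
As a sketch of how these pieces fit together, a request body might look like the following. The query text, metadata filter attribute (lang), and generation preset name are illustrative placeholders; substitute values that exist in your corpus and plan.

    {
      "query": "What is the warranty period for the product?",
      "search": {
        "metadata_filter": "doc.lang = 'eng'",
        "lexical_interpolation": 0.025,
        "offset": 0,
        "limit": 10
      },
      "generation": {
        "generation_preset_name": "mockingbird-1.0-2024-07-16",
        "max_used_search_results": 5,
        "response_language": "auto"
      },
      "stream_response": false
    }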

Request

Path Parameters

    corpus_key CorpusKey required

    Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$

    The unique key identifying the corpus to query.

Header Parameters

    Request-Timeout integer

    Possible values: >= 1

    The API will make a best effort to complete the request within the specified number of seconds, or it will time out.

    Request-Timeout-Millis integer

    Possible values: >= 1

    The API will make a best effort to complete the request within the specified number of milliseconds, or it will time out.

Body

    query string required

    The search query string, which is the question the user is asking.

    search object

    Search parameters to retrieve knowledge for the query.

    custom_dimensions object

    The custom dimensions as additional weights.

    property name* double
    metadata_filter string

    The filter string used to narrow the search according to metadata attributes. The query against this corpus is confined to document parts that match the metadata_filter. Only metadata set as filter_attributes on the corpus can be filtered. The filter syntax is similar to a SQL WHERE clause. See the metadata filters documentation for more information.
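
    For example, assuming the corpus defines lang as a document-level filter attribute and page as a part-level filter attribute (both are illustrative names), a filter might look like this:

        "metadata_filter": "doc.lang = 'eng' and part.page > 3"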

    lexical_interpolation float

    Possible values: <= 1

    How much to weigh lexical scores compared to the embedding score. 0 means lexical search is not used at all, and 1 means only lexical search is used.

    semantics SearchSemantics

    Possible values: [default, query, response]

    Default value: default

    Indicates whether to consider a query against this corpus as a query or a response.

    offset int32

    The number of results to skip from the start of the result set. This is useful for pagination.

    limit int32

    Possible values: >= 1

    Default value: 10

    The maximum number of results returned.
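
    For example, to retrieve the second page of ten results, skip the first ten:

        "offset": 10,
        "limit": 10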

    context_configuration object

    Configuration on the presentation of each document part in the result set.

    characters_before int32

    The number of characters shown before the matching document part. This is useful for showing the context of the document part in the wider document. Ignored if sentences_before is set. Vectara captures the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.

    characters_after int32

    The number of characters shown after the matching document part. This is useful for showing the context of the document part in the wider document. Ignored if sentences_after is set. Vectara captures the full sentence that contains the captured characters, so that meaning is not lost to a truncated word or sentence.

    sentences_before int32

    The number of sentences that are shown before the matching document part. This is useful to show the context of the document part in the wider document.

    sentences_after int32

    The number of sentences that are shown after the matching document part. This is useful to show the context of the document part in the wider document.

    start_tag string

    The tag that wraps the document part at the start. This is often used to provide an opening HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and where the preceding context ends and the document part begins.

    end_tag string

    The tag that wraps the document part at the end. This is often used to provide a closing HTML/XML tag or some other delimiter you can use in an application to understand where to provide highlighting in your UI and where the document part ends and the following context begins.
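
    As a sketch, a configuration that returns two sentences of context on each side of the match and wraps the matching part in emphasis tags (the tags shown are arbitrary choices) might be:

        "context_configuration": {
          "sentences_before": 2,
          "sentences_after": 2,
          "start_tag": "<em>",
          "end_tag": "</em>"
        }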

    reranker object

    Rerank results of the search. Rerankers are powerful tools for improving the ordering of search results. By default, the search uses the most powerful reranker available to the customer's plan. To disable reranking, set the reranker type to "none".

    oneOf
    type string

    Default value: customer_reranker

    When type is customer_reranker, you can specify the reranker_name of a reranker. reranker_id is deprecated. The retrieval engine will then rerank results using that reranker.

    reranker_id string deprecated

    Possible values: Value must match regular expression rnk_(?!272725718)\d+

    The ID of the reranker. The multilingual reranker that may be used by Scale customers is rnk_272725719. Do not specify the MMR reranker ID here, and instead, use the MMR reranker object type. Deprecated: Use reranker_name instead.

    reranker_name string

    The name of the reranker. Do not specify the MMR reranker name here. Instead, use the MMR reranker object type.

    limit int32

    Possible values: >= 1

    Specifies the maximum number of results to be returned after the reranking process. When a reranker is applied, it performs the following steps:

    1. Reranks all input results according to its algorithm.
    2. Sorts the reranked results based on their new scores.
    3. Returns the top N results, where N is the value specified by this limit.

    Note: This limit is applied per reranking stage. In a chain of rerankers, each reranker can have its own limit, potentially reducing the number of results at each stage.

    cutoff float

    Specifies the minimum score threshold for results to be included after the reranking process. When a reranker is applied with a cutoff, it performs the following steps:

    1. Reranks all input results according to its algorithm.
    2. Applies the cutoff, removing any results with scores below the specified threshold.
    3. Returns the remaining results, sorted by their new scores.

    Note: This cutoff is applied per reranking stage. In a chain of rerankers, each reranker can have its own cutoff, potentially further reducing the number of results at each stage. If both 'limit' and 'cutoff' are specified, the cutoff is applied first, followed by the limit.
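
    For example, a customer_reranker stage that drops results scoring below 0.5 and then keeps at most 50 of the remainder might look like this (the reranker_name is illustrative; use a reranker available to your plan):

        "reranker": {
          "type": "customer_reranker",
          "reranker_name": "Rerank_Multilingual_v1",
          "limit": 50,
          "cutoff": 0.5
        }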

    generation object

    The parameters to control generation.

    generation_preset_name string

    Possible values: non-empty

    The preset values to use to feed the query results and other context to the model.

    A generation_preset is an object with a bundle of properties that specifies:

    • The prompt_template that is rendered and then sent to the LLM.
    • The LLM used.
    • model_parameters such as temperature.

    All of these properties except the model can be overridden by setting them in this object. Even when a prompt_template is set, the generation_preset_name still determines the model used.

    If generation_preset_name is not set, the Vectara platform will use the default model and prompt.

    prompt_name string deprecated

    Possible values: non-empty

    Use generation_preset_name instead of prompt_name.

    max_used_search_results int32

    Default value: 5

    The maximum number of search results to be available to the prompt.

    prompt_template string

    Vectara manages both system and user roles and prompts for the generative LLM out of the box by default. However, Scale customers can override the prompt_template via this variable. The prompt_template is in the form of an Apache Velocity template. For more details on how to configure the prompt_template, see the long-form documentation. See pricing for more details on becoming a Scale customer.

    prompt_text string deprecated

    This property is deprecated in favor of clearer naming. Use prompt_template instead. This property is ignored if prompt_template is set.

    max_response_characters int32

    Controls the length of the generated output. This is a rough estimate and not a hard limit: the end output can be longer or shorter than this value. This is generally implemented by including the max_response_characters in the prompt, and the LLM's instruction following capability dictates how closely the generated output is limited.

    This is currently a Scale-only feature. See pricing for more details on becoming a Scale customer.

    response_language Language

    Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]

    Default value: auto

    Languages that the Vectara platform supports.

    model_parameters object

    The parameters for the model. These are currently a Scale-only feature. See pricing for more details on becoming a Scale customer. WARNING: This is an experimental feature that may change or break at any point with virtually no notice. It is meant for experimentation to converge on optimal parameters that can then be set in the prompt definitions.

    max_tokens int32

    Possible values: >= 1

    The maximum number of tokens to be returned by the model.

    temperature float

    The sampling temperature to use. Higher values make the output more random, while lower values make it more focused and deterministic.

    frequency_penalty float

    Higher values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    presence_penalty float

    Higher values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
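
    As an illustrative sketch, the model parameters might be set as follows (the values are arbitrary starting points, not recommendations):

        "model_parameters": {
          "max_tokens": 512,
          "temperature": 0.2,
          "frequency_penalty": 0.1,
          "presence_penalty": 0.0
        }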

    citations object

    Style the generator should use when making citations.

    style string

    Possible values: [none, numeric, html, markdown]

    The citation style to be used in the summary. Can be one of:

    • numeric - Citations formatted as simple numerals: [1], [2] ...
    • none - Citations removed from the text.
    • html - Citations formatted as an HTML link like <a href="url_pattern">text_pattern</a>.
    • markdown - Citations formatted as [text_pattern](url_pattern).

    url_pattern string

    The URL pattern if the citation_style is set to html or markdown. The pattern can access metadata attributes in the document or part. e.g. https://my.doc/foo/{doc.id}/{part.id}

    The default url_pattern is an empty string.

    text_pattern string

    The text pattern if the citation_style is set to html or markdown. This pattern sets the anchor text for HTML or the text within [] in markdown. If it is not set, it defaults to N, where N is the index of the result.

    The default citation style looks like [N](<url_pattern>) for markdown.

    You can use metadata attributes in the text_pattern. For example, the pattern {doc.title} with citation style markdown would result in final citation output like [Title](<url_pattern>) when the document's metadata includes {"title":"Title"}.
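
    For example, a markdown citation configuration that uses the url_pattern from above and assumes the documents carry a title metadata attribute might be:

        "citations": {
          "style": "markdown",
          "url_pattern": "https://my.doc/foo/{doc.id}/{part.id}",
          "text_pattern": "{doc.title}"
        }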

    enable_factual_consistency_score boolean

    Default value: true

    Enable returning the factual consistency score with query results.

    stream_response boolean

    Default value: false

    Indicates whether the response should be streamed or not.

Responses

A response to a query.

Schema

    summary string

    The summary of the search results.

    response_language Language

    Possible values: [auto, eng, deu, fra, zho, kor, ara, rus, tha, nld, ita, por, spa, jpn, pol, tur, vie, ind, ces, ukr, ell, heb, fas, hin, urd, swe, ben, msa, ron]

    Default value: auto

    Languages that the Vectara platform supports.

    search_results object[]

    The ranked search results.

  • Array [
  • text string

    The document part altered by the context configuration that matches the query.

    score double

    The score of the individual result.

    part_metadata object

    The metadata for the document part.

    property name* any

    The metadata for the document part.

    document_metadata object

    The metadata for the document that contains the document part.

    property name* any

    The metadata for the document that contains the document part.

    document_id string

    The ID of the document that contains the document part.

    request_corpora_index int32

    A query request can search over multiple corpora at a time. This property is set to the index, within the list of corpora in the original search request, of the corpus that this search result originated from.

    If the query request is only over one corpus, this property is 0.

  • ]
  • factual_consistency_score float

    The probability that the summary is factually consistent with the results.

    rendered_prompt string

    The rendered prompt sent to the LLM. Useful when creating custom prompt_template values. Only available to Scale customers.
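
    As an illustration of the schema above, an abbreviated response (all values are placeholders) could look like:

        {
          "summary": "The warranty period is two years [1].",
          "response_language": "eng",
          "search_results": [
            {
              "text": "<em>The product is covered by a two-year warranty.</em>",
              "score": 0.87,
              "part_metadata": { "page": 3 },
              "document_metadata": { "title": "Warranty Policy" },
              "document_id": "warranty-policy-2024",
              "request_corpora_index": 0
            }
          ],
          "factual_consistency_score": 0.91
        }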
