Version: 2.0

Retrieve metadata about a corpus

GET /v2/corpora/:corpus_key

Get metadata about a corpus. This operation does not search the corpus contents. Specify the corpus_key to identify the corpus whose metadata you want to retrieve. The corpus_key is created when the corpus is set up, either through the Vectara Console UI or the Create Corpus API. For more information, see Corpus Key Definition.

Request

Path Parameters

    corpus_key CorpusKey required

    Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$

    The unique key identifying the corpus to retrieve.
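The constraints on corpus_key can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name is hypothetical, and it assumes the documented pattern is meant to cover the entire key.

```python
import re

# Character set and length limit copied from the "Possible values" line above.
# Assumption: the pattern must match the whole key, so fullmatch() is used.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_=\-]+")
MAX_CORPUS_KEY_LENGTH = 50


def is_valid_corpus_key(key: str) -> bool:
    """Return True if key satisfies the documented length and character rules."""
    if len(key) > MAX_CORPUS_KEY_LENGTH:
        return False
    return CORPUS_KEY_PATTERN.fullmatch(key) is not None
```

A key that fails this check can be rejected locally without a round trip to the API.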

Header Parameters

    Request-Timeout integer

    Possible values: >= 1

    The API makes a best effort to complete the request within the specified number of seconds; otherwise, the request times out.

    Request-Timeout-Millis integer

    Possible values: >= 1

    The API makes a best effort to complete the request within the specified number of milliseconds; otherwise, the request times out.
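As an illustration of how the path and header parameters fit together, the sketch below builds the request with the Python standard library without sending it. The host name, the x-api-key authentication header, and the function name are assumptions that do not come from this page; substitute the values that apply to your account.

```python
import urllib.parse
import urllib.request

# Assumptions, not taken from this page: the API host and the x-api-key header.
BASE_URL = "https://api.vectara.io"


def build_get_corpus_request(corpus_key, api_key, timeout_seconds=None, timeout_millis=None):
    """Build (but do not send) a GET /v2/corpora/:corpus_key request."""
    # Percent-encode the key so characters such as "=" are safe in the path.
    path = "/v2/corpora/" + urllib.parse.quote(corpus_key, safe="")
    headers = {"Accept": "application/json", "x-api-key": api_key}
    # Both timeout headers are optional and must be >= 1 when present.
    if timeout_seconds is not None:
        if timeout_seconds < 1:
            raise ValueError("Request-Timeout must be >= 1")
        headers["Request-Timeout"] = str(timeout_seconds)
    if timeout_millis is not None:
        if timeout_millis < 1:
            raise ValueError("Request-Timeout-Millis must be >= 1")
        headers["Request-Timeout-Millis"] = str(timeout_millis)
    return urllib.request.Request(BASE_URL + path, headers=headers, method="GET")
```

The returned object can be passed to urllib.request.urlopen to perform the call; the response body is expected to be the JSON corpus object described under Responses.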

Responses

A corpus.

Schema
    id string

    Possible values: Value must match regular expression crp_[0-9]+$

    Vectara ID of the corpus.

    key CorpusKey

    Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$

    A user-provided key for a corpus.

    name string

    Name for the corpus. This value defaults to the key.

    description string

    Corpus description.

    enabled boolean

    Specifies whether the corpus is enabled or not.

    chat_history_corpus boolean

    Indicates that this corpus does not store documents and stores chats instead.

    queries_are_answers boolean

    Default value: false

    Queries made to this corpus are considered answers, and not questions. This swaps the semantics of the encoder used at query time.

    documents_are_questions boolean

    Default value: false

    Documents inside this corpus are considered questions, and not answers. This swaps the semantics of the encoder used at indexing.

    encoder_id string deprecated

    Possible values: Value must match regular expression enc_[0-9]+$

    The encoder used by the corpus. Deprecated: use encoder_name instead.

    encoder_name string

    The name of the encoder used by the corpus, for example boomerang-2023-q3.

    filter_attributes object[]

    The filter attributes of the corpus.

  • Array [
    name string required

    The JSON path of the filter attribute in a document or document part metadata.

    level string required

    Possible values: [document, part]

    Indicates whether this is a document or document part metadata filter.

    description string

    Description of the filter. May be omitted.

    indexed boolean

    Default value: true

    Indicates whether an index should be created for the filter. Creating an index will improve query latency when using the filter.

    type string required

    Possible values: [integer, real_number, text, boolean, list[integer], list[real_number], list[text]]

    The value type of the filter.

  • ]
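To show how the filter_attributes array might be consumed, the sketch below groups attribute names by level and applies the documented default of true for indexed. The helper names and the sample entries are hypothetical.

```python
def filter_names_by_level(filter_attributes):
    """Group filter attribute names under their level ("document" or "part")."""
    by_level = {"document": [], "part": []}
    for attr in filter_attributes:
        by_level[attr["level"]].append(attr["name"])
    return by_level


def is_indexed(attr):
    """Return the indexed flag, falling back to the documented default of true."""
    return attr.get("indexed", True)


# Hypothetical sample entries; real names and types depend on the corpus.
sample_filter_attributes = [
    {"name": "lang", "level": "document", "type": "text", "indexed": False},
    {"name": "page", "level": "part", "type": "integer"},
]
```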
    custom_dimensions object[]

    The custom dimensions of all document parts inside the corpus.

  • Array [
    name string required

    The name of the custom dimension.

    description string

    Description of the custom dimension.

    indexing_default double

    Default value of a custom dimension on a document part if the custom dimension value is not specified when the document part is indexed.

    A value of 0 means that the custom dimension is not considered.

    querying_default double

    Default value of a custom dimension for a query if the value of the custom dimension is not specified when querying the corpus.

    A value of 0 means that the custom dimension is not considered.

  • ]
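The fallback rule for indexing_default and querying_default can be illustrated on the client side. This sketch only mirrors the rule as described above; the actual resolution is performed by the service. The function name and sample data are hypothetical, and treating a missing default as 0 is an assumption.

```python
def resolve_custom_dimensions(definitions, supplied, default_field):
    """Return name -> value, using supplied values and falling back to the corpus default.

    default_field is "indexing_default" when indexing a document part
    or "querying_default" when querying the corpus.
    """
    resolved = {}
    for dim in definitions:
        # Assumption: a definition without a default behaves like 0 (not considered).
        fallback = dim.get(default_field, 0.0)
        resolved[dim["name"]] = supplied.get(dim["name"], fallback)
    return resolved


# Hypothetical custom dimension definition, shaped like the response schema.
sample_dimensions = [
    {"name": "importance", "indexing_default": 0.0, "querying_default": 1.5},
]
```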
    limits object
    used_docs int64

    The number of documents contained in the corpus.

    used_parts int64

    The number of document parts contained in the corpus.

    used_bytes int64

    NOTE: This field is currently not populated by the system. The number of bytes contained in the corpus. This includes the document metadata, document part metadata, and document contents.

    used_characters int64

    The number of characters contained in the corpus. This includes the document metadata, document part metadata, and document contents.

    max_bytes int64

    NOTE: This field is currently not populated by the system. The maximum size of the corpus, in bytes.

    max_metadata_bytes int64

    The maximum size, in bytes, of the metadata on a document.

    index_rate int64

    NOTE: This field is currently not populated by the system. The maximum number of new documents that can be added to the corpus per second.

    created_at date-time

    Indicates when the corpus was created.
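A response body following the schema above could be read roughly as follows. This is a sketch under stated assumptions: the class name, function name, and sample body are hypothetical, every field is treated as optional, and created_at is assumed to be an RFC 3339 timestamp.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CorpusSummary:
    id: Optional[str]
    key: Optional[str]
    name: Optional[str]
    enabled: Optional[bool]
    used_docs: Optional[int]
    used_parts: Optional[int]
    created_at: Optional[datetime]


def parse_corpus(body: str) -> CorpusSummary:
    """Extract a few fields from a corpus response body."""
    data = json.loads(body)
    limits = data.get("limits") or {}
    created = data.get("created_at")
    if created is not None:
        # Older Python versions reject a trailing "Z", so normalize it first.
        created = datetime.fromisoformat(created.replace("Z", "+00:00"))
    return CorpusSummary(
        id=data.get("id"),
        key=data.get("key"),
        # The schema states that name defaults to the key.
        name=data.get("name", data.get("key")),
        enabled=data.get("enabled"),
        used_docs=limits.get("used_docs"),
        used_parts=limits.get("used_parts"),
        created_at=created,
    )


# Hypothetical response body; real values depend on the corpus.
SAMPLE_BODY = json.dumps({
    "id": "crp_123",
    "key": "my-corpus",
    "enabled": True,
    "limits": {"used_docs": 42, "used_parts": 900},
    "created_at": "2024-01-15T10:30:00Z",
})
```

Fields marked as not populated, such as used_bytes, max_bytes, and index_rate, are left out of the summary on purpose.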
