Version: 2.0

List corpora

GET /v2/corpora

Lists the corpora in the account. The returned corpus objects contain less detail than those returned by the direct corpus retrieval operation.

Request

Query Parameters

    limit int32

    Possible values: >= 1 and <= 100

    Default value: 10

    The maximum number of corpora to return at one time.

    filter string

    A regular expression to filter the corpora by their name or summary.

    page_key string

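    The key used to retrieve the next page of corpora when more corpora exist than the limit allows in one response. Use the page_key value from the metadata of the previous response.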
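
As a rough illustration of how the query parameters above combine, the following Python sketch builds a request URL and enforces the documented limit range of 1 to 100. The base URL `https://api.vectara.io` and the helper name `build_list_corpora_url` are assumptions for this example and are not stated in this section.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the API host is not given in this section.
BASE_URL = "https://api.vectara.io"


def build_list_corpora_url(
    limit: int = 10,
    name_filter: Optional[str] = None,
    page_key: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build the GET /v2/corpora URL (hypothetical helper)."""
    # The documented range for limit is >= 1 and <= 100.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    params = {"limit": limit}
    if name_filter:
        # Sent as the `filter` query parameter (a regular expression).
        params["filter"] = name_filter
    if page_key:
        params["page_key"] = page_key

    # urlencode percent-encodes regex characters such as ^ and $.
    return f"{base_url}/v2/corpora?{urlencode(params)}"


if __name__ == "__main__":
    print(build_list_corpora_url(limit=25, name_filter="^docs"))
```

Validating `limit` on the client side is a design choice for this sketch; the server presumably rejects out-of-range values as well.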

Header Parameters

    Request-Timeout integer

    Possible values: >= 1

    The API makes a best effort to complete the request within the specified number of seconds; otherwise the request times out.

    Request-Timeout-Millis integer

    Possible values: >= 1

    The API makes a best effort to complete the request within the specified number of milliseconds; otherwise the request times out.
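
The two timeout headers above could be assembled as in this minimal Python sketch. The helper name `build_timeout_headers` is hypothetical, and this section does not say how the API behaves when both headers are sent, so the sketch simply passes through whichever values are given.

```python
from typing import Dict, Optional


def build_timeout_headers(
    timeout_seconds: Optional[int] = None,
    timeout_millis: Optional[int] = None,
) -> Dict[str, str]:
    """Return the optional timeout headers (hypothetical helper)."""
    headers: Dict[str, str] = {}

    if timeout_seconds is not None:
        # Documented constraint: value must be >= 1.
        if timeout_seconds < 1:
            raise ValueError("Request-Timeout must be >= 1")
        headers["Request-Timeout"] = str(timeout_seconds)

    if timeout_millis is not None:
        # Documented constraint: value must be >= 1.
        if timeout_millis < 1:
            raise ValueError("Request-Timeout-Millis must be >= 1")
        headers["Request-Timeout-Millis"] = str(timeout_millis)

    return headers


if __name__ == "__main__":
    print(build_timeout_headers(timeout_seconds=30))
```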

Responses

List of corpora.

Schema
    corpora object[]
  • Array [
  • id string

    Possible values: Value must match regular expression crp_[0-9]+$

    Vectara ID of the corpus.

    key CorpusKey

    Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$

    A user-provided key for a corpus.

    name string

    Name for the corpus. This value defaults to the key.

    description string

    Corpus description.

    enabled boolean

    Specifies whether the corpus is enabled or not.

    chat_history_corpus boolean

    Indicates that this corpus does not store documents and stores chats instead.

    queries_are_answers boolean

    Default value: false

    Queries made to this corpus are considered answers, and not questions. This swaps the semantics of the encoder used at query time.

    documents_are_questions boolean

    Default value: false

    Documents inside this corpus are considered questions, and not answers. This swaps the semantics of the encoder used at indexing.

    encoder_id string deprecated

    Possible values: Value must match regular expression enc_[0-9]+$

    The encoder used by the corpus. Deprecated: use encoder_name instead.

    encoder_name string

    The name of the encoder used by the corpus, for example boomerang-2023-q3.

    filter_attributes object[]

    The filter attributes of the corpus.

  • Array [
  • name string required

    The JSON path of the filter attribute in a document or document part metadata.

    level string required

    Possible values: [document, part]

    Indicates whether this is a document or document part metadata filter.

    description string

    Description of the filter. May be omitted.

    indexed boolean

    Default value: true

    Indicates whether an index should be created for the filter. Creating an index will improve query latency when using the filter.

    type string required

    Possible values: [integer, real_number, text, boolean, list[integer], list[real_number], list[text]]

    The value type of the filter.

  • ]
  • custom_dimensions object[]

    The custom dimensions of all document parts inside the corpus.

  • Array [
  • name string required

    The name of the custom dimension.

    description string

    Description of the custom dimension.

    indexing_default double

    Default value of a custom dimension on a document part if the custom dimension value is not specified when the document part is indexed.

    A value of 0 means that the custom dimension is not considered.

    querying_default double

    Default value of a custom dimension for a query if the value of the custom dimension is not specified when querying the corpus.

    A value of 0 means that the custom dimension is not considered.

  • ]
  • limits object
    used_docs int64

    The number of documents contained in the corpus.

    used_parts int64

    The number of document parts contained in the corpus.

    used_bytes int64

    NOTE: This field is currently not populated by the system. The number of bytes contained in the corpus. This includes the document metadata, document part metadata, and document contents.

    used_characters int64

    The number of characters contained in the corpus. This includes the document metadata, document part metadata, and document contents.

    max_bytes int64

    NOTE: This field is currently not populated by the system. The maximum size of the corpus in bytes.

    max_metadata_bytes int64

    The maximum size, in bytes, of the metadata on a document.

    index_rate int64

    NOTE: This field is currently not populated by the system. The maximum number of new documents that can be added to the corpus per second.

    created_at date-time

    Indicates when the corpus was created.

  • ]
  • metadata object

    The standard metadata in the response of a list operation.

    page_key string

    Pass this value as the page_key query parameter when requesting the next page of this list.
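
The response schema above suggests a simple paging loop: read `corpora`, then feed `metadata.page_key` back in as the `page_key` query parameter. The Python sketch below is one possible way to do that, not a definitive client. It assumes that a missing or empty `page_key` marks the last page, that the host is `https://api.vectara.io`, and that authentication uses an `x-api-key` header; none of these are stated in this section. The names `iter_corpora` and `fetch_page_http` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

Page = Dict[str, Any]
FetchPage = Callable[[int, Optional[str]], Page]


def iter_corpora(fetch_page: FetchPage, limit: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield corpus objects across all pages (hypothetical helper)."""
    page_key: Optional[str] = None
    while True:
        page = fetch_page(limit, page_key)
        yield from page.get("corpora", [])
        # Assumption: no page_key in metadata means there are no more pages.
        page_key = (page.get("metadata") or {}).get("page_key")
        if not page_key:
            break


def fetch_page_http(
    limit: int,
    page_key: Optional[str],
    api_key: str,
    base_url: str = "https://api.vectara.io",  # assumption
) -> Page:
    """Fetch one page over HTTP using only the standard library."""
    params: Dict[str, Any] = {"limit": limit}
    if page_key:
        params["page_key"] = page_key
    request = urllib.request.Request(
        f"{base_url}/v2/corpora?{urlencode(params)}",
        # Assumption: API-key authentication via the x-api-key header.
        headers={"x-api-key": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller might bind the API key with `functools.partial(fetch_page_http, api_key="...")` and pass the result to `iter_corpora`. Injecting the fetch function keeps the paging logic testable without network access.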
