Version: 2.0

Add a document to a corpus

POST /v2/corpora/:corpus_key/documents

Add a document to a corpus. This endpoint supports two document formats, structured and core.

  • Structured documents follow a conventional layout of sections and parts. You provide the logical document structure, and Vectara partitions it automatically using its proprietary strategy.
  • Core documents follow an advanced, granular structure that explicitly defines each document part in an array. Each part becomes a distinct, searchable item in query results, giving you precise control over document structure and content.

For more details, see Indexing.

Request

Path Parameters

    corpus_key CorpusKey required

    Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$

    The unique key identifying the corpus to add the document to.
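
    As an illustration of the constraints above, here is a minimal client-side check of a corpus key before sending a request. The function name `is_valid_corpus_key` is hypothetical, and the use of a full match (anchored at both ends) is an assumption that is slightly stricter than the trailing `$` shown in the documented pattern.

```python
import re

# Character set and length limit copied from the parameter description above.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_\=\-]+")
MAX_CORPUS_KEY_LENGTH = 50


def is_valid_corpus_key(corpus_key: str) -> bool:
    """Return True if the key fits the documented length and character set.

    Hypothetical helper: this is a local pre-check only, and the API remains
    the authority on whether a key is accepted.
    """
    if len(corpus_key) > MAX_CORPUS_KEY_LENGTH:
        return False
    # fullmatch requires the whole string to match, so an empty key or a key
    # containing spaces or other punctuation is rejected.
    return CORPUS_KEY_PATTERN.fullmatch(corpus_key) is not None
```

    A check like this can catch an obviously malformed key locally, but it does not confirm that the corpus exists.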

Header Parameters

    Request-Timeout integer

    Possible values: >= 1

    The API will make a best effort to complete the request in the specified seconds or time out.

    Request-Timeout-Millis integer

    Possible values: >= 1

    The API will make a best effort to complete the request in the specified milliseconds or time out.
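
    The following sketch shows one way the path and header parameters might be combined into a request object using Python's standard library. The base URL `https://api.vectara.io` and the `x-api-key` authentication header are assumptions that do not appear on this page, and `build_add_document_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: the base URL is not stated on this page.
BASE_URL = "https://api.vectara.io"


def build_add_document_request(corpus_key, document, api_key, timeout_seconds=None):
    """Build (but do not send) a POST request for adding a document.

    Hypothetical helper. `document` is any JSON-serializable dict that follows
    the body schema described in the Body section.
    """
    headers = {
        "Content-Type": "application/json",
        # Assumption: authentication via an x-api-key header.
        "x-api-key": api_key,
    }
    if timeout_seconds is not None:
        # The documented constraint for Request-Timeout is >= 1.
        if timeout_seconds < 1:
            raise ValueError("Request-Timeout must be >= 1")
        headers["Request-Timeout"] = str(timeout_seconds)

    url = f"{BASE_URL}/v2/corpora/{corpus_key}/documents"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

    To send the request, the returned object could be passed to `urllib.request.urlopen`. `Request-Timeout-Millis` could be set the same way when millisecond precision is needed.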

Body

    oneOf
    id string required

    The document ID must be unique within the corpus.

    type string required

    Default value: core

    When the type of the indexed document is core, the rest of the object is expected to follow this schema. This schema lets you specify document chunks precisely; each chunk maps directly to a retrieved search result.

    metadata object

    Arbitrary object of document level metadata. Properties of this object can be used by document filters if defined as a corpus filter attribute.

    property name* any


    tables object[]

    The tables that this document contains.

  • Array [
  • id string

    The unique ID of the table within the document.

    title string

    The title of the table.

    data object

    The data of the table.

    headers array[]

    The headers of the table.

    rows array[]

    The rows in the data.

    description string

    The description of the table.

  • ]
  • document_parts object[] required

    Possible values: >= 1

    The parts that make up the document.

  • Array [
  • text string required

    The text of the document part.

    metadata object

    The metadata for a document part. These may be used in metadata filters at query time if filter attributes are configured on the corpus.

    property name* any


    table_id string

    The ID of the table that this document part belongs to.

    context string

    The context text for the document part.

    custom_dimensions object

    The custom dimensions as additional weights.

    property name* double
  • ]
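
    To make the core schema above concrete, here is a minimal sketch that assembles a request body as a Python dict. The helper name `build_core_document` and all field values (IDs, text, metadata keys, the `importance` dimension) are hypothetical. The check for at least one part is an interpretation of the ">= 1" constraint on document_parts. The table's data object is omitted because its exact shape is not spelled out on this page.

```python
import json


def build_core_document(doc_id, parts, metadata=None, tables=None):
    """Assemble a core-type document body from the fields described above.

    Hypothetical helper: it checks only the fields marked required here
    (id, type, document_parts, and text on each part).
    """
    if not parts:
        raise ValueError("document_parts must contain at least one part")
    for part in parts:
        if "text" not in part:
            raise ValueError("every document part requires a 'text' field")

    document = {"id": doc_id, "type": "core", "document_parts": parts}
    if metadata:
        document["metadata"] = metadata
    if tables:
        document["tables"] = tables
    return document


# Illustrative values only; none of these come from the API reference.
example = build_core_document(
    "doc-001",
    parts=[
        {
            "text": "Quarterly revenue grew compared with the prior year.",
            "metadata": {"section": "summary"},
            "context": "Annual report, executive summary",
        },
        {
            "text": "Revenue by region is listed in the table.",
            "table_id": "table-1",  # links this part to the table below
            "custom_dimensions": {"importance": 0.8},
        },
    ],
    metadata={"author": "Jane Doe"},
    tables=[
        {
            "id": "table-1",
            "title": "Revenue by region",
            "description": "Regional revenue breakdown",
        }
    ],
)

print(json.dumps(example, indent=2))
```

    Each entry in document_parts is indexed as its own searchable item, so how text is split across parts is left to the caller.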

Responses

Document added to the corpus.

Schema
    id string

    The document ID.

    metadata object

    The document metadata.

    property name* any


    tables object[]

    The tables that this document contains. Tables are available only when table extraction is enabled.

  • Array [
  • id string

    The unique ID of the table within the document.

    title string

    The title of the table.

    data object

    The data of the table.

    headers array[]

    The headers of the table.

    rows array[]

    The rows in the data.

    description string

    The description of the table.

  • ]
  • parts object[]

    The parts that make up the document. Parts are not returned when listing documents or when creating a document; this property is available only when retrieving a document by ID.

  • Array [
  • text string required

    The text of the document part.

    metadata object

    The metadata for a document part. These may be used in metadata filters at query time if filter attributes are configured on the corpus.

    property name* any


    context string

    The context text for the document part.

    custom_dimensions object

    The custom dimensions as additional weights.

    property name* double
  • ]
  • storage_usage object

    The amount of storage the document uses. This information is currently returned only when indexing a document, not when retrieving it.

    bytes_used int64

    Number of bytes used by the document that count toward the maximum corpus size and toward any billing plan.

    metadata_bytes_used int64

    Number of metadata bytes used by a document.
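
    As a rough sketch of reading the response schema above, the following parses a response body and extracts the storage figures. The sample JSON and its values are invented for illustration, and `summarize_storage` is a hypothetical helper. The sample omits parts, consistent with the note that parts are not returned when creating a document.

```python
import json

# Invented sample for illustration; real responses will differ in values.
sample_response = """
{
  "id": "doc-001",
  "metadata": {"author": "Jane Doe"},
  "storage_usage": {"bytes_used": 2048, "metadata_bytes_used": 256}
}
"""


def summarize_storage(response_text):
    """Extract the document ID and storage usage from a response body.

    Hypothetical helper: missing fields default to None or 0 rather than
    raising, since storage_usage is documented as not always returned.
    """
    body = json.loads(response_text)
    usage = body.get("storage_usage") or {}
    return {
        "id": body.get("id"),
        "bytes_used": usage.get("bytes_used", 0),
        "metadata_bytes_used": usage.get("metadata_bytes_used", 0),
        "has_parts": "parts" in body,
    }


summary = summarize_storage(sample_response)
print(summary)
```

    Falling back to zero when storage_usage is absent is a design choice of this sketch; a caller that needs to distinguish "absent" from "zero" could return None instead.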
