Version: 2.0

Upload a file to the corpus

POST /v2/corpora/:corpus_key/upload_file

Upload files such as PDFs and Word documents. Vectara attempts to automatically extract the text and any metadata. The File Upload endpoint expects a multipart/form-data request containing the following parts:

  • metadata - (Optional) Specifies a JSON object representing any additional metadata to be associated with the extracted document. For example, 'metadata={"key": "value"};type=application/json'
  • file - Specifies the file that you want to upload.
  • filename - Set as part of the file field to give the file name that you want to associate with the uploaded file. With curl, use the following syntax: 'file=@/path/to/file/file.pdf;filename=desired_filename.pdf'

For more detailed information, see the File Upload API guide.
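
The multipart layout described above can be sketched in Python with only the standard library. This is a minimal illustration, not the official client: the base URL `https://api.vectara.io`, the `x-api-key` authentication header, the `application/octet-stream` content type on the file part, and the helper name `build_upload_request` are all assumptions that do not come from this page. The function only builds the request; it does not send it.

```python
import json
import uuid
import urllib.request


def build_upload_request(corpus_key, file_bytes, filename, metadata=None,
                         api_key="YOUR_API_KEY",
                         base_url="https://api.vectara.io"):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = bytearray()

    if metadata is not None:
        # Optional "metadata" part, sent as a JSON object.
        body += b"--" + boundary.encode() + crlf
        body += b'Content-Disposition: form-data; name="metadata"' + crlf
        body += b"Content-Type: application/json" + crlf + crlf
        body += json.dumps(metadata).encode("utf-8") + crlf

    # Required "file" part; the filename travels inside the same part.
    body += b"--" + boundary.encode() + crlf
    body += (f'Content-Disposition: form-data; name="file"; '
             f'filename="{filename}"').encode("utf-8") + crlf
    # Assumed generic content type for the file bytes.
    body += b"Content-Type: application/octet-stream" + crlf + crlf
    body += file_bytes + crlf
    body += b"--" + boundary.encode() + b"--" + crlf

    return urllib.request.Request(
        url=f"{base_url}/v2/corpora/{corpus_key}/upload_file",
        data=bytes(body),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "x-api-key": api_key,  # assumed authentication header
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Building the body by hand keeps the sketch free of third-party dependencies; an HTTP library with built-in multipart support would be shorter in practice.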

Request

Path Parameters

    corpus_key CorpusKey required

    Possible values: <= 50 characters, value must match regular expression [a-zA-Z0-9_\=\-]+$

    The unique key identifying the corpus to which the file is uploaded.
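
A client can check the corpus_key constraints before sending a request. The sketch below is a hypothetical helper, not part of the API. It assumes the whole key must match the character class, so it uses `fullmatch` rather than relying on the trailing `$` alone.

```python
import re

# Character class taken from the constraint above ("\=" is written as "=").
# Requiring a full match is an assumption about how the API applies it.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_=\-]+")
MAX_CORPUS_KEY_LENGTH = 50


def is_valid_corpus_key(key: str) -> bool:
    """Return True if key satisfies the documented length and pattern limits."""
    if len(key) > MAX_CORPUS_KEY_LENGTH:
        return False
    return CORPUS_KEY_PATTERN.fullmatch(key) is not None
```

Keys containing spaces or slashes, empty keys, and keys longer than 50 characters are rejected by this check.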

Header Parameters

    Request-Timeout integer

    Possible values: >= 1

    The API makes a best effort to complete the request within the specified number of seconds; otherwise the request times out.

    Request-Timeout-Millis integer

    Possible values: >= 1

    The API makes a best effort to complete the request within the specified number of milliseconds; otherwise the request times out.
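
As an illustration, the two timeout headers can be assembled with a small helper that enforces the documented minimum of 1. The function name `timeout_headers` is hypothetical, and the sketch makes no assumption about how the API behaves when both headers are sent.

```python
def timeout_headers(seconds=None, millis=None):
    """Return a dict with whichever timeout headers were requested."""
    headers = {}
    candidates = (
        ("Request-Timeout", seconds),
        ("Request-Timeout-Millis", millis),
    )
    for name, value in candidates:
        if value is None:
            continue
        # Both headers are documented as integers >= 1.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be an integer >= 1, got {value!r}")
        headers[name] = str(value)
    return headers
```

The returned dictionary can be merged into the headers of any request to this endpoint.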

Body

Upload a file that the Vectara platform attempts to parse and turn into a document within the corpus. The first part of the multipart request can contain document metadata to attach to the parsed document. Only one document may be uploaded at a time.

    metadata object

    Arbitrary object that will be attached as document metadata to the extracted document.

    property name* any


    filename string

    Optional multipart section to override the filename.

    file binary required

    Binary file contents. The file name is used as the document ID.

Responses

The extracted document has been parsed and added to the corpus.

Schema
    id string

    The document ID.

    metadata object

    The document metadata.

    property name* any


    parts object[]

    The parts that make up the document. Parts are not returned when listing documents or when creating a document; this property is available only when retrieving a document by ID.

    Array [

    text string required

    The text of the document part.

    metadata object

    The metadata for a document part. These may be used in metadata filters at query time if filter attributes are configured on the corpus.

    property name* any


    context string

    The context text for the document part.

    custom_dimensions object

    The custom dimensions as additional weights.

    property name* double
    ]

    storage_usage object

    How much storage the document uses. This information is currently returned only when indexing a document, not when retrieving it.

    bytes_used int64

    Number of bytes used by the document that count toward the maximum corpus size and toward any billing plan.

    metadata_bytes_used int64

    Number of metadata bytes used by a document.
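
The response schema above could be read on the client side roughly as follows. The field names come from the schema; the `UploadedDocument` class, the `parse_upload_response` helper, and the sample payload are made up for illustration and are not real API output. Because parts are not available when creating a document, the sketch ignores them.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class UploadedDocument:
    id: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict)
    bytes_used: Optional[int] = None
    metadata_bytes_used: Optional[int] = None


def parse_upload_response(payload: str) -> UploadedDocument:
    """Extract the ID, metadata, and storage usage from a response body."""
    data = json.loads(payload)
    # storage_usage may be absent, so fall back to an empty object.
    usage = data.get("storage_usage") or {}
    return UploadedDocument(
        id=data.get("id"),
        metadata=data.get("metadata") or {},
        bytes_used=usage.get("bytes_used"),
        metadata_bytes_used=usage.get("metadata_bytes_used"),
    )


# Hypothetical payload shaped like the schema above.
sample = json.dumps({
    "id": "report.pdf",
    "metadata": {"key": "value"},
    "storage_usage": {"bytes_used": 2048, "metadata_bytes_used": 64},
})
doc = parse_upload_response(sample)
```

Treating every field as optional keeps the parser tolerant of responses that omit storage_usage.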
