Version: 2.0

Upload a file to the corpus

POST /v2/corpora/:corpus_key/upload_file

Upload files such as PDFs and Word documents for automatic text extraction and metadata parsing.

The request body must use the multipart/form-data format and can contain the following parts:

metadata - Optionally specifies a JSON object representing any additional metadata to be associated with the extracted document. For example, 'metadata={"key": "value"};type=application/json'
chunking_strategy - If provided, specifies the chunking strategy for the platform to use. If you do not set this option, the platform uses the default strategy, which creates one chunk per sentence. You can explicitly set sentence chunking with 'chunking_strategy={"type":"sentence_chunking_strategy"};type=application/json' or use max chars chunking with 'chunking_strategy={"type":"max_chars_chunking_strategy","max_chars_per_chunk":200};type=application/json'
table_extraction_config - You can optionally specify whether to extract table data from the uploaded file. If you do not set this option, the platform does not extract tables from PDF files. Example config: 'table_extraction_config={"extract_tables":true};type=application/json'
file - Specifies the file that you want to upload.
filename - Specified as part of the file field with the file name that you want to associate with the uploaded file. For a curl example, use the following syntax: 'file=@/path/to/file/file.pdf;filename=desired_filename.pdf'

For more detailed information, see this File Upload API guide.
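
The parts listed above can be assembled into a request along the following lines. This is a minimal sketch using only the Python standard library, not an official client. The helper names (build_parts, encode_multipart, build_request) are hypothetical, and the base URL and any authentication headers are assumptions that you must supply yourself, since this page does not specify them. Only the endpoint path, the part names, and the JSON shapes come from the description above.

```python
import json
import urllib.request
import uuid


def build_parts(file_bytes, filename, metadata=None,
                chunking_strategy=None, extract_tables=None):
    """Return (name, filename, content_type, payload) tuples for the form parts."""
    parts = []
    if metadata is not None:
        parts.append(("metadata", None, "application/json",
                      json.dumps(metadata).encode("utf-8")))
    if chunking_strategy is not None:
        parts.append(("chunking_strategy", None, "application/json",
                      json.dumps(chunking_strategy).encode("utf-8")))
    if extract_tables is not None:
        config = {"extract_tables": extract_tables}
        parts.append(("table_extraction_config", None, "application/json",
                      json.dumps(config).encode("utf-8")))
    # The filename travels inside the file part, not as a separate part.
    # The generic content type here is an assumption of this sketch.
    parts.append(("file", filename, "application/octet-stream", file_bytes))
    return parts


def encode_multipart(parts, boundary=None):
    """Encode the parts as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, filename, content_type, payload in parts:
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        chunks.append(f"--{boundary}\r\n".encode("utf-8"))
        chunks.append(f"Content-Disposition: {disposition}\r\n".encode("utf-8"))
        chunks.append(f"Content-Type: {content_type}\r\n\r\n".encode("utf-8"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_request(base_url, corpus_key, parts, headers=None):
    """Build (but do not send) the POST request for the upload endpoint.

    base_url and headers are caller-supplied assumptions; this page does not
    state the host or the authentication scheme.
    """
    body, content_type = encode_multipart(parts)
    url = f"{base_url}/v2/corpora/{corpus_key}/upload_file"
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", content_type)
    for key, value in (headers or {}).items():
        request.add_header(key, value)
    return request
```

To send the upload, pass the result of build_request to urllib.request.urlopen. Building the request separately from sending it makes it possible to inspect the encoded body before any network call is made.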

Request

The extracted document has been parsed and added to the corpus.