Upload a file to the corpus
POST /v2/corpora/:corpus_key/upload_file
Upload files such as PDFs and Word documents for automatic text extraction and metadata parsing.
The request body must use the multipart/form-data format and contain the following parts:
metadata
- (Optional) Specifies a JSON object representing any additional metadata to be associated with the extracted document. For example: 'metadata={"key": "value"};type=application/json'
chunking_strategy
- (Optional) Specifies the chunking strategy for the platform to use. If you do not set this option, the platform uses the default strategy, which creates one chunk per sentence. You can explicitly set sentence chunking with 'chunking_strategy={"type":"sentence_chunking_strategy"};type=application/json' or use max chars chunking with 'chunking_strategy={"type":"max_chars_chunking_strategy","max_chars_per_chunk":200};type=application/json'
table_extraction_config
- (Optional) Specifies whether to extract table data from the uploaded file. If you do not set this option, the platform does not extract tables from PDF files. Example configuration: 'table_extraction_config={"extract_tables":true};type=application/json'
file
- Specifies the file that you want to upload.
filename
- Specified as part of the file field; sets the file name that you want to associate with the uploaded file. With curl, use the following syntax: 'file=@/path/to/file/file.pdf;filename=desired_filename.pdf'
For more detailed information, see the File Upload API guide.
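The parts described above can be assembled into a request as in the following minimal Python sketch, which uses only the standard library. The base URL, the `x-api-key` header name, and the helper names `encode_multipart` and `build_upload_request` are assumptions made for illustration; check the platform's authentication documentation for the real values. The sketch only builds the request object and does not send it.

```python
import json
import mimetypes
import urllib.request
import uuid


def encode_multipart(parts):
    """Encode (name, filename, content_type, payload) tuples as multipart/form-data."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, filename, content_type, payload in parts:
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        body += f"--{boundary}\r\n".encode()
        body += f"Content-Disposition: {disposition}\r\n".encode()
        body += f"Content-Type: {content_type}\r\n\r\n".encode()
        body += payload + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url, api_key, corpus_key, filename, file_bytes,
                         metadata=None, chunking_strategy=None,
                         table_extraction_config=None):
    """Build (but do not send) a POST request for the upload_file endpoint."""
    optional_parts = {
        "metadata": metadata,
        "chunking_strategy": chunking_strategy,
        "table_extraction_config": table_extraction_config,
    }
    parts = []
    # Optional parts are sent as JSON, mirroring ';type=application/json' in curl.
    for name, value in optional_parts.items():
        if value is not None:
            parts.append((name, None, "application/json", json.dumps(value).encode()))

    # Guess the file's media type from its name; fall back to a generic type.
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(("file", filename, file_type, file_bytes))

    body, content_type = encode_multipart(parts)
    url = f"{base_url}/v2/corpora/{corpus_key}/upload_file"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: the header name for the API key is hypothetical.
        headers={"Content-Type": content_type, "x-api-key": api_key},
    )


# Example: build a request with metadata and max chars chunking.
request = build_upload_request(
    "https://api.example.com",  # placeholder base URL
    "YOUR_API_KEY",
    "my-corpus",
    "desired_filename.pdf",
    b"%PDF-1.4 example bytes",
    metadata={"key": "value"},
    chunking_strategy={"type": "max_chars_chunking_strategy", "max_chars_per_chunk": 200},
)
```

To send the request, pass it to `urllib.request.urlopen(request)`. Parts left as `None` are omitted, so the platform applies its defaults (sentence chunking, no table extraction).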
Request
Responses
- 201: The extracted document has been parsed and added to the corpus.
- 400: Upload file request was malformed.
- 403: Permissions do not allow uploading a file to the corpus.
- 404: Corpus not found.
- 415: The media type of the uploaded file is not supported.
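A client can map these status codes to the outcomes listed for this endpoint. The following Python sketch shows one possible way to do that; the names `UploadError` and `check_upload_status` are hypothetical and not part of any official SDK.

```python
# Status codes and descriptions as documented for the upload_file endpoint.
UPLOAD_STATUS_MESSAGES = {
    201: "The extracted document has been parsed and added to the corpus.",
    400: "Upload file request was malformed.",
    403: "Permissions do not allow uploading a file to the corpus.",
    404: "Corpus not found.",
    415: "The media type of the uploaded file is not supported.",
}


class UploadError(Exception):
    """Raised when the upload did not return 201."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def check_upload_status(status_code):
    """Return the success message for 201; raise UploadError otherwise."""
    message = UPLOAD_STATUS_MESSAGES.get(
        status_code, f"Unexpected status code {status_code}."
    )
    if status_code == 201:
        return message
    raise UploadError(status_code, message)
```

Raising an exception for every status other than 201 keeps the success path simple; callers that want to retry or report specific failures can inspect `status_code` on the exception.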