File Upload
POST /v1/upload
The File Upload API indexes binary files such as PDFs and Word documents. Vectara attempts to extract the text and any metadata, such as author or title, from the document automatically. You can also provide additional metadata of your own.
Some tips for this API:
- This operation authenticates with either the Personal API Key, Index API Key, or OAuth 2.0 (in a JWT "Bearer Token"). You can find details of how to set up an API key or use OAuth 2.0 here.
- You can find a full list of supported file formats here.
- To provide additional metadata, set the doc_metadata field. You can find some additional details here.
- PDFs must contain text: Vectara does not currently support indexing scanned images via OCR.
- There is a known issue with the OpenAPI plugin where the generated Python script for file uploads incorrectly uses placeholder values for the file path and filename. Manually replace '/path/to/file' and 'file' in the files array with the actual file path and filename.
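The authentication and metadata tips above can be sketched in Python as follows. This is a minimal illustration, not the official client: the helper names are hypothetical, and the `x-api-key` header name is an assumption that this page does not state (only the "Bearer Token" scheme is mentioned above).

```python
import json


def build_auth_headers(api_key=None, jwt_token=None):
    """Return request headers for exactly one authentication method.

    ASSUMPTION: the header name "x-api-key" is not given on this page.
    The "Bearer" scheme follows the OAuth 2.0 tip above.
    """
    # Require exactly one credential so the caller's intent is unambiguous.
    if (api_key is None) == (jwt_token is None):
        raise ValueError("provide exactly one of api_key or jwt_token")
    if api_key is not None:
        return {"x-api-key": api_key}
    return {"Authorization": f"Bearer {jwt_token}"}


def encode_doc_metadata(metadata):
    """Serialize a dict into the JSON string sent as doc_metadata."""
    return json.dumps(metadata)
```

For example, `encode_doc_metadata({"author": "Jane Doe"})` produces a JSON string that could be sent as the doc_metadata field.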
Request
Query Parameters
Customer ID
Corpus ID
If true, the server returns the extracted document that was indexed
- multipart/form-data
Body
doc_metadata
A JSON string of any additional metadata you want attached to the file.
file
The file to be indexed into Vectara.
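As a rough sketch of how the request described above might be assembled, the following Python builds the URL and a multipart/form-data body using only the standard library. Several values are assumptions not confirmed by this page: the host in `BASE_URL`, the short query parameter names (`c`, `o`, `d`) for the customer ID, corpus ID, and return-document flag, and the helper names themselves.

```python
import json
import uuid
from urllib.parse import urlencode

# ASSUMPTIONS: the host and the query parameter names are not stated on
# this page. Adjust them to match your account's API reference.
BASE_URL = "https://api.vectara.io"
PARAM_CUSTOMER_ID = "c"
PARAM_CORPUS_ID = "o"
PARAM_RETURN_DOC = "d"


def build_upload_url(customer_id, corpus_id, return_document=False):
    """Build the POST /v1/upload URL with its query parameters."""
    query = {PARAM_CUSTOMER_ID: customer_id, PARAM_CORPUS_ID: corpus_id}
    if return_document:
        query[PARAM_RETURN_DOC] = "true"
    return f"{BASE_URL}/v1/upload?{urlencode(query)}"


def build_multipart_body(filename, file_bytes, doc_metadata=None, boundary=None):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    if doc_metadata is not None:
        # Optional doc_metadata part: a JSON string of extra metadata.
        meta_part = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="doc_metadata"\r\n\r\n'
            f"{json.dumps(doc_metadata)}\r\n"
        )
        parts.append(meta_part.encode("utf-8"))
    # Required file part: headers, then the raw bytes of the file.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

In practice, `file_bytes` would come from the real file (for instance `pathlib.Path(path).read_bytes()`) and `filename` from its actual name, rather than from placeholder values such as '/path/to/file'.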
Responses
- 200
- 400
- 401
- 403
- 409
- 507
A successful response
- application/json
- Schema
- Example (from schema)
Schema
response object
quotaConsumed object
numChars string
The number of characters Vectara indexed from the uploaded file.
numMetadataChars string
The number of metadata characters Vectara indexed from the uploaded file.
{
"response": {
"status": {},
"quotaConsumed": {
"numChars": "string",
"numMetadataChars": "string"
}
}
}
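Because the schema example shows numChars and numMetadataChars as strings, a client will probably want to convert them before doing arithmetic. The sketch below assumes the strings hold base-10 integers such as "1200", which the example above does not confirm. `parse_quota` is a hypothetical helper name.

```python
import json


def parse_quota(response_text):
    """Extract quotaConsumed counts from a 200 response body.

    ASSUMPTION: the string values are base-10 integers. Missing fields
    are treated as zero.
    """
    payload = json.loads(response_text)
    quota = payload.get("response", {}).get("quotaConsumed", {})
    return {
        "numChars": int(quota.get("numChars", 0)),
        "numMetadataChars": int(quota.get("numMetadataChars", 0)),
    }
```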
An invalid request was sent, e.g. one or more parameters were missing, or the corpus does not exist.
- application/json
- Schema
- Example (from schema)
Schema
httpCode integer
Returned HTTP code
{
"httpCode": 0
}
The request was not authenticated.
The caller is not authorized to add documents to the corpus.
A document with the same document ID already exists in the corpus, but its indexed contents differ from the file being uploaded. Because the indexer is idempotent, the same document (identified by its document ID) can be uploaded multiple times. The indexer does not support updates yet, so an error is returned when a different document is uploaded under the same document ID. Note that when a raw file is uploaded, the file name is used as the document ID.
There is no more indexing quota left for the corpus or customer to index more documents. Upgrade your account, add a credit card, or contact sales.
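One way a client might act on the status codes listed above is a small lookup that turns each code into a short, actionable message. This is only an illustrative sketch: the function name and the message wording are hypothetical, and the messages are condensed from the descriptions on this page.

```python
# Short messages condensed from the response descriptions on this page.
UPLOAD_ERRORS = {
    400: "invalid request: a parameter is missing or the corpus does not exist",
    401: "request was not authenticated",
    403: "caller is not authorized to add documents to the corpus",
    409: "a different document already exists under this document ID (file name)",
    507: "no indexing quota left for the corpus or customer",
}


def describe_upload_status(status_code):
    """Map an HTTP status code from POST /v1/upload to a short message."""
    if status_code == 200:
        return "ok"
    return UPLOAD_ERRORS.get(status_code, f"unexpected HTTP status {status_code}")
```

Since the file name serves as the document ID for raw uploads, a 409 usually suggests renaming the file or removing the existing document before retrying, rather than simply resending the same request.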