Upload Files
This guide demonstrates how to upload files (PDFs, DOCX, and more) to a Vectara corpus using the Python SDK. Uploaded files are automatically parsed, chunked, and indexed, making their contents instantly available for search and Retrieval Augmented Generation (RAG).
Use file upload for:
- Bulk onboarding of policy docs, technical manuals, invoices, contracts, or research papers
- Ingesting new content as soon as it is generated by your business
- Processing documents with tables, charts, and structured data
This guide assumes you have a corpus called my-docs. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus.
Upload a basic file
Upload a document (employee handbook or policy document) to your corpus. Vectara automatically parses, chunks, and indexes the content for semantic search. No manual processing is required.
The upload.file method corresponds to the HTTP POST /v2/upload_file endpoint. For more details on request and response parameters, see the Upload File REST API.
Key Parameters:
- corpus_key: Target corpus identifier where the file will be stored
- file: Binary content of the file (read with "rb" mode)
- filename: Name of the file being uploaded
- metadata: Optional key-value pairs for filtering and categorization
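As a minimal sketch of how the parameters above fit together, the helper below reads a file in binary mode and passes it to `upload.file`. The helper name `upload_basic_file` is hypothetical, and `client` is assumed to be an already-constructed Vectara SDK client that accepts these keyword arguments.

```python
from pathlib import Path
from typing import Any, Optional


def upload_basic_file(client: Any, corpus_key: str, path: str,
                      metadata: Optional[dict] = None) -> Any:
    """Read a local file as bytes and upload it to the given corpus.

    Assumption: `client.upload.file` accepts corpus_key, file, filename,
    and metadata as keyword arguments, as listed in this guide.
    """
    file_path = Path(path)

    # Open in "rb" mode so the SDK receives the raw binary content.
    with file_path.open("rb") as f:
        content = f.read()

    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=file_path.name,  # the file name is used as the document ID
        metadata=metadata or {},
    )
```

With a real client, a call might look like `upload_basic_file(client, "my-docs", "employee_handbook.pdf", {"department": "HR"})`, where the file name and metadata values are placeholders.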
Supported File Types: PDF, DOCX, DOC, TXT, HTML, Markdown
Each file uploaded can be up to 10 MB in size.
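The type and size limits above can be checked locally before any request is sent. The following is a rough sketch under two assumptions: that 10 MB means 10 * 1024 * 1024 bytes, and that the listed types map to the file extensions shown. Both the extension set and the function name `find_upload_problems` are illustrative, not part of the SDK.

```python
from pathlib import Path
from typing import List

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024

# Assumption: extensions chosen to match the supported types listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".html", ".md"}


def find_upload_problems(path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{path}: file does not exist or is not a regular file"]

    problems = []
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"{path}: unsupported extension '{file_path.suffix}'")
    if file_path.stat().st_size > MAX_FILE_BYTES:
        problems.append(f"{path}: larger than {MAX_FILE_BYTES} bytes")
    return problems
```

Running this check first avoids a round trip that would otherwise end in a 400 or 413 response.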
To update or overwrite an existing file, you must first delete the document
using client.documents.delete()
and then re-upload it, as direct updates to
content are not supported. The file name is used as the document ID. Attempting
to upload a file with the same name but different content will result in a 409
error.
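The delete-then-re-upload sequence described above could be wrapped as follows. This is a sketch, not the SDK's own update mechanism: it assumes `client.documents.delete` takes the corpus key and document ID as its first two arguments, and the name `replace_document` is hypothetical.

```python
from pathlib import Path
from typing import Any


def replace_document(client: Any, corpus_key: str, path: str) -> Any:
    """Delete the existing document, then upload the new file content."""
    file_path = Path(path)

    # The file name is used as the document ID, so it identifies the old copy.
    document_id = file_path.name

    # Assumption: delete(corpus_key, document_id) is the argument order.
    client.documents.delete(corpus_key, document_id)

    with file_path.open("rb") as f:
        return client.upload.file(
            corpus_key=corpus_key,
            file=f.read(),
            filename=file_path.name,
        )
```

If the document does not exist yet, the delete call may raise an error, so callers that handle both first-time uploads and updates would need to catch it.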
Error Handling:
- 400 Bad Request: Invalid parameters or unsupported file type
- 403 Forbidden: Insufficient permissions
- 404 Not Found: Corpus not found
- 409 Conflict: Document with same ID exists but different content
- 413 Payload Too Large: File exceeds size limit
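One way to turn these status codes into actionable messages is a small lookup keyed on the code. The sketch below assumes that exceptions raised by the SDK carry a `status_code` attribute; that attribute name and the function `explain_upload_failure` are assumptions for illustration.

```python
from typing import Dict

# Hints mirror the error list above.
UPLOAD_ERROR_HINTS: Dict[int, str] = {
    400: "Invalid parameters or unsupported file type",
    403: "Insufficient permissions",
    404: "Corpus not found",
    409: "Document with same ID exists but different content",
    413: "File exceeds size limit",
}


def explain_upload_failure(exc: Exception) -> str:
    """Map an upload exception to a short, readable explanation."""
    # Assumption: SDK errors expose the HTTP status as `status_code`.
    status = getattr(exc, "status_code", None)
    hint = UPLOAD_ERROR_HINTS.get(status)
    if hint is None:
        return f"Unexpected upload failure: {exc!r}"
    return f"{status}: {hint}"
```

A caller would wrap the upload in `try`/`except Exception as exc` and log or display `explain_upload_failure(exc)`.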
Upload with table extraction
Upload documents containing tables, charts, or structured data with enhanced extraction capabilities. This option is well suited to technical documentation, API references, or any document with tabular information.
Table Extraction Benefits:
- Automatically extracts and indexes table content
- Makes tabular data searchable alongside text content
- Preserves table structure and relationships
- Enables queries about specific data points within tables
Use Cases:
- Technical specifications with parameter tables
- API documentation with endpoint tables
- Research papers with data tables
- Configuration guides with settings tables
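A table-aware upload might look like the sketch below. The `table_extraction_config` keyword and its `{"extract_tables": True}` shape are assumptions based on the Upload File REST API and should be checked against the SDK reference. The helper name `upload_with_tables` is hypothetical.

```python
from pathlib import Path
from typing import Any, Optional


def upload_with_tables(client: Any, corpus_key: str, path: str,
                       metadata: Optional[dict] = None) -> Any:
    """Upload a file and request table extraction during parsing."""
    file_path = Path(path)
    with file_path.open("rb") as f:
        content = f.read()

    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=file_path.name,
        metadata=metadata or {},
        # Assumption: parameter name and dict shape for enabling extraction.
        table_extraction_config={"extract_tables": True},
    )
```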
Each file uploaded can be up to 10 MB in size.
To update or overwrite an existing file, you must first delete the document
using client.documents.delete()
and then re-upload it, as direct updates to
content are not supported. The file name is used as the document ID.
Attempting to upload a file with the same name but different content will
result in a 409 error.
Error Handling:
- 400 Bad Request: Invalid parameters or unsupported file type
- 403 Forbidden: Insufficient permissions
- 404 Not Found: Corpus not found
- 409 Conflict: Document with same ID exists but different content
- 413 Payload Too Large: File exceeds size limit
Upload from file object (streaming)
Upload files directly from file objects without loading the entire content into memory. This is ideal for streaming scenarios where files are large or come from dynamic sources like cloud storage (e.g., S3 downloads), APIs, or webhooks, avoiding memory overhead.
Streaming Use Cases:
- Files downloaded from cloud storage (S3, Google Cloud, etc.)
- Content received through APIs or webhooks
- Temporary files that don't need local persistence
- Batch processing from external systems
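For the scenarios above, the file-like object can be handed to the SDK directly instead of calling `.read()` first. This sketch assumes that `upload.file` accepts a binary file-like object for `file`. The helper name `upload_from_stream` is hypothetical.

```python
from typing import Any, BinaryIO, Optional


def upload_from_stream(client: Any, corpus_key: str, stream: BinaryIO,
                       filename: str, metadata: Optional[dict] = None) -> Any:
    """Upload from an open binary stream without reading it into a bytes object."""
    # Rewind if possible, in case an earlier step already consumed the stream.
    if stream.seekable():
        stream.seek(0)

    return client.upload.file(
        corpus_key=corpus_key,
        file=stream,        # assumption: file-like objects are accepted here
        filename=filename,  # streams have no reliable name, so pass one explicitly
        metadata=metadata or {},
    )
```

The `stream` argument could be a response body from a cloud storage download or a temporary file; an `io.BytesIO` works for local experiments, although it holds the content in memory.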
Error Handling:
- 400 Bad Request: Invalid parameters or unsupported file type
- 403 Forbidden: Insufficient permissions
- 404 Not Found: Corpus not found
- 409 Conflict: Document with same ID exists but different content
- 413 Payload Too Large: File exceeds size limit
Advanced upload with comprehensive metadata
Upload documents with comprehensive metadata to capture important business context. The system automatically processes document structure for precise queries and analysis.
Comprehensive Metadata Benefits:
- Better document organization and discovery
- Enhanced filtering capabilities in queries
- Support for compliance and audit requirements
- Improved search relevance through context
Business Document Benefits:
- Query by department: "Show all HR policies"
- Filter by dates: "Find documents effective after 2025"
- Search by classification: "Show internal documents only"
- Track versions: "Get the latest version of each handbook"
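To support queries like those above, each upload needs consistent metadata fields. The sketch below builds such a dictionary. The field names (`department`, `effective_date`, `classification`, `version`) are illustrative choices rather than required keys.

```python
from datetime import date
from typing import Any, Dict


def build_policy_metadata(department: str, effective_date: date,
                          classification: str, version: str) -> Dict[str, Any]:
    """Build a metadata dict suitable for the `metadata` upload parameter."""
    return {
        "department": department,
        # ISO 8601 dates sort chronologically even when compared as strings.
        "effective_date": effective_date.isoformat(),
        "classification": classification,
        "version": version,
    }
```

The result would be passed as `metadata=` to `upload.file`, for example `build_policy_metadata("HR", date(2025, 1, 1), "internal", "2.1")`.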
To update or overwrite an existing file, you must first delete the document
using client.documents.delete()
and then re-upload it, as direct updates to
content are not supported. The file name is used as the document ID.
Attempting to upload a file with the same name but different content
will result in a 409 error.
Error Handling:
- 400 Bad Request: Invalid parameters or unsupported file type
- 403 Forbidden: Insufficient permissions
- 404 Not Found: Corpus not found
- 409 Conflict: Document with same ID exists but different content
- 413 Payload Too Large: File exceeds size limit
Best practices and error handling
File Upload Best Practices:
Production Guidelines:
- Always validate file existence and readability before upload
- Include comprehensive metadata for better searchability
- Use appropriate chunking strategies based on content type
- Enable table extraction for documents with structured data
- Implement retry logic for transient failures
- Monitor upload success rates and file processing times
Error Handling:
- File Issues: Validate file existence, permissions, and size
- API Errors: Check corpus permissions and file format support
- Network Issues: Implement retry logic with exponential backoff
- Large Files: Consider splitting documents that exceed the 10 MB limit into smaller files before uploading
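Retry with exponential backoff can be sketched as a small wrapper around any upload call. The set of retryable status codes and the `status_code` attribute on exceptions are assumptions, and `retry_upload` is a hypothetical helper rather than an SDK feature.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these statuses indicate transient conditions worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def retry_upload(do_upload: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call do_upload, retrying transient failures with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return do_upload()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            transient = (status in RETRYABLE_STATUS
                         or isinstance(exc, (ConnectionError, TimeoutError)))
            # Give up on permanent errors (e.g. 400, 409) or the last attempt.
            if not transient or attempt == max_attempts:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))

    raise RuntimeError("unreachable")  # the loop always returns or raises
```

A call would wrap the upload in a zero-argument function, such as `retry_upload(lambda: client.upload.file(...))`. Errors like 409 Conflict are re-raised immediately because retrying cannot fix them.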
Metadata Recommendations:
- Include document type, department, and date information
- Add file size and upload timestamp for tracking
- Use consistent naming conventions across your organization
- Include business-specific fields for filtering and analytics
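The tracking fields recommended above can be added automatically at upload time. In this sketch the key names `file_size_bytes` and `uploaded_at` are illustrative, and `with_tracking_fields` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def with_tracking_fields(path: str, metadata: Dict[str, Any],
                         now: Optional[datetime] = None) -> Dict[str, Any]:
    """Return a copy of metadata with file size and upload timestamp added."""
    stamp = now or datetime.now(timezone.utc)
    return {
        **metadata,
        "file_size_bytes": Path(path).stat().st_size,
        "uploaded_at": stamp.isoformat(),  # timezone-aware ISO 8601 string
    }
```

Keeping this in one helper means every upload uses the same key names, which supports the consistent naming convention suggested above.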
Next steps
After understanding file uploads: