Version: 2.0

Upload Files

This guide demonstrates how to upload files (PDFs, DOCX, and more) to a Vectara corpus using the Python SDK. Vectara automatically parses, chunks, and indexes each uploaded file, so its contents become searchable and available for Retrieval Augmented Generation (RAG).

Use file upload for:

Bulk onboarding of policy docs, technical manuals, invoices, contracts, or research papers
Ingesting new content as soon as it is generated by your business
Processing documents with tables, charts, and structured data

Prerequisites

This guide assumes you have a corpus called my-docs. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus.

Upload a basic file

Upload a document, such as an employee handbook or policy document, to your corpus. Vectara automatically parses, chunks, and indexes the content for semantic search; no manual processing is required.

The upload.file method corresponds to the HTTP POST /v2/upload_file endpoint. For more details on request and response parameters, see the Upload File REST API.

Key Parameters:

corpus_key: Target corpus identifier where the file will be stored
file: Binary content of the file (read with "rb" mode)
filename: Name of the file being uploaded
metadata: Optional key-value pairs for filtering and categorization
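A minimal sketch of the basic upload, assuming the `vectara` package's `client.upload.file` accepts the keyword arguments listed above. The helper `upload_local_file` is a hypothetical name. It takes the client as a parameter, so you can pass a real client (for example one created with `Vectara(api_key=...)`) or a stand-in object in tests.

```python
from pathlib import Path
from typing import Any, Optional


def upload_local_file(client: Any, path: str, corpus_key: str = "my-docs",
                      metadata: Optional[dict] = None) -> Any:
    """Upload one local file to a corpus and return the SDK's response.

    Assumes client.upload.file(corpus_key=..., file=..., filename=..., metadata=...),
    mirroring the parameters of POST /v2/upload_file.
    """
    file_path = Path(path)
    # Read the file in binary ("rb") mode, as the `file` parameter expects.
    with file_path.open("rb") as f:
        content = f.read()
    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=file_path.name,   # The file name also serves as the document ID.
        metadata=metadata or {},   # Optional key-value pairs for filtering.
    )


# Example usage (requires the vectara package and an API key):
# from vectara import Vectara
# client = Vectara(api_key="YOUR_API_KEY")
# upload_local_file(client, "employee_handbook.pdf",
#                   metadata={"department": "HR", "doc_type": "handbook"})
```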

Supported File Types: PDF, DOCX, DOC, TXT, HTML, Markdown

Note

Each file uploaded can be up to 10 MB in size.
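Because the API rejects oversized or unsupported files, it can help to check them locally before uploading. This sketch assumes that the extensions below correspond to the supported types listed above, and it treats 10 MB as 10,000,000 bytes (a conservative assumption; the service may measure megabytes differently). `validate_upload` is a hypothetical helper.

```python
import os
from pathlib import Path

# Assumed extensions for the supported types: PDF, DOCX, DOC, TXT, HTML, Markdown.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".html", ".htm", ".md"}

# Conservative reading of the 10 MB limit (assumption).
MAX_UPLOAD_BYTES = 10_000_000


def validate_upload(path: str) -> Path:
    """Return the path if it looks uploadable; raise ValueError otherwise."""
    file_path = Path(path)
    if not file_path.is_file():
        raise ValueError(f"{file_path} does not exist or is not a regular file")
    if not os.access(file_path, os.R_OK):
        raise ValueError(f"{file_path} is not readable")
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"{file_path.suffix or '(no extension)'} is not a supported file type")
    size = file_path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{file_path.name} is {size} bytes; the limit is {MAX_UPLOAD_BYTES}")
    return file_path
```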

The file name is used as the document ID. Uploading a file with the same name as an existing document but different content returns a 409 Conflict error. Direct updates to content are not supported, so to update or overwrite an existing file you must first delete the document using client.documents.delete() and then re-upload it.
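A sketch of the delete-then-upload pattern. It assumes the SDK exposes `client.documents.delete(corpus_key=..., document_id=...)` and that SDK errors carry an HTTP `status_code` attribute. `replace_document` is a hypothetical helper; it ignores a 404 on delete so the same call also works for a file that has never been uploaded.

```python
from pathlib import Path
from typing import Any, Optional


def replace_document(client: Any, path: str, corpus_key: str = "my-docs",
                     metadata: Optional[dict] = None) -> Any:
    """Delete the stored version of a file, then upload the new version.

    Assumes the document ID equals the file name, as described above.
    """
    file_path = Path(path)
    document_id = file_path.name
    try:
        # Content cannot be updated in place, so remove the old document first.
        client.documents.delete(corpus_key=corpus_key, document_id=document_id)
    except Exception as exc:
        # A 404 means there was no earlier version; re-raise anything else.
        if getattr(exc, "status_code", None) != 404:
            raise
    with file_path.open("rb") as f:
        content = f.read()
    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=document_id,  # Same name, so the new upload gets the same ID.
        metadata=metadata or {},
    )
```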

Error Handling:

400 Bad Request: Invalid parameters or unsupported file type
403 Forbidden: Insufficient permissions
404 Not Found: Corpus not found
409 Conflict: Document with same ID exists but different content
413 Payload Too Large: File exceeds size limit
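One way to turn these status codes into readable messages, assuming SDK exceptions expose the HTTP status as a `status_code` attribute. `describe_upload_error` is a hypothetical helper; its messages come from the list above.

```python
# Status codes and meanings from the error list above.
UPLOAD_ERRORS = {
    400: "Invalid parameters or unsupported file type",
    403: "Insufficient permissions",
    404: "Corpus not found",
    409: "Document with same ID exists but different content",
    413: "File exceeds size limit",
}


def describe_upload_error(exc: Exception) -> str:
    """Map an exception with an assumed `status_code` attribute to a message."""
    status = getattr(exc, "status_code", None)
    if status in UPLOAD_ERRORS:
        return f"{status}: {UPLOAD_ERRORS[status]}"
    # Unknown codes and non-HTTP errors fall back to the exception text.
    return f"Unexpected error: {exc}"


# Example usage:
# try:
#     client.upload.file(corpus_key="my-docs", file=data, filename="handbook.pdf")
# except Exception as exc:
#     print(describe_upload_error(exc))
```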

Upload with table extraction

Upload documents that contain tables, charts, or other structured data with table extraction enabled. This is well suited to technical documentation, API references, and any other document with tabular information.
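A sketch of an upload with table extraction enabled. It assumes `client.upload.file` accepts a `table_extraction_config` argument mirroring the `table_extraction_config.extract_tables` field of the REST endpoint. Depending on the SDK version, that argument may need to be a typed config object rather than a plain dict. `upload_with_tables` is a hypothetical helper.

```python
from pathlib import Path
from typing import Any, Optional


def upload_with_tables(client: Any, path: str, corpus_key: str = "my-docs",
                       metadata: Optional[dict] = None) -> Any:
    """Upload a file and ask the service to extract and index its tables."""
    file_path = Path(path)
    with file_path.open("rb") as f:
        content = f.read()
    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=file_path.name,
        metadata=metadata or {},
        # Assumed argument shape, mirroring the REST field of the same name.
        table_extraction_config={"extract_tables": True},
    )


# Example usage:
# upload_with_tables(client, "api_reference.pdf", metadata={"doc_type": "api_reference"})
```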

Table Extraction Benefits:

Automatically extracts and indexes table content
Makes tabular data searchable alongside text content
Preserves table structure and relationships
Enables queries about specific data points within tables

Use Cases:

Technical specifications with parameter tables
API documentation with endpoint tables
Research papers with data tables
Configuration guides with settings tables

Note

Each file uploaded can be up to 10 MB in size.

Error Handling:

400 Bad Request: Invalid parameters or unsupported file type
403 Forbidden: Insufficient permissions
404 Not Found: Corpus not found
409 Conflict: Document with same ID exists but different content
413 Payload Too Large: File exceeds size limit

Upload from file object (streaming)

Upload directly from a file object instead of reading the whole file into memory first. This approach suits large files and content from dynamic sources such as cloud storage downloads (for example, from S3), APIs, or webhooks.
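A sketch of a streaming upload. It assumes `client.upload.file` accepts any readable binary file object for `file`. The object is passed through without calling `.read()`, but whether the bytes are then sent in pieces depends on the SDK's HTTP layer. `upload_stream` is a hypothetical helper.

```python
from typing import Any, BinaryIO, Optional


def upload_stream(client: Any, fileobj: BinaryIO, filename: str,
                  corpus_key: str = "my-docs",
                  metadata: Optional[dict] = None) -> Any:
    """Upload from an open binary file object without reading it up front."""
    return client.upload.file(
        corpus_key=corpus_key,
        file=fileobj,            # Passed as-is; not read into a bytes object here.
        filename=filename,       # Streams from S3, APIs, or webhooks rarely carry a name.
        metadata=metadata or {},
    )


# Example usage: any readable binary stream works, for instance a local file
# opened in "rb" mode or the body of a download from cloud storage.
# with open("quarterly_report.pdf", "rb") as stream:
#     upload_stream(client, stream, "quarterly_report.pdf", metadata={"source": "s3"})
```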

Streaming Use Cases:

Files downloaded from cloud storage (S3, Google Cloud, etc.)
Content received through APIs or webhooks
Temporary files that don't need local persistence
Batch processing from external systems

Error Handling:

400 Bad Request: Invalid parameters or unsupported file type
403 Forbidden: Insufficient permissions
404 Not Found: Corpus not found
409 Conflict: Document with same ID exists but different content
413 Payload Too Large: File exceeds size limit

Advanced upload with comprehensive metadata

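Upload documents with comprehensive metadata to capture important business context. Vectara still parses the document structure automatically, and the metadata is stored with the document so that queries can filter and organize results by it. To filter on a metadata field at query time, that field must also be defined as a filter attribute on the corpus.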

Comprehensive Metadata Benefits:

Better document organization and discovery
Enhanced filtering capabilities in queries
Support for compliance and audit requirements
Improved search relevance through context

Business Document Benefits:

Query by department: "Show all HR policies"
Filter by dates: "Find documents effective after 2025"
Search by classification: "Show internal documents only"
Track versions: "Get the latest version of each handbook"
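A sketch that builds metadata for the example queries above and uploads it together with an explicit chunking strategy. The field names, the helper names, and the 512-character chunk size are illustrative assumptions. The `chunking_strategy` shape mirrors the REST field of `POST /v2/upload_file`, and the SDK may expect a typed object instead of a dict.

```python
from datetime import date
from pathlib import Path
from typing import Any


def build_business_metadata(department: str, effective: date,
                            classification: str, version: int) -> dict:
    """Collect the business fields behind the example queries above."""
    return {
        "department": department,                  # "Show all HR policies"
        "effective_date": effective.isoformat(),   # "effective after 2025"
        "classification": classification,          # "internal documents only"
        "version": version,                        # "latest version of each handbook"
    }


def upload_with_metadata(client: Any, path: str, metadata: dict,
                         corpus_key: str = "my-docs",
                         max_chars_per_chunk: int = 512) -> Any:
    """Upload a file with metadata and an explicit chunking strategy."""
    file_path = Path(path)
    with file_path.open("rb") as f:
        content = f.read()
    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=file_path.name,
        metadata=metadata,
        # Shape assumed from the REST API; 512 is an example, not a recommendation.
        chunking_strategy={
            "type": "max_chars_chunking_strategy",
            "max_chars_per_chunk": max_chars_per_chunk,
        },
    )
```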

Error Handling:

400 Bad Request: Invalid parameters or unsupported file type
403 Forbidden: Insufficient permissions
404 Not Found: Corpus not found
409 Conflict: Document with same ID exists but different content
413 Payload Too Large: File exceeds size limit

Best practices and error handling

File Upload Best Practices:

Production Guidelines:

Always validate file existence and readability before upload
Include comprehensive metadata for better searchability
Use appropriate chunking strategies based on content type
Enable table extraction for documents with structured data
Implement retry logic for transient failures
Monitor upload success rates and file processing times
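To monitor success rates and timings, a batch loop can record the outcome and duration of each upload instead of stopping at the first failure. This is a minimal sketch: `upload_batch` and `UploadReport` are hypothetical names, and the timing covers the round trip of each upload call as seen by the client.

```python
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List


@dataclass
class UploadReport:
    """Per-batch counters for monitoring upload success rates and timings."""
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    seconds: Dict[str, float] = field(default_factory=dict)

    @property
    def success_rate(self) -> float:
        total = len(self.succeeded) + len(self.failed)
        return len(self.succeeded) / total if total else 0.0


def upload_batch(upload_one: Callable[[Path], Any], paths: List[str]) -> UploadReport:
    """Upload each file with `upload_one`, recording outcome and duration.

    `upload_one` is any function that uploads a single file, for example one
    that validates the file and then calls client.upload.file(...).
    """
    report = UploadReport()
    for p in paths:
        path = Path(p)
        start = time.perf_counter()
        try:
            upload_one(path)
            report.succeeded.append(path.name)
        except Exception:
            # Record the failure and keep going with the rest of the batch.
            report.failed.append(path.name)
        finally:
            report.seconds[path.name] = time.perf_counter() - start
    return report
```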

Error Handling:

File Issues: Validate file existence, permissions, and size
API Errors: Check corpus permissions and file format support
Network Issues: Implement retry logic with exponential backoff
Large Files: Files over the 10 MB limit are rejected with 413 Payload Too Large; split very large documents into smaller files before uploading.
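A sketch of retry logic with exponential backoff and jitter. Which failures count as transient is an assumption here: HTTP 429 and 5xx responses (read from an assumed `status_code` attribute) plus Python's built-in `ConnectionError` and `TimeoutError`. The SDK's HTTP library may raise its own network exception types, which you would add to the check. Client errors such as 400, 403, 404, 409, and 413 are raised immediately, because retrying will not fix them. `with_retries` is a hypothetical helper.

```python
import random
import time
from typing import Any, Callable

# Assumed transient HTTP statuses: rate limiting and server-side errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def with_retries(operation: Callable[[], Any], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            transient = (status in RETRYABLE_STATUS
                         or isinstance(exc, (ConnectionError, TimeoutError)))
            if not transient or attempt == max_attempts:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # plus up to 50% random jitter so clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))


# Example usage:
# with_retries(lambda: client.upload.file(
#     corpus_key="my-docs", file=data, filename="manual.pdf"))
```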

Metadata Recommendations:

Include document type, department, and date information
Add file size and upload timestamp for tracking
Use consistent naming conventions across your organization
Include business-specific fields for filtering and analytics
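A sketch of a metadata builder that follows these recommendations by adding the document type, file size, and an upload timestamp to each document. `tracking_metadata` and its field names are hypothetical; what matters is using the same names for every upload.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def tracking_metadata(path: str, department: str,
                      extra: Optional[dict] = None) -> dict:
    """Build consistent tracking metadata for one file."""
    file_path = Path(path)
    metadata = {
        "document_type": file_path.suffix.lstrip(".").lower() or "unknown",
        "department": department,
        "file_size_bytes": file_path.stat().st_size,
        # UTC timestamp in ISO 8601 form, e.g. 2025-01-31T09:30:00+00:00.
        "uploaded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    # Business-specific fields (e.g. region or customer tier) are applied last
    # and may override the defaults above.
    metadata.update(extra or {})
    return metadata
```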

Next steps

Once you have uploaded files, continue with these guides:

Query uploaded content: Use Queries to search uploaded documents.
Document management: Use Documents to manage uploaded content.
Chat integration: Build conversational interfaces with uploaded documents using Chats.
