Add a document to a corpus
POST /v2/corpora/:corpus_key/documents
Add a document to a corpus for indexing, making its content available for search, retrieval, and generation. This endpoint supports two ingestion modes, structured and core, which offer different levels of control over document structure and chunking.
Each document becomes part of a corpus: a logical collection of related data that powers semantic search and Retrieval Augmented Generation (RAG).
You can create documents directly using this API, or use higher-level ingestion frameworks such as Vectara Ingest or the File Upload API for simplified workflows.
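As a rough illustration of calling this endpoint directly, the sketch below builds (but does not send) the POST request using only the Python standard library. The host `https://api.vectara.io`, the `x-api-key` header name, and the helper `build_add_document_request` are assumptions made for this example; they are not stated in this section, so check them against your account's API settings.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.vectara.io"  # assumed host; not stated in this section


def build_add_document_request(corpus_key, document, api_key):
    """Build (but do not send) a POST to /v2/corpora/:corpus_key/documents."""
    # Percent-encode the corpus key so it is safe to place in the URL path.
    path = f"/v2/corpora/{urllib.parse.quote(corpus_key, safe='')}/documents"
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(document).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed authentication header name
        },
    )


request = build_add_document_request(
    "my-corpus", {"id": "doc-1"}, api_key="YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the example runs without network access or credentials.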
Structured Documents
Structured documents provide a natural, human-readable hierarchy where the Vectara platform automatically handles chunking and metadata association.
They are ideal when you want to index documents that have logical organization (titles, sections, paragraphs, and optionally tables or images) but prefer Vectara to manage how the content is split into search-optimized units.
Each structured document contains:
- A unique id and optional title, description, and metadata.
- An array of sections, each with its own title, text, and optional nested sections, tables, or images.
- Optional custom_dimensions that can influence ranking during search.
When indexed, Vectara’s internal algorithms automatically partition the text into document parts using an intelligent sentence- or character-based chunking strategy. This lets you ingest data with minimal pre-processing while maintaining semantic integrity across context boundaries.
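The structured layout described above can be sketched as a plain Python dictionary. The field names `id`, `title`, `description`, `metadata`, and `sections` come from the list above; the `"type": "structured"` discriminator and the helper `build_structured_document` are assumptions for illustration only.

```python
import json


def build_structured_document(doc_id, sections, title=None,
                              description=None, metadata=None):
    """Assemble a structured-document body, omitting unset optional fields."""
    body = {
        "type": "structured",  # assumed discriminator field
        "id": doc_id,
        "sections": sections,
    }
    if title is not None:
        body["title"] = title
    if description is not None:
        body["description"] = description
    if metadata is not None:
        body["metadata"] = metadata
    return body


doc = build_structured_document(
    "faq-001",
    sections=[
        {
            "title": "Returns",
            "text": "Items can be returned within 30 days.",
            # Nested sections mirror the hierarchy of the source content.
            "sections": [
                {"title": "Refunds", "text": "Refunds take 5 business days."}
            ],
        }
    ],
    title="Store FAQ",
    metadata={"lang": "en"},
)
print(json.dumps(doc, indent=2))
```

Note that the document carries no chunk boundaries of its own: in this mode the platform decides how the section text is split.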
Structured documents are recommended for:
- Most general ingestion scenarios.
- Content with well-defined sections such as reports, articles, FAQs, or documentation.
- Workflows that don’t require full manual control of chunk creation.
Core Documents
Core documents offer fine-grained, explicit control of every part of a document that becomes searchable.
Instead of providing a hierarchical structure, you specify each document part directly as an atomic unit that maps 1:1 to a search result or embedding.
A core document includes:
- A unique id and optional metadata.
- A list of document_parts, where each part includes text, optional context, metadata, and custom_dimensions.
- Optional tables and images, allowing you to represent complex structured data like spreadsheets or charts.
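To make the 1:1 mapping between parts and search results concrete, here is a minimal sketch in which the caller does its own segmentation (one part per paragraph) and attaches metadata to each part. The `"type": "core"` discriminator, the helper names, and the `source` and `paragraph` metadata keys are hypothetical choices for this example.

```python
def paragraphs_to_parts(text, source):
    """Split text on blank lines and turn each paragraph into one document part."""
    parts = []
    for paragraph in text.split("\n\n"):
        cleaned = " ".join(paragraph.split())  # collapse internal whitespace
        if cleaned:
            parts.append({
                "text": cleaned,
                # Per-part metadata; the key names here are illustrative.
                "metadata": {"source": source, "paragraph": len(parts)},
            })
    return parts


def build_core_document(doc_id, parts, metadata=None):
    """Assemble a core-document body from pre-chunked parts."""
    if not parts:
        raise ValueError("a core document needs at least one document part")
    body = {
        "type": "core",  # assumed discriminator field
        "id": doc_id,
        "document_parts": parts,
    }
    if metadata is not None:
        body["metadata"] = metadata
    return body


raw = "Install the CLI.\n\nRun the login\ncommand.\n\n\n\nUpload your files."
core_doc = build_core_document("guide-001", paragraphs_to_parts(raw, "guide.md"))
for part in core_doc["document_parts"]:
    print(part["metadata"]["paragraph"], part["text"])
```

Because each entry in `document_parts` is indexed as given, any change to the splitting rule changes what a single search hit contains.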
Core documents are designed for advanced use cases such as:
- Precise chunk-level optimization or experimental corpus structures.
- Applications where metadata-driven retrieval or ranking must be explicitly controlled.
- Integrations that predefine their own content segmentation or chunking pipelines.
Chunking Strategies
By default, Vectara uses sentence-based chunking, which provides optimal retrieval accuracy for most datasets.
For larger documents or performance-tuned ingestion, you can explicitly set a chunking_strategy:
- sentence_chunking_strategy: creates one chunk per sentence (default).
- max_chars_chunking_strategy: creates larger chunks up to a specified character limit (max_chars_per_chunk), balancing retrieval speed with contextual coherence.
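The two options can be expressed as a small helper that returns the value to place under `chunking_strategy` in the request body. The strategy names and `max_chars_per_chunk` come from the text above; carrying the strategy name in a nested `type` key is an assumption about the exact JSON shape, and the helper itself is hypothetical.

```python
def chunking_strategy(max_chars=None):
    """Return a chunking_strategy object; sentence chunking when max_chars is None."""
    if max_chars is None:
        # One chunk per sentence, which the text above describes as the default.
        return {"type": "sentence_chunking_strategy"}
    if max_chars <= 0:
        raise ValueError("max_chars must be a positive integer")
    return {
        "type": "max_chars_chunking_strategy",
        "max_chars_per_chunk": max_chars,
    }


body = {"id": "report-2024", "sections": [{"text": "Quarterly results..."}]}
body["chunking_strategy"] = chunking_strategy(max_chars=512)
print(body["chunking_strategy"])
```

The value 512 is only a placeholder; a suitable limit depends on the content and on how much context each retrieved chunk should carry.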
Response and Usage
Upon successful ingestion, the response includes a status message, any applicable storage quota metrics, and extraction usage statistics.
Once indexed, the document’s content becomes available for querying using the Query APIs.
Responses
- 201: Document added to the corpus.
- 400: Document creation request was malformed.
- 403: Permissions do not allow adding a document to the corpus.
- 404: Corpus not found.
- 409: The document already exists.
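A client can branch on these status codes with a simple lookup. The sketch below uses only the codes and descriptions listed above; the function name and the `created`, `rejected`, and `unknown` labels are hypothetical.

```python
STATUS_MEANINGS = {
    201: "Document added to the corpus.",
    400: "Document creation request was malformed.",
    403: "Permissions do not allow adding a document to the corpus.",
    404: "Corpus not found.",
    409: "The document already exists.",
}


def interpret_status(status_code):
    """Map an HTTP status code to an (outcome, message) pair."""
    meaning = STATUS_MEANINGS.get(status_code)
    if meaning is None:
        # Codes not documented for this endpoint are reported, not guessed at.
        return ("unknown", f"Unexpected status {status_code}")
    outcome = "created" if status_code == 201 else "rejected"
    return (outcome, meaning)


for code in (201, 409, 500):
    print(code, interpret_status(code))
```

Whether a 409 should count as a failure is an application decision; an ingestion job that re-runs over the same inputs might reasonably log it and continue.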