Add a document to a corpus
POST /v2/corpora/:corpus_key/documents
Add a document to a corpus for indexing, making its content available for search, retrieval, and generation. This endpoint supports two ingestion modes: structured documents and core documents. These modes offer different levels of control over document structure and chunking.
Each document becomes part of a corpus. You can use this API directly or with Vectara Ingest or the File Upload API.
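As a rough illustration of the request shape, the sketch below prepares (but does not send) a POST to this endpoint using only Python's standard library. The base URL https://api.vectara.io, the x-api-key header, and the helper name build_add_document_request are assumptions for illustration and are not stated on this page; confirm the host and authentication scheme for your account before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; it is not given in this reference page.
BASE_URL = "https://api.vectara.io"


def build_add_document_request(corpus_key, document, api_key):
    """Prepare a POST /v2/corpora/:corpus_key/documents request (hypothetical helper)."""
    # Percent-encode the corpus key so it is safe to place in the URL path.
    path = "/v2/corpora/{}/documents".format(urllib.parse.quote(corpus_key, safe=""))
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed authentication header
        },
        method="POST",
    )


# Build the request object only; nothing is sent over the network here.
request = build_add_document_request("my-corpus", {"id": "doc-1"}, "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; that step is left out so the sketch can be run without credentials.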
Structured documents
Structured documents provide a natural hierarchy where Vectara handles chunking and metadata automatically. They are ideal when you want to index documents that have a logical organization (titles, sections, paragraphs, and optionally tables or images) but prefer to let Vectara manage how the content is split into search-optimized units.
Each structured document contains:
- A unique id and optional title, description, and metadata.
- An array of sections, each with its own title, text, and optional nested sections, tables, or images.
- Optional custom_dimensions that can influence ranking during search.
When indexed, Vectara automatically partitions the text into document parts using a sentence-based or character-based chunking strategy. This lets you ingest data with minimal pre-processing while maintaining semantic integrity across context boundaries.
Structured documents are recommended for content with well-defined sections such as reports, articles, FAQs, or documentation.
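The fields listed above can be assembled into a request body along these lines. This is a minimal sketch: the field names id, title, sections, and metadata come from the description above, while the "type": "structured" discriminator, the helper name build_structured_document, and all sample values are assumptions for illustration.

```python
import json


def build_structured_document(doc_id, title, sections, metadata=None):
    """Assemble a structured document body (hypothetical helper)."""
    document = {
        "id": doc_id,
        "type": "structured",  # assumed discriminator; not stated on this page
        "title": title,
        "sections": sections,
    }
    # Metadata is optional, so include it only when provided.
    if metadata:
        document["metadata"] = metadata
    return document


structured_doc = build_structured_document(
    doc_id="faq-001",
    title="Billing FAQ",
    sections=[
        {
            "title": "Refunds",
            "text": "Refunds are issued within 14 days of a request.",
            # Sections may contain further nested sections.
            "sections": [
                {"title": "Exceptions", "text": "Gift cards are not refundable."}
            ],
        }
    ],
    metadata={"department": "finance"},
)
print(json.dumps(structured_doc, indent=2))
```

Note that the body contains no chunk boundaries: with a structured document, splitting the section text into document parts is left to the service.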
Core documents
Core documents offer fine-grained, explicit control of every part of a document that becomes searchable. Instead of providing a hierarchical structure, you specify each document part directly as a unit that maps 1:1 to a search result or embedding.
A core document includes:
- A unique id and optional metadata.
- A list of document_parts, where each part includes text, optional context, metadata, and custom_dimensions.
- Optional tables and images, allowing you to represent complex structured data like spreadsheets or charts.
Core documents are designed for advanced use cases such as precise chunk-level optimization, experimental corpus structures, and applications where metadata-driven retrieval or ranking must be explicitly controlled.
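To make the contrast with structured documents concrete, the sketch below builds a core document in which every searchable part is spelled out by the caller. The field names id, document_parts, text, context, metadata, and custom_dimensions come from the description above; the "type": "core" discriminator, the shape of custom_dimensions as a name-to-number mapping, and the helper name build_core_document are assumptions for illustration.

```python
import json

# Optional per-part fields named in the description above.
OPTIONAL_PART_FIELDS = ("context", "metadata", "custom_dimensions")


def build_core_document(doc_id, parts, metadata=None):
    """Assemble a core document body from explicit parts (hypothetical helper)."""
    document = {
        "id": doc_id,
        "type": "core",  # assumed discriminator; not stated on this page
        "document_parts": [],
    }
    for part in parts:
        # Text is the only field every part carries in this sketch.
        entry = {"text": part["text"]}
        for field in OPTIONAL_PART_FIELDS:
            if field in part:
                entry[field] = part[field]
        document["document_parts"].append(entry)
    if metadata:
        document["metadata"] = metadata
    return document


core_doc = build_core_document(
    "ticket-42",
    [
        {
            "text": "The export job fails when the file exceeds 2 GB.",
            "context": "Customer report from the support queue.",
            "metadata": {"speaker": "customer"},
            "custom_dimensions": {"importance": 0.8},  # assumed value shape
        },
        {"text": "Splitting the export into smaller batches resolves it."},
    ],
)
print(json.dumps(core_doc, indent=2))
```

Because each entry in document_parts maps 1:1 to a search result, the caller decides exactly where one retrievable unit ends and the next begins.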
Chunking strategies
By default, Vectara uses sentence-based chunking, which provides optimal retrieval accuracy for most datasets.
For larger documents or performance-tuned ingestion, you can explicitly set a chunking_strategy:
- sentence_chunking_strategy: creates one chunk per sentence (default).
- max_chars_chunking_strategy: creates larger chunks up to a specified character limit (max_chars_per_chunk), balancing retrieval speed with contextual coherence.
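One way to attach a chunking strategy to a document body is sketched below. The names chunking_strategy, sentence_chunking_strategy, max_chars_chunking_strategy, and max_chars_per_chunk come from the text above; representing the strategy as an object with a "type" field is an assumption, and with_chunking_strategy is a hypothetical helper.

```python
def with_chunking_strategy(document, max_chars_per_chunk=None):
    """Return a copy of the document with a chunking strategy set (hypothetical helper)."""
    updated = dict(document)  # shallow copy so the caller's dict is left unchanged
    if max_chars_per_chunk is None:
        # Sentence-based chunking is the documented default.
        updated["chunking_strategy"] = {"type": "sentence_chunking_strategy"}
    else:
        if max_chars_per_chunk <= 0:
            raise ValueError("max_chars_per_chunk must be a positive integer")
        # Assumed object shape: a type name plus the character limit.
        updated["chunking_strategy"] = {
            "type": "max_chars_chunking_strategy",
            "max_chars_per_chunk": max_chars_per_chunk,
        }
    return updated


base_doc = {"id": "report-7", "sections": [{"text": "Quarterly summary."}]}
sentence_doc = with_chunking_strategy(base_doc)
max_chars_doc = with_chunking_strategy(base_doc, max_chars_per_chunk=512)
print(sentence_doc["chunking_strategy"])
print(max_chars_doc["chunking_strategy"])
```

The limit of 512 is an arbitrary sample value, not a recommendation from this page.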
Request
Responses
- 201: Document added to the corpus.
- 400: Document creation request was malformed.
- 403: Permissions do not allow adding a document to the corpus.
- 404: Corpus not found.
- 409: The document already exists.
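A client might translate these status codes into messages as in the sketch below. The codes and their meanings are taken from the list above; the names RESPONSE_MEANINGS, describe_response, and is_success are hypothetical, and how a client should react to each code (for example, whether a 409 counts as a failure) is left as an application decision.

```python
# Status codes documented for this endpoint and their meanings.
RESPONSE_MEANINGS = {
    201: "Document added to the corpus.",
    400: "Document creation request was malformed.",
    403: "Permissions do not allow adding a document to the corpus.",
    404: "Corpus not found.",
    409: "The document already exists.",
}


def describe_response(status_code):
    """Return the documented meaning of a status code, or a fallback message."""
    return RESPONSE_MEANINGS.get(
        status_code, "Undocumented status code: {}".format(status_code)
    )


def is_success(status_code):
    """Only 201 indicates that the document was added."""
    return status_code == 201


for code in (201, 409, 500):
    print(code, is_success(code), describe_response(code))
```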