Version: 2.0

Add a document to a corpus

POST /v2/corpora/:corpus_key/documents

Supported API Key Types:
Index Service, Personal

Add a document to a corpus for indexing, making its content available for search, retrieval, and generation. This endpoint supports two ingestion modes: structured documents and core documents. These modes offer different levels of control over document structure and chunking.

Each document becomes part of a corpus. You can use this API directly or with Vectara Ingest or the File Upload API.

Structured documents

Structured documents provide a natural hierarchy where Vectara handles chunking and metadata automatically. They are ideal when your documents have a logical organization (titles, sections, paragraphs, and optionally tables or images) but you prefer to let Vectara decide how the content is split into search-optimized units.

Each structured document contains:

  • A unique id and optional title, description, and metadata.
  • An array of sections, each with its own title, text, and optional nested sections, tables, or images.
  • Optional custom_dimensions that can influence ranking during search.
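As a rough illustration of the fields listed above, the following Python sketch assembles a structured document as a plain dictionary. The field names id, title, metadata, and sections come from the list above; the "type": "structured" discriminator, the helper name build_structured_document, and all sample values are assumptions made for this example and should be checked against the request schema.

```python
import json


def build_structured_document(doc_id, title, sections, metadata=None):
    """Assemble a structured-document payload as a plain dict.

    Hypothetical helper for illustration; not part of any SDK.
    """
    doc = {
        "id": doc_id,
        "type": "structured",  # assumed discriminator value; verify against the schema
        "title": title,
        "sections": sections,
    }
    if metadata:
        doc["metadata"] = metadata
    return doc


# Sample values are invented for this sketch.
document = build_structured_document(
    doc_id="annual-report-2024",
    title="Annual Report",
    sections=[
        {
            "title": "Overview",
            "text": "Revenue grew during the year.",
            # Sections may nest, mirroring the document's hierarchy.
            "sections": [
                {
                    "title": "Regional detail",
                    "text": "Growth was strongest in Europe.",
                }
            ],
        }
    ],
    metadata={"department": "finance"},
)

print(json.dumps(document, indent=2))
```

Note that the payload contains no chunk boundaries: in this mode the text is supplied section by section and the splitting is left to the service.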

When a structured document is indexed, Vectara automatically partitions its text into document parts using a sentence-based or character-based chunking strategy. This lets you ingest data with minimal preprocessing while keeping each chunk semantically coherent.

Structured documents are recommended for content with well-defined sections such as reports, articles, FAQs, or documentation.

Core documents

Core documents offer fine-grained, explicit control over every part of a document that becomes searchable. Instead of providing a hierarchical structure, you specify each document part directly as a unit that maps 1:1 to a search result or embedding.

A core document includes:

  • A unique id and optional metadata.
  • A list of document_parts, where each part includes text, optional context, metadata, and custom_dimensions.
  • Optional tables and images, allowing you to represent complex structured data like spreadsheets or charts.
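To make the contrast with structured documents concrete, here is a minimal Python sketch of a core document in which every searchable unit is spelled out by hand. The names id, document_parts, text, context, metadata, and custom_dimensions come from the list above; the "type": "core" discriminator, the helpers make_part and build_core_document, and the sample values are assumptions for illustration only.

```python
def make_part(text, context=None, metadata=None, custom_dimensions=None):
    """Build one document part, including only the optional fields provided.

    Hypothetical helper for illustration; not part of any SDK.
    """
    part = {"text": text}
    if context is not None:
        part["context"] = context
    if metadata is not None:
        part["metadata"] = metadata
    if custom_dimensions is not None:
        part["custom_dimensions"] = custom_dimensions
    return part


def build_core_document(doc_id, parts, metadata=None):
    """Assemble a core-document payload from explicit parts."""
    doc = {
        "id": doc_id,
        "type": "core",  # assumed discriminator value; verify against the schema
        "document_parts": list(parts),
    }
    if metadata is not None:
        doc["metadata"] = metadata
    return doc


# Sample values are invented for this sketch.
core_document = build_core_document(
    doc_id="faq-001",
    parts=[
        make_part(
            "You can reset your password from the login page.",
            context="Account help",
            metadata={"topic": "passwords"},
        ),
        make_part("Password reset links expire after a short time."),
    ],
    metadata={"source": "help-center"},
)

print(len(core_document["document_parts"]))
```

Because each entry in document_parts maps 1:1 to a search result, the caller decides where every boundary falls.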

Core documents are designed for advanced use cases such as precise chunk-level optimization, experimental corpus structures, and applications where metadata-driven retrieval or ranking must be explicitly controlled.

Chunking strategies

By default, Vectara uses sentence-based chunking, which provides optimal retrieval accuracy for most datasets.

For larger documents or performance-tuned ingestion, you can explicitly set a chunking_strategy:

  • sentence_chunking_strategy — creates one chunk per sentence (default).
  • max_chars_chunking_strategy — creates larger chunks up to a specified character limit (max_chars_per_chunk), balancing retrieval speed with contextual coherence.
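The two options above could be attached to a document payload roughly as follows. The names chunking_strategy, sentence_chunking_strategy, max_chars_chunking_strategy, and max_chars_per_chunk come from the text above; nesting them as an object with a "type" key is an assumption about the request layout, and with_chunking is a hypothetical helper.

```python
def with_chunking(document, max_chars_per_chunk=None):
    """Return a copy of a document payload with an explicit chunking strategy.

    Passing no limit selects sentence chunking; passing a positive limit
    selects max-chars chunking. The nested layout is assumed, not confirmed.
    """
    doc = dict(document)  # shallow copy so the caller's dict is left unchanged
    if max_chars_per_chunk is None:
        doc["chunking_strategy"] = {"type": "sentence_chunking_strategy"}
    else:
        if max_chars_per_chunk <= 0:
            raise ValueError("max_chars_per_chunk must be positive")
        doc["chunking_strategy"] = {
            "type": "max_chars_chunking_strategy",
            "max_chars_per_chunk": max_chars_per_chunk,
        }
    return doc


base = {"id": "manual-7", "sections": [{"text": "A long manual."}]}
by_sentence = with_chunking(base)
by_chars = with_chunking(base, max_chars_per_chunk=512)

print(by_sentence["chunking_strategy"])
print(by_chars["chunking_strategy"])
```

The limit of 512 is an arbitrary sample value, not a recommended setting.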

Request
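A request to this endpoint might be assembled as in the sketch below, which uses only the Python standard library and builds the request without sending it. The path and the POST method come from this page; the host https://api.vectara.io, the x-api-key header name, the helper build_add_document_request, and the sample corpus key and document are assumptions to verify against your account and the request schema.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.vectara.io"  # assumed API host


def build_add_document_request(corpus_key, document, api_key):
    """Build (but do not send) a POST request that adds a document to a corpus.

    Hypothetical helper for illustration; not part of any SDK.
    """
    # Percent-encode the corpus key so it is safe to place in the URL path.
    url = f"{BASE_URL}/v2/corpora/{quote(corpus_key, safe='')}/documents"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": api_key,  # assumed header name for the API key
        },
    )


request = build_add_document_request(
    corpus_key="my-corpus",
    document={"id": "doc-1", "sections": [{"text": "Hello, corpus."}]},
    api_key="YOUR_API_KEY",
)

print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the example runnable offline and makes the URL, headers, and body easy to inspect before any network call.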

Responses

Document added to the corpus.