Version: 2.0

Chunking strategies

A chunking strategy controls how Vectara splits a document into parts during indexing. Each part is the unit of retrieval. It is what gets embedded, what the reranker scores, and what an agent ultimately reads. Choosing the right strategy directly affects search precision, reranker signal quality, and the amount of context an agent receives per result.

For API details, see the File Upload API and Index Document API references.

How it works

Vectara applies a chunking strategy at ingest time, before embedding. The strategy set on an upload or indexing request determines the part boundaries for every document in that request.

The chunking_strategy field accepts one of two strategy types:

A sentence chunking strategy creates one part per sentence. This is the default when chunking_strategy is not set.
A max-chars chunking strategy accumulates sentences into a part until the part reaches the specified character limit. When a single sentence exceeds the limit, the platform splits that sentence across parts.
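
The two strategies can be sketched in a few lines of Python. This is an illustration only, not Vectara's implementation: the regex sentence splitter and the function names split_sentences, sentence_chunks, and max_chars_chunks are assumptions made for the example, and the platform's actual sentence detection and split points may differ.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_chunks(text: str) -> list[str]:
    """Sentence strategy: one part per sentence."""
    return split_sentences(text)


def max_chars_chunks(text: str, max_chars: int) -> list[str]:
    """Max-chars strategy: pack whole sentences into a part up to max_chars."""
    parts: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        # A sentence longer than the limit is split across several parts.
        if len(sentence) > max_chars:
            if current:
                parts.append(current)
                current = ""
            for i in range(0, len(sentence), max_chars):
                parts.append(sentence[i:i + max_chars])
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            parts.append(current)
            current = sentence
    if current:
        parts.append(current)
    return parts
```

With this sketch, max_chars_chunks("One. Two is here. Three!", 17) packs the first two sentences into one part and leaves the third on its own, while sentence_chunks returns three parts for the same text.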

When the structured indexing API is used, a document can be divided into sections, such as chapters, headings, or logical topics. A StructuredDocumentSection is a named container that groups related text under a title. Chunking never crosses a section boundary: each section is split into parts independently, so a single part never contains text from two different sections. This preserves the semantic integrity of each section regardless of which chunking strategy you choose.
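
The section rule can be pictured with a short sketch. It assumes a simplified section shape (title and text keys only) and a hypothetical chunk callable that stands in for either strategy; pack_pairs is a toy chunker written for this example, not a platform feature.

```python
from typing import Callable


def chunk_sections(
    sections: list[dict[str, str]],
    chunk: Callable[[str], list[str]],
) -> list[dict[str, str]]:
    """Chunk each section on its own so no part spans two sections."""
    parts = []
    for section in sections:
        for text in chunk(section["text"]):
            parts.append({"section": section["title"], "text": text})
    return parts


def pack_pairs(text: str) -> list[str]:
    """Toy chunker for the demo: groups sentences two at a time."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    return [" ".join(sentences[i:i + 2]) for i in range(0, len(sentences), 2)]


sections = [
    {"title": "A", "text": "a1. a2. a3."},
    {"title": "B", "text": "b1."},
]
parts = chunk_sections(sections, pack_pairs)
```

The leftover sentence "a3." stays in its own part inside section A instead of being packed together with "b1." from section B.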

If you need complete control over part boundaries, for example, when ingesting pre-chunked content from an external pipeline, use the CoreDocument type instead of a chunking strategy. CoreDocument submits parts directly and bypasses automatic splitting entirely.

Sentence chunking and fixed-size character chunking are two of the most widely used approaches in RAG systems. Other strategies common in the broader ecosystem include:

Recursive chunking - Splitting by paragraph, then sentence, then word, until a size target is met
Semantic chunking - Splitting where sentence-level similarity drops
Sliding-window chunking - Fixed-size chunks with token overlap between adjacent chunks
Hierarchical chunking - Indexing the same content at two granularities for context and precision

Vectara's built-in chunking_strategy field supports sentence and max-chars chunking only. To apply a different approach, chunk the content upstream and use CoreDocument to submit the resulting parts directly. Vectara stores and embeds the parts with the boundaries you supply.
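
As one example of the upstream route, the sketch below applies sliding-window chunking and wraps the result in the core document shape shown later on this page. It is a minimal illustration under stated assumptions: whitespace-separated words stand in for tokens, and the helper names sliding_window_chunks and to_core_document are hypothetical.

```python
def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()  # assumption: words approximate tokens
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def to_core_document(doc_id: str, chunks: list[str]) -> dict:
    """Wrap pre-chunked text in the core document shape used on this page."""
    return {
        "type": "core",
        "id": doc_id,
        "document_parts": [{"text": chunk} for chunk in chunks],
    }


document = to_core_document("doc-002", sliding_window_chunks("a b c d e f g", 4, 2))
```

Each entry in document_parts becomes one part, so the overlap chosen upstream is preserved exactly as submitted.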

The precision-context tradeoff

Chunk size sits at the intersection of two competing pressures that shape retrieval quality.

Embedding precision improves as chunks get smaller. An embedding model produces a vector that represents the meaning of its input. When a chunk covers one coherent idea, that vector is tight and accurate. When a chunk mixes several distinct ideas, the resulting vector sits between them and represents none of them well, diluting the signal the retrieval system uses to rank results.
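
A toy calculation makes the dilution effect concrete. It assumes a deliberately simplified model in which a chunk's vector is the average of its topic vectors; real embedding models are not simple averages, so treat the numbers as intuition rather than a measurement.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise average, standing in for a mixed-topic chunk."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


topic_a = [1.0, 0.0]  # hypothetical direction for topic A
topic_b = [0.0, 1.0]  # hypothetical direction for an unrelated topic B
query = [1.0, 0.0]    # a query about topic A

focused = topic_a                       # chunk covering only topic A
mixed = mean_vector([topic_a, topic_b])  # chunk covering both topics

focused_score = cosine(query, focused)  # 1.0
mixed_score = cosine(query, mixed)      # about 0.707
```

In this model the mixed chunk scores lower against a topic A query than the focused chunk, even though it contains the same topic A content.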

Context richness improves as chunks get larger. A retrieved part that is too short may not carry enough surrounding text for the reranker to judge its relevance accurately, or for an LLM to generate a useful answer from it. Very short parts, a few words or a truncated sentence, lose the surrounding signal the reranker needs.

These two pressures pull in opposite directions. The right balance depends on the document type and the query patterns:

Factoid and point-lookup queries (e.g., "What is the deadline for X?") favor smaller chunks. The query targets a specific fact; a tight, precise embedding surfaces it with less noise.
Analytical and synthesis queries (e.g., "Summarize the risks in section 3") favor larger chunks. The query spans a broader idea; a richer embedding captures the full scope.

Context can also be added at query time without re-indexing, using context_configuration to expand how many sentences appear around each retrieved part. Chunking strategy and context_configuration are complementary levers: chunking sets the semantic boundaries at ingest, and context_configuration widens the window shown to the reader or LLM at query time.

Strategy tradeoffs

Sentence chunking

Sentence chunking produces one part per sentence, giving each part a focused, precise embedding. This precision makes sentence chunking well-suited for content with well-formed sentences where each sentence carries a distinct idea.

The tradeoff is twofold. First, sentence chunking requires semantically rich sentences. Documents that lack clear sentence structure (logs, tables, short records) produce weak or fragmented embeddings. Second, sentence chunking generates more parts per document than max-chars chunking, which increases indexing cost and the size of the search index and can affect retrieval latency at scale.

The benefit of fine-grained parts is that context_configuration can expand each pinpoint sentence match into its surrounding context at query time, giving the reranker or LLM a narrow hit with a wide context window attached, without bloating the part itself. This is especially valuable when surfacing snippets in a UI; for LLM consumption alone the distinction matters less.

Max-chars chunking

Max-chars chunking accumulates sentences up to a character limit, producing richer per-part context. The embedding model has more material to work with, which helps when individual sentences are too short to carry meaning on their own.

The tradeoff is dilution risk. If a chunk accumulates sentences across a topic boundary because the character limit spans a natural transition in the text, the resulting embedding cannot represent any single idea precisely. For structured documents, this risk is mitigated because chunking never crosses section boundaries. For unstructured text, keep max_chars_per_chunk small enough that a single chunk is unlikely to span multiple distinct topics. A commonly used range of 512–1024 characters (roughly 3–7 sentences of English prose) balances context richness against dilution for most prose.

Max-chars chunking also produces fewer parts per document, which reduces indexing cost and speeds up retrieval.
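
The difference in part counts can be estimated with simple arithmetic. The sketch below is a rough estimate under stated assumptions: the function name estimate_part_counts is hypothetical, and the max-chars figure is a lower bound because it ignores space left unused when a sentence does not fit in the current part.

```python
import math


def estimate_part_counts(sentence_lengths: list[int], max_chars: int) -> tuple[int, int]:
    """Return (sentence-strategy parts, lower bound on max-chars parts)."""
    sentence_parts = len(sentence_lengths)
    total_chars = sum(sentence_lengths)
    # Every part holds at most max_chars characters, so at least this many
    # parts are needed to store all of the text.
    max_chars_parts = math.ceil(total_chars / max_chars)
    return sentence_parts, max_chars_parts


# Example: 400 sentences of 120 characters each, with a 512-character limit.
counts = estimate_part_counts([120] * 400, 512)  # (400, 94)
```

In this example, sentence chunking yields 400 parts, while max-chars chunking at 512 characters needs at least 94, roughly a fourfold reduction.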

Core document

CoreDocument bypasses automatic chunking entirely. You supply the part boundaries directly by providing document_parts in the request. This is useful when the semantic structure of the content is known and you need precise control over what gets embedded as a unit, for example, when the content was already chunked by an upstream pipeline.

When to use each strategy

Content type	Recommended strategy	Why
General prose (articles, reports, documentation)	`sentence_chunking_strategy`	Sentences carry one idea each; precise embeddings improve factoid retrieval
Long documents where sentences are short or fragmentary	`max_chars_chunking_strategy` at 512–1024 chars	Accumulating sentences gives the embedding model enough context per part
Dense, analytical content where queries span paragraphs	`max_chars_chunking_strategy` at 768–1024 chars	Larger parts align better with broad analytical queries
Pre-chunked content from an external pipeline	`CoreDocument`	Preserves your existing segmentation; no re-splitting
Content with known semantic boundaries (chapters, topics)	Structured indexing with `StructuredDocumentSection`	Section boundaries are respected by both chunking strategies; no content bleeds between sections

Using `context_configuration`

If the goal is to return more surrounding text with each retrieved part, so the reranker or LLM has more context to work with, reach for context_configuration at query time rather than increasing chunk size at ingest.

Increasing chunk size to get more context embeds the extra text into the part's vector, which risks diluting the embedding signal. context_configuration adds surrounding sentences to the result without affecting the embedding. The retrieved part stays precise; the reader or LLM gets the wider window.
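
Conceptually, the expansion works like the sketch below. It is a local simulation, not the query API: the function expand_context and its parameters sentences_before and sentences_after are names chosen for this illustration, so check the query reference for the actual context_configuration fields.

```python
def expand_context(
    sentences: list[str],
    hit_index: int,
    sentences_before: int,
    sentences_after: int,
) -> dict[str, str]:
    """Return the matched sentence plus its neighbours, clamped to the document."""
    start = max(hit_index - sentences_before, 0)
    end = min(hit_index + sentences_after + 1, len(sentences))
    return {
        # What the stored vector represents: the matched part only.
        "embedded_text": sentences[hit_index],
        # What the reader or LLM sees: the part plus surrounding sentences.
        "returned_text": " ".join(sentences[start:end]),
    }


result = expand_context(["A.", "B.", "C.", "D."], hit_index=1,
                        sentences_before=2, sentences_after=1)
```

Here the match is still the single sentence "B.", but the returned text is "A. B. C.", and changing the window only requires a different query, not a re-index.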

Use context_configuration when:

Your parts are already well-sized but the LLM needs a few sentences of surrounding text to interpret each result.
You want to tune context width per query or per use case without re-indexing the corpus.
You are experimenting with context window sizes and want to iterate without re-uploading documents.

See Tune retrieval for agents for details on configuring context_configuration.

Key terms

Term	Definition
`chunking_strategy`	Optional field on `UploadFileRequest` and `StructuredDocument` that selects how the platform splits the document into parts
`sentence_chunking_strategy`	Default strategy; creates one part per sentence
`max_chars_chunking_strategy`	Strategy that accumulates sentences into a part until `max_chars_per_chunk` is reached
`max_chars_per_chunk`	Integer (minimum 100) specifying the character limit per part when using `max_chars_chunking_strategy`
Part	The unit of storage, embedding, and retrieval; one chunk after indexing
`CoreDocument`	Document type that accepts pre-defined parts directly, bypassing automatic chunking
`StructuredDocumentSection`	A named container that groups related text in a structured document; chunking never crosses a section boundary
`context_configuration`	Query-time setting that expands how many surrounding sentences are returned with each retrieved part, without affecting the embedding

Defaults and limits

Setting	Default	Limit	Notes
`chunking_strategy`	`sentence_chunking_strategy`	—	If omitted, the platform uses one part per sentence
`max_chars_per_chunk`	—	Minimum: 100	Required when `type` is `max_chars_chunking_strategy`; no documented maximum
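
The defaults and limits above can be mirrored in a small client-side check before sending a request. This is a minimal sketch: validate_chunking_strategy is a hypothetical helper, the sentence strategy's type value is inferred from the names used on this page, and the API remains the authority on what it accepts.

```python
from typing import Optional

ALLOWED_TYPES = {"sentence_chunking_strategy", "max_chars_chunking_strategy"}
MIN_MAX_CHARS = 100  # documented minimum for max_chars_per_chunk


def validate_chunking_strategy(strategy: Optional[dict]) -> dict:
    """Return a checked copy of the strategy, defaulting to sentence chunking."""
    if strategy is None:
        # Omitting chunking_strategy means one part per sentence.
        return {"type": "sentence_chunking_strategy"}
    kind = strategy.get("type")
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported chunking strategy type: {kind!r}")
    if kind == "max_chars_chunking_strategy":
        limit = strategy.get("max_chars_per_chunk")
        if not isinstance(limit, int) or isinstance(limit, bool):
            raise ValueError("max_chars_per_chunk is required and must be an integer")
        if limit < MIN_MAX_CHARS:
            raise ValueError(f"max_chars_per_chunk must be at least {MIN_MAX_CHARS}")
    return dict(strategy)
```

Running the check locally surfaces a missing or too-small max_chars_per_chunk before any document is uploaded.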

Configuring chunking

File Upload API

Set chunking_strategy as part of the multipart form upload request:

curl -X POST https://api.vectara.io/v2/corpora/{corpus_key}/upload_file \
  -H "x-api-key: $VECTARA_API_KEY" \
  -F "file=@document.pdf" \
  -F 'chunking_strategy={"type": "max_chars_chunking_strategy", "max_chars_per_chunk": 512};type=application/json'

View File Upload API reference

Index Document API (structured document)

Set chunking_strategy on the StructuredDocument object. Each section in the document is chunked independently:

curl -X POST https://api.vectara.io/v2/corpora/{corpus_key}/documents \
  -H "x-api-key: $VECTARA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "structured",
    "id": "doc-001",
    "sections": [
      {
        "id": 1,
        "title": "Introduction",
        "text": "Your document text goes here."
      }
    ],
    "chunking_strategy": {
      "type": "max_chars_chunking_strategy",
      "max_chars_per_chunk": 512
    }
  }'

View Index Document API reference

Index Document API (core document)

Use CoreDocument when you want to supply part boundaries directly:

curl -X POST https://api.vectara.io/v2/corpora/{corpus_key}/documents \
  -H "x-api-key: $VECTARA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "core",
    "id": "doc-002",
    "document_parts": [
      {
        "text": "First pre-chunked part."
      },
      {
        "text": "Second pre-chunked part."
      }
    ]
  }'

View Index Document API reference
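
The same indexing request can be assembled from Python. The sketch below uses only the standard library and mirrors the core document curl example above; build_index_request is a hypothetical helper, and "my-corpus" and "YOUR_API_KEY" are placeholders. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.vectara.io/v2"


def build_index_request(corpus_key: str, api_key: str, document: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the Index Document endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/corpora/{corpus_key}/documents",
        data=json.dumps(document).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


core_document = {
    "type": "core",
    "id": "doc-002",
    "document_parts": [
        {"text": "First pre-chunked part."},
        {"text": "Second pre-chunked part."},
    ],
}
request = build_index_request("my-corpus", "YOUR_API_KEY", core_document)
# To send it: urllib.request.urlopen(request)
```

Passing a structured document dictionary with a chunking_strategy key instead of core_document produces the equivalent of the structured example.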
