Version: 2.0

Knowledge

The agent's knowledge is everything you want it to know up front: manuals, KB articles, support tickets, datasheets, policies, code repos, forum posts, ticketing history, the long tail of company documents. The classic RAG corpus. Pre-loaded by ingestion pipelines, queried at every turn, cited at generation time.

Vectara is RAG at the foundation. Boomerang is the multilingual embedding model. SmartChunk is Vectara's semantic chunker that splits documents on sentence and section boundaries rather than fixed windows, so retrieved chunks preserve meaning. Slingshot reranks candidates. HHEM grades every answer for factual consistency in under 50ms. The whole pipeline is tunable, measurable, and replaceable, and it is the same retrieval engine your agent uses for memory and scratchpad.

What "knowledge" means

The knowledge corpus is shared, static-ish, and read by every agent session.

Use a knowledge corpus for shared source material that should be available across sessions and users. Use memory for user- or tenant-specific facts learned over time. Use scratchpad storage for derived facts, cached tool results, or intermediate outputs that future turns may reuse. Three properties separate the knowledge corpus from memory and scratchpad corpora:

| Property | Knowledge corpus | Memory corpus | Scratchpad corpus |
|---|---|---|---|
| Who writes it | Ingestion pipelines | The agent (on user signals) | The agent (during tool execution) |
| Who reads it | Every session | Only the user it was indexed for | The agent in later turns |
| Update cadence | Scheduled (cron / interval / webhook) | On agent decision | Per-call |

Pick one role per corpus. Most production deployments end up with a few of each.

The retrieval pipeline

Vectara is not a one-shot vector lookup. Documents flow through a staged pipeline, from chunking and embedding at ingestion to hybrid retrieval, reranking, and citations at query time. Every stage is tunable and observable.

Documents → SmartChunk → Boomerang → Hybrid (BM25 + dense + filters)
→ Slingshot reranker → Citations → Generation-ready context
| Stage | What it does | Why it matters |
|---|---|---|
| SmartChunk | Semantic chunking that respects natural boundaries: sentences, sections, code blocks. | Fixed-window chunking can split related context across chunks, which can reduce retrieval precision and answer quality. SmartChunk keeps related context together. |
| Boomerang | Vectara's multilingual embedding model. Paired with Slingshot, it leads the XQuAD-R cross-lingual benchmark at 76.2%, ahead of jina-m0, mxbai, and Qwen3, with a substantial lead on low-resource languages. | Out-of-the-box quality in 100+ languages. No fine-tuning required. |
| Hybrid search | BM25 + dense retrieval, blended at query time, with metadata filters applied before scoring. | Lexical catches exact terms (model numbers, error codes). Dense catches paraphrase. Filters narrow the search space before scoring. |
| Slingshot reranker | Chained rerankers (knee, MMR, multilingual cross-encoder, UDF) composed per workload. | One reranker rarely fits all workloads. Chains let you balance relevance, diversity, and recency. |
| Citations | Every retrieved chunk travels with its source document, offset, and score. | The generated answer points back at the documents the LLM grounded on, so users can inspect the sources behind it. |
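To make the SmartChunk row concrete, here is a toy comparison of fixed-window chunking against a boundary-aware chunker. This is not SmartChunk's algorithm: it is a minimal sketch that assumes sentences end with `.`, `!`, or `?` followed by whitespace, and packs whole sentences into chunks up to a character budget. `fixed_window_chunks` and `sentence_chunks` are hypothetical helpers.

```python
import re


def fixed_window_chunks(text: str, size: int) -> list[str]:
    """Split text every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own chunk.
    """
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Reset the router before updating firmware. "
    "Error E42 means the update was interrupted. "
    "Contact support if E42 persists."
)
print(fixed_window_chunks(doc, 50))  # first window ends mid-sentence
print(sentence_chunks(doc, 50))      # every chunk is a complete sentence
```

The fixed window cuts "Error E42" away from its explanation; the boundary-aware version keeps each statement intact, which is the property the table describes.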

For deep configuration, see Hybrid search, Reranking, Citations, and Chunking strategies.
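Hybrid blending is often described as a linear interpolation between a lexical score and a dense score. The sketch below assumes that form, with `lam` standing in for the `lexical_interpolation` values used in the agent tool configuration on this page (0 means dense only, 1 means lexical only). The scores are made up, and Vectara's actual scoring and score normalization may differ from this simplification.

```python
def hybrid_score(dense: float, lexical: float, lam: float) -> float:
    """Linearly blend a dense and a lexical score (both assumed normalized to [0, 1]).

    lam = 0 ranks on dense similarity only; lam = 1 ranks on lexical (BM25-style) only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return (1.0 - lam) * dense + lam * lexical


def rank(candidates: dict[str, tuple[float, float]], lam: float) -> list[str]:
    """Order chunk ids by blended score, best first. Values are (dense, lexical)."""
    return sorted(
        candidates,
        key=lambda cid: hybrid_score(*candidates[cid], lam),
        reverse=True,
    )


# Hypothetical scores for a query like "what does error E42 mean?"
candidates = {
    "paraphrase-chunk": (0.82, 0.10),  # semantically close, never says "E42"
    "exact-code-chunk": (0.78, 0.95),  # contains the literal error code
}

for lam in (0.0, 0.05, 0.1):
    print(lam, rank(candidates, lam))
```

In this toy data, even a 5% lexical weight is enough to lift the chunk containing the exact error code above a close paraphrase, which is why small interpolation values can still matter for model numbers and error codes.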

Loading knowledge

Three paths get documents into a corpus.

Pipelines

A pipeline pulls records from a source on a schedule and fans each record out to a fresh agent session for verification and indexing. The agent runtime is the unit of execution: indexing uses the same primitives as your live agents.

Built-in sources include S3, SharePoint, web crawl, Salesforce, Slack, Notion, Google Drive, GitHub, and custom HTTP. Triggers can be cron, interval, manual, or webhook. Incremental refresh with watermarks and a dead-letter queue (DLQ) for failed records are built into the runtime.
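Watermarked incremental refresh can be pictured as: remember the newest modification time already processed, and on the next run only pick up records changed after it. A minimal sketch, assuming each source record carries a `modified_at` timestamp; the `Record` shape and the `incremental_batch` helper are hypothetical and not the pipeline runtime's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: str
    modified_at: int  # Unix seconds; assumed to be provided by the source


def incremental_batch(records: list[Record], watermark: int) -> tuple[list[Record], int]:
    """Return records changed after `watermark`, oldest first, plus the new watermark.

    The watermark only advances to the newest timestamp actually seen, so a run
    that finds nothing new leaves it unchanged.
    """
    fresh = sorted(
        (r for r in records if r.modified_at > watermark),
        key=lambda r: r.modified_at,
    )
    new_watermark = max((r.modified_at for r in fresh), default=watermark)
    return fresh, new_watermark


source = [Record("a", 100), Record("b", 250), Record("c", 300)]
batch, wm = incremental_batch(source, watermark=200)    # picks up b and c
batch2, wm2 = incremental_batch(source, watermark=wm)   # nothing new
print([r.id for r in batch], wm, [r.id for r in batch2], wm2)
```

Because the comparison is strict, a record that shows up later with a timestamp equal to the stored watermark would be skipped; real connectors usually guard against this, for example with an overlap window plus deduplication by id.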

POST /v2/pipelines
{
  "name": "company-kb-nightly",
  "source": { "type": "google_drive", "folder_id": "..." },
  "trigger": { "type": "cron", "expression": "0 2 * * *" },
  "judge_agent_key": "kb-judge",
  "target_corpus_key": "company-kb",
  "refresh_mode": "incremental"
}

A judge agent can verify each record before commit (Does it have a title? Is it in scope? Should it be redacted?). Failed records land in the DLQ for inspection, replay, or manual fix.
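The judge-then-commit flow with a dead-letter queue can be sketched in plain Python. Here `judge` is a hypothetical rule-based function standing in for the `kb-judge` agent: its checks (title present, document type in scope, no email address that would need redaction) mirror the questions above, and the scope rule and email regex are illustrative assumptions. Appending to `committed` is a placeholder for indexing into the target corpus.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IN_SCOPE_TYPES = {"manual", "kb_article", "policy"}  # hypothetical scope rule


@dataclass
class Outcome:
    committed: list[dict] = field(default_factory=list)
    dlq: list[tuple[dict, str]] = field(default_factory=list)  # (record, reason)


def judge(record: dict) -> Optional[str]:
    """Return a rejection reason, or None if the record may be committed."""
    if not record.get("title"):
        return "missing title"
    if record.get("doc_type") not in IN_SCOPE_TYPES:
        return "out of scope"
    if EMAIL.search(record.get("text", "")):
        return "needs redaction"
    return None


def run_batch(records: list[dict]) -> Outcome:
    out = Outcome()
    for record in records:
        reason = judge(record)
        if reason is None:
            out.committed.append(record)  # placeholder for indexing into the corpus
        else:
            out.dlq.append((record, reason))  # kept for inspection, replay, or manual fix
    return out


records = [
    {"title": "Router setup", "doc_type": "manual", "text": "Plug in the router."},
    {"title": "", "doc_type": "manual", "text": "Untitled page."},
    {"title": "Escalation", "doc_type": "kb_article", "text": "Email jane@example.com."},
    {"title": "Lunch menu", "doc_type": "cafeteria", "text": "Tacos."},
]
result = run_batch(records)
print([r["title"] for r in result.committed], [why for _, why in result.dlq])
```

Keeping the reason next to each dead-lettered record is what makes replay practical: fix the cause, then feed the same records back through `run_batch`.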

See Pipelines quickstart and Pipeline concepts.

Direct upload

For one-off ingestion or testing, upload a file straight to a corpus:

curl -X POST "https://api.vectara.io/v2/corpora/{corpus_key}/upload_file" \
-H "x-api-key: $VECTARA_API_KEY" \
-F "file=@./onboarding-guide.pdf"

Supported formats include PDF, HTML, DOCX, Excel, Markdown, and plain text; each file is extracted, chunked, and embedded. See Data ingestion.
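For scripts, the same upload can be built with Python's standard library. This is a minimal sketch that mirrors the curl call above: it targets the same `upload_file` endpoint, sends the `x-api-key` header, and encodes the file as a multipart `file` field. The `build_upload_request` helper is hypothetical, and it only constructs the request. Nothing is sent until you pass the result to `urllib.request.urlopen`.

```python
import mimetypes
import urllib.parse
import urllib.request
import uuid

API_BASE = "https://api.vectara.io/v2"


def build_upload_request(
    corpus_key: str, filename: str, data: bytes, api_key: str
) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to `curl -F "file=@..."`."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{API_BASE}/corpora/{urllib.parse.quote(corpus_key, safe='')}/upload_file"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Usage: read `./onboarding-guide.pdf` as bytes, call `build_upload_request("company-kb", "onboarding-guide.pdf", data, api_key)` with your key from `VECTARA_API_KEY`, then open the returned request with `urllib.request.urlopen` to send it.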

API indexing

For programmatic ingestion (Kafka consumer, custom crawler, in-process hook), call core_document_index directly from your code. The agent can also call it as a tool, which is how scratchpad caching works. See Resource addressing.

Querying knowledge from an agent

The simplest agent tool that queries the corpus:

"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "search": {
        "corpora": [
          { "corpus_key": "company-kb" }
        ]
      }
    }
  }
}

A more typical production wiring layers filters, reranking, and multi-corpus search:

"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "search": {
        "corpora": [
          { "corpus_key": "company-kb", "lexical_interpolation": 0.05 },
          { "corpus_key": "support-tickets", "lexical_interpolation": 0.1 }
        ],
        "metadata_filter": "doc.product = '$session.metadata.product$'",
        "reranker": {
          "type": "chain",
          "rerankers": [
            { "type": "customer_reranker",
              "reranker_name": "Rerank_Multilingual_v1",
              "limit": 50 },
            { "type": "mmr", "diversity_bias": 0.3, "limit": 10 }
          ]
        },
        "limit": 10
      }
    }
  }
}

This single tool gives the agent:

  • Hybrid search blended at 5% lexical for docs, 10% for tickets.
  • Results scoped to the session's current product.
  • A multilingual cross-encoder reranking pass that keeps the top 50 chunks.
  • MMR re-rank to 10 diverse final chunks.
  • Citations on every result.
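MMR (maximal marginal relevance) picks results one at a time, trading relevance against similarity to what has already been picked. The sketch below is textbook greedy MMR over precomputed relevance scores and a pairwise similarity function. Treating `diversity_bias` as the weight on the redundancy penalty is an assumption about how the parameter behaves, not a description of Vectara's internals, and `topic_similarity` is a toy stand-in for embedding similarity.

```python
from typing import Callable


def mmr(
    candidates: list[str],
    relevance: dict[str, float],
    similarity: Callable[[str, str], float],
    diversity_bias: float,
    limit: int,
) -> list[str]:
    """Greedy maximal marginal relevance.

    Each step picks the candidate maximizing
        (1 - diversity_bias) * relevance - diversity_bias * (max similarity to picked results),
    so diversity_bias = 0 reduces to plain relevance ordering.
    """
    selected: list[str] = []
    remaining = list(candidates)

    def marginal(doc: str) -> float:
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        return (1 - diversity_bias) * relevance[doc] - diversity_bias * redundancy

    while remaining and len(selected) < limit:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"reset-guide-a": 0.90, "reset-guide-b": 0.88, "warranty-faq": 0.60}


def topic_similarity(a: str, b: str) -> float:
    """Toy similarity: chunks sharing a topic prefix are near-duplicates."""
    return 0.95 if a.split("-")[0] == b.split("-")[0] else 0.10


print(mmr(list(relevance), relevance, topic_similarity, diversity_bias=0.0, limit=2))
print(mmr(list(relevance), relevance, topic_similarity, diversity_bias=0.3, limit=2))
```

With no diversity bias the two near-duplicate reset guides win; at 0.3 the second slot goes to the warranty FAQ instead. In a chain like the one above, a step of this kind would run on the 50 chunks kept by the cross-encoder and return the final 10.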

Drop the filter to do company-wide search. Swap the corpus list to add sources. Same shape every time.
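Because the tool keeps the same shape as you add corpora or drop the filter, it is easy to generate from code. Below is a sketch of a hypothetical `kb_search_tool` helper that emits the production configuration above as a Python dict. The field names are copied from that example; the helper and its defaults are illustrative.

```python
import json
from typing import Optional


def kb_search_tool(
    corpora: dict[str, float],
    metadata_filter: Optional[str] = None,
    rerank_limit: int = 50,
    diversity_bias: float = 0.3,
    limit: int = 10,
) -> dict:
    """Build a corpora_search tool configuration.

    `corpora` maps corpus_key -> lexical_interpolation. Omitting
    `metadata_filter` leaves each whole corpus in scope for retrieval.
    """
    search: dict = {
        "corpora": [
            {"corpus_key": key, "lexical_interpolation": lam}
            for key, lam in corpora.items()
        ],
        "reranker": {
            "type": "chain",
            "rerankers": [
                {"type": "customer_reranker",
                 "reranker_name": "Rerank_Multilingual_v1",
                 "limit": rerank_limit},
                {"type": "mmr", "diversity_bias": diversity_bias, "limit": limit},
            ],
        },
        "limit": limit,
    }
    if metadata_filter is not None:
        search["metadata_filter"] = metadata_filter
    return {
        "tool_configurations": {
            "kb_search": {
                "type": "corpora_search",
                "query_configuration": {"search": search},
            }
        }
    }


config = kb_search_tool(
    {"company-kb": 0.05, "support-tickets": 0.1},
    metadata_filter="doc.product = '$session.metadata.product$'",
)
print(json.dumps(config, indent=2))
```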

For tuning, see Tune retrieval and Tune retrieval for agents.

Multilingual by default

Boomerang is a single multilingual encoder, not a fork-per-language deployment. The same corpus serves English, Japanese, German, and low-resource languages with the same configuration. Cross-lingual queries work out of the box: a Japanese query can match an English document if the meaning aligns.

On XQuAD-R, the cross-lingual retrieval benchmark, Boomerang + Slingshot scores 76.2% on the cross-lingual average, ahead of jina-m0, mxbai, and Qwen3, with the lead widening on low-resource language pairs (for example, +21.6 points over the next platform on the English↔Telugu pair).

For applications that span markets, this is one configuration choice instead of a deployment fork per locale.

Filtering at scale

Metadata filters narrow the candidate set before dense scoring. That is how Vectara scales to corpora of billions of tokens.

"metadata_filter": "doc.region = 'EU' AND doc.indexed_at > 1700000000 AND doc.classification = 'public'"

Filter attributes are declared on the corpus. Choose them deliberately: too few and you over-fetch, too many and you fragment storage.

| Common filter | Why |
|---|---|
| doc_type | Separate manuals, KB articles, and forum posts |
| product, version | Multi-product, multi-version retrieval |
| region, locale | Region-specific compliance and language |
| classification | Public vs. internal vs. confidential |
| indexed_at | Recency boost or freshness filter |
| customer_id / tenant_id | Hard tenant boundary |
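Why filter before scoring instead of after? Here is a toy in-memory comparison that assumes a handful of chunks with a `region` attribute and precomputed scores. Post-filtering the global top-k can return fewer matches than requested, or none at all. Pre-filtering returns the best k among the matching chunks. This only illustrates the idea. It is not how Vectara's index is implemented.

```python
from typing import Callable

Chunk = dict  # {"id": str, "region": str, "score": float}

chunks: list[Chunk] = [
    {"id": "us-1", "region": "US", "score": 0.95},
    {"id": "us-2", "region": "US", "score": 0.93},
    {"id": "eu-1", "region": "EU", "score": 0.90},
    {"id": "us-3", "region": "US", "score": 0.89},
    {"id": "eu-2", "region": "EU", "score": 0.70},
]


def top_k(items: list[Chunk], k: int) -> list[Chunk]:
    return sorted(items, key=lambda c: c["score"], reverse=True)[:k]


def post_filter(items: list[Chunk], pred: Callable[[Chunk], bool], k: int) -> list[Chunk]:
    """Score everything, keep the top k, then drop non-matching results."""
    return [c for c in top_k(items, k) if pred(c)]


def pre_filter(items: list[Chunk], pred: Callable[[Chunk], bool], k: int) -> list[Chunk]:
    """Drop non-matching chunks first, then keep the top k of what is left."""
    return top_k([c for c in items if pred(c)], k)


def in_eu(chunk: Chunk) -> bool:
    return chunk["region"] == "EU"


print([c["id"] for c in post_filter(chunks, in_eu, k=2)])  # top 2 are both US
print([c["id"] for c in pre_filter(chunks, in_eu, k=2)])
```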

See Metadata filters.

Access control at retrieval

Access control works at two levels: corpus access and result-level filtering. The distinction matters because indexed data is retrieved from Vectara, not from the original source system.

Corpus-level RBAC is enforced by Vectara. A query is rejected unless the requesting identity has permission to query every corpus targeted by the request. This controls which corpora an identity can access.

Finer-grained access inside a corpus is enforced through filters you provide. Vectara does not natively inherit per-document ACLs from the source system. After data is indexed, any identity with query permission on the corpus can retrieve matching documents unless the request narrows the result set. To preserve user-specific access boundaries, attach access metadata to each document at index time, then pass a metadata_filter on every query. Build this filter server-side from the user’s verified attributes, not from client-supplied values. If you omit the filter, the full corpus is in scope for retrieval.

Vectara applies the supplied filter before dense scoring, reranking, and generation. Passages that do not match the filter are excluded before they can reach the reranker or the LLM. The platform enforces the filter you provide; the calling application is responsible for constructing the correct filter and keeping indexed access metadata synchronized with the source system.
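Here is a minimal sketch of the server-side pattern described above. It assumes documents were indexed with hypothetical `doc.tenant_id` and `doc.classification` metadata attributes, and that `claims` come from your own verified session (for example, a validated token), never from the request body. It also assumes the filter syntax accepts `OR` and parentheses alongside the `AND` shown earlier; check the Metadata filters reference. The clearance model is illustrative. Instead of escaping quotes, the helper rejects any value outside a conservative character set.

```python
import re

# Hypothetical clearance model: each level may read its own classification and those below it.
READABLE = {
    "public": ["public"],
    "internal": ["public", "internal"],
    "confidential": ["public", "internal", "confidential"],
}
SAFE_VALUE = re.compile(r"[A-Za-z0-9_.-]+")


def literal(value: str) -> str:
    """Quote a filter value, refusing anything outside a conservative character set."""
    if not SAFE_VALUE.fullmatch(value):
        raise ValueError(f"unsafe filter value: {value!r}")
    return f"'{value}'"


def access_filter(claims: dict[str, str]) -> str:
    """Build a metadata_filter from server-verified claims (never from client input).

    A missing tenant or unknown clearance raises KeyError, so the request fails
    closed instead of running with the full corpus in scope.
    """
    tenant = literal(claims["tenant_id"])
    levels = READABLE[claims["clearance"]]
    allowed = " OR ".join(f"doc.classification = {literal(level)}" for level in levels)
    return f"doc.tenant_id = {tenant} AND ({allowed})"


print(access_filter({"tenant_id": "acme", "clearance": "internal"}))
```

Failing closed is the important design choice. Per the paragraph above, a query that omits the filter puts the whole corpus in scope, so an error is safer than a silent fallback to no filter.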

See role-based access control and attribute-based access control.

Why use Vectara instead of building RAG yourself

Vectara gives you production retrieval, grounding, evaluation, and governance primitives without requiring your team to assemble and maintain each layer independently.

| What Vectara provides | Why it matters |
|---|---|
| Semantic chunking | Preserves meaningful context boundaries better than fixed-window chunking, which can split related ideas across chunks. |
| Multilingual retrieval | Supports retrieval across languages without requiring you to maintain separate embedding infrastructure. |
| Hybrid lexical and dense search | Combines keyword precision with semantic recall, covering cases either method can miss on its own. |
| Reranking controls | Lets you tune retrieval quality with reranking strategies such as knee, MMR, and UDF-based controls. |
| Citations | Gives users and reviewers source-level evidence for generated answers. |
| Corpus-level RBAC and pre-retrieval metadata filters | Enforces corpus access at the platform layer and applies supplied ABAC filters before scoring, reranking, and generation. |
| Production ingestion pipelines | Supports operational ingestion patterns such as watermarks, judges, and dead-letter queues. |
| HHEM answer grading | Provides a signal for evaluating and tuning answer quality. |

This lets your application focus on the user experience, workflow design, identity handoff, and product surface while Vectara handles the retrieval, grounding, orchestration, evaluation, and governance layers underneath.