Version: 2.0

Knowledge

The agent's knowledge is everything you want it to know up front: manuals, KB articles, support tickets, datasheets, policies, code repos, forum posts, ticketing history, the long tail of company documents. The classic RAG corpus. Pre-loaded by ingestion pipelines, queried at every turn, cited at generation time.

Vectara is RAG at the foundation. Boomerang is the multilingual embedding model. SmartChunk is Vectara's semantic chunker — splits documents on sentence and section boundaries rather than fixed windows, so retrieved chunks preserve meaning. Slingshot reranks candidates. HHEM grades every answer for factual consistency in under 50ms. The whole pipeline is tunable, measurable, and replaceable — and it is the same retrieval engine your agent uses for memory and scratchpad.
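To make the chunking point concrete, here is a minimal sketch that contrasts fixed-window splitting with sentence-boundary grouping. It is not SmartChunk's algorithm, which this page does not describe; the function names, the regex-based sentence splitter, and the character budgets are all assumptions chosen for illustration.

```python
import re

def fixed_window_chunks(text: str, size: int) -> list[str]:
    """Naive baseline: cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own oversized
    chunk rather than being cut in the middle.
    """
    # Hypothetical splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = "Reset the router. Hold the button for ten seconds. The LED blinks twice."
window_chunks = fixed_window_chunks(doc, 30)    # first cut lands inside the word "button"
boundary_chunks = sentence_chunks(doc, 60)      # every chunk ends on a full sentence
```

The fixed-window version loses nothing when the pieces are concatenated, but each piece on its own can start or end mid-idea, which is what hurts retrieval.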

What "knowledge" means

The knowledge corpus is shared, static-ish, and read by every agent session. Three properties separate it from memory and scratchpad:

| Property | Knowledge corpus | Memory corpus | Scratchpad corpus |
| --- | --- | --- | --- |
| Who writes it | Ingestion pipelines | The agent (on user signals) | The agent (during tool execution) |
| Who reads it | Every session | Only the user it was indexed for | The agent in later turns |
| Update cadence | Scheduled (cron / interval / webhook) | On agent decision | Per-call |

Pick the mode per corpus. Most production deployments end up with a few of each.

The retrieval pipeline

Vectara is not a one-shot vector lookup. Every retrieval call runs a six-stage pipeline, each stage tunable, each stage observable.

Documents → SmartChunk → Boomerang → Hybrid (BM25 + dense + filters)
→ Slingshot reranker → Citations → Generation-ready context
| Stage | What it does | Why it matters |
| --- | --- | --- |
| SmartChunk | Semantic chunking that respects natural breakpoints: sentences, sections, code blocks. | Naive fixed-window chunking splits ideas across chunks and tanks retrieval quality. SmartChunk preserves meaning. |
| Boomerang | Vectara's multilingual embedding model. Paired with Slingshot, it scores 76.2% on the XQuAD-R cross-lingual average, ahead of jina-m0, mxbai, and Qwen3, with a substantial lead on low-resource languages. | Out-of-the-box quality in 100+ languages. No fine-tuning required. |
| Hybrid search | BM25 + dense retrieval, blended at query time, with metadata filters applied before scoring. | Lexical catches exact terms (model numbers, error codes). Dense catches paraphrase. Filters narrow the search space before scoring. |
| Slingshot reranker | Chain rerankers (knee, MMR, multilingual cross-encoder, UDF), composed per workload. | One reranker rarely fits all workloads. Chains let you balance recall, diversity, and freshness. |
| Citations | Every retrieved chunk travels with its source document, offset, and score. | The generated answer points back at the document the LLM grounded on. No "trust me" hallucinations. |

For deep configuration, see Hybrid search, Reranking, Citations, and Chunking strategies.
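The hybrid stage can be pictured as "filter first, then blend two scores". The sketch below assumes a simple linear interpolation between a dense score and a lexical score, both already normalized to [0, 1]; the exact formula and normalization Vectara applies are not stated on this page, and `Candidate`, `hybrid_rank`, and the sample scores are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    dense_score: float     # semantic similarity, assumed normalized to [0, 1]
    lexical_score: float   # BM25-style score, assumed normalized to [0, 1]
    metadata: dict

def hybrid_rank(candidates, lexical_weight, keep=lambda meta: True):
    """Drop filtered-out candidates, then rank by (1 - w) * dense + w * lexical."""
    if not 0.0 <= lexical_weight <= 1.0:
        raise ValueError("lexical_weight must be between 0 and 1")
    eligible = [c for c in candidates if keep(c.metadata)]   # filter before scoring
    scored = [
        ((1.0 - lexical_weight) * c.dense_score + lexical_weight * c.lexical_score, c.doc_id)
        for c in eligible
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

candidates = [
    Candidate("manual-e42", dense_score=0.60, lexical_score=0.95, metadata={"product": "router"}),
    Candidate("faq-reset", dense_score=0.70, lexical_score=0.10, metadata={"product": "router"}),
    Candidate("switch-guide", dense_score=0.90, lexical_score=0.90, metadata={"product": "switch"}),
]
router_only = lambda meta: meta["product"] == "router"

dense_only = hybrid_rank(candidates, 0.0, keep=router_only)
balanced = hybrid_rank(candidates, 0.5, keep=router_only)
```

With no lexical weight the paraphrase match ranks first; raising the weight promotes the document that contains the exact term. The filtered-out document never competes, however high its scores.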

Loading knowledge

Three paths get documents into a corpus.

Pipelines

A pipeline pulls records from a source on a schedule and fans each record out into a fresh agent session for verification and indexing. The agent runtime is the unit of execution: the same primitives as your live agents, applied to indexing.

Built-in sources include S3, SharePoint, web crawl, Salesforce, Slack, Notion, Google Drive, GitHub, and custom HTTP. Triggers can be cron, interval, manual, or webhook. Watermarked incremental refresh and a dead-letter queue come with the runtime.

POST /v2/pipelines
{
  "name": "company-kb-nightly",
  "source": { "type": "google_drive", "folder_id": "..." },
  "trigger": { "type": "cron", "expression": "0 2 * * *" },
  "judge_agent_key": "kb-judge",
  "target_corpus_key": "company-kb",
  "refresh_mode": "incremental"
}
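As a rough sketch, the same request could be assembled from Python using only the standard library. The host and the `x-api-key` header are borrowed from the direct-upload example on this page and are assumed to apply to `/v2/pipelines` as well; `build_pipeline_request` is a hypothetical helper, and the `folder_id` placeholder is left exactly as in the example above.

```python
import json
import os
import urllib.request

# Assumption: pipelines live on the same host as the upload endpoint.
API_BASE = "https://api.vectara.io"

def build_pipeline_request(api_key: str, pipeline: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/pipelines request."""
    return urllib.request.Request(
        url=f"{API_BASE}/v2/pipelines",
        data=json.dumps(pipeline).encode("utf-8"),
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )

pipeline = {
    "name": "company-kb-nightly",
    "source": {"type": "google_drive", "folder_id": "..."},   # placeholder, fill in your own
    "trigger": {"type": "cron", "expression": "0 2 * * *"},   # every day at 02:00
    "judge_agent_key": "kb-judge",
    "target_corpus_key": "company-kb",
    "refresh_mode": "incremental",
}

request = build_pipeline_request(os.environ.get("VECTARA_API_KEY", "missing-key"), pipeline)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything touches the network.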

A judge agent can verify each record before it is committed. (Does it have a title? Is it in scope? Should it be redacted?) Failed records land in the dead-letter queue (DLQ) for inspection, replay, or manual fix.
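The interplay of watermark, judge, and dead-letter queue can be sketched in a few lines. This is a toy model, not the pipeline runtime: `Record`, `RunResult`, `judge`, and `run_incremental` are hypothetical names, the judge is a plain function rather than an agent, and advancing the watermark past rejected records is an assumption (they remain available for replay from the dead-letter list).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    record_id: str
    modified_at: int   # epoch seconds
    title: str
    body: str

@dataclass
class RunResult:
    indexed: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)   # (record_id, reason) pairs
    watermark: int = 0

def judge(record: Record) -> Optional[str]:
    """Return a rejection reason, or None if the record may be indexed."""
    if not record.title.strip():
        return "missing title"
    if not record.body.strip():
        return "empty body"
    return None

def run_incremental(records, watermark: int) -> RunResult:
    """Process only records modified after `watermark`, oldest first."""
    result = RunResult(watermark=watermark)
    for record in sorted(records, key=lambda r: r.modified_at):
        if record.modified_at <= watermark:
            continue                                   # handled by an earlier run
        reason = judge(record)
        if reason is None:
            result.indexed.append(record.record_id)
        else:
            result.dead_letters.append((record.record_id, reason))
        result.watermark = max(result.watermark, record.modified_at)
    return result

records = [
    Record("a", 100, "Install guide", "Step one..."),
    Record("b", 200, "", "Untitled draft"),
    Record("c", 300, "Release notes", "Fixed the login bug."),
]
run = run_incremental(records, watermark=100)
```

A second run with the returned watermark does no work until the source produces something newer, which is the point of an incremental refresh.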

See Pipelines quickstart and Pipeline concepts.

Direct upload

For one-off ingestion or testing, upload a file straight to a corpus:

curl -X POST "https://api.vectara.io/v2/corpora/{corpus_key}/upload_file" \
  -H "x-api-key: $VECTARA_API_KEY" \
  -F "file=@./onboarding-guide.pdf"

PDF, HTML, DOCX, Excel, Markdown, plain text — all extracted, chunked, embedded. See Data ingestion.

API indexing

For programmatic ingestion (Kafka consumer, custom crawler, in-process hook), call core_document_index directly from your code. The agent can also call it as a tool, which is how scratchpad caching works. See Resource addressing.

Querying knowledge from an agent

The simplest agent tool that queries the corpus:

"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "corpus_key": "company-kb"
    }
  }
}

A more typical production wiring layers filters, reranking, and multi-corpus search:

"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "search": {
        "corpora": [
          { "corpus_key": "company-kb", "lexical_interpolation": 0.05 },
          { "corpus_key": "support-tickets", "lexical_interpolation": 0.1 }
        ],
        "metadata_filter": "doc.product = '$session.metadata.product$'",
        "reranker": {
          "type": "chain",
          "rerankers": [
            { "type": "customer_reranker",
              "reranker_name": "Rerank_Multilingual_v1",
              "limit": 50 },
            { "type": "mmr", "diversity_bias": 0.3, "limit": 10 }
          ]
        },
        "limit": 10
      }
    }
  }
}

This single tool gives the agent:

  • Hybrid search blended at 5% lexical for docs, 10% for tickets.
  • Filtering by the session's current product (multi-tenant).
  • A multilingual reranking pass that keeps the top 50 chunks.
  • An MMR rerank that narrows those to 10 diverse final chunks.
  • Citations on every result.

Drop the filter to do company-wide search. Swap the corpus list to add sources. Same shape every time.
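To show what a two-stage chain does to a candidate list, here is a minimal sketch: a relevance cut followed by a greedy MMR (maximal marginal relevance) pass. It uses the textbook MMR trade-off with `diversity_bias` as the weight on redundancy; whether Vectara's MMR reranker uses exactly this formula is an assumption, and `top_by_relevance`, `mmr_select`, and the sample similarity function are hypothetical.

```python
def top_by_relevance(candidates, limit):
    """Stage 1: keep the `limit` highest-scoring (chunk_id, relevance) pairs."""
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:limit]

def mmr_select(candidates, similarity, diversity_bias, limit):
    """Stage 2: greedily pick chunks that are relevant but not redundant.

    candidates: list of (chunk_id, relevance) pairs.
    similarity: function (chunk_id, chunk_id) -> float in [0, 1].
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < limit:
        def mmr_score(chunk_id):
            # Redundancy is the closest match among chunks already chosen.
            redundancy = max((similarity(chunk_id, s) for s in selected), default=0.0)
            return (1.0 - diversity_bias) * remaining[chunk_id] - diversity_bias * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected

candidates = [("reset-steps", 0.90), ("reset-steps-copy", 0.88), ("warranty", 0.70)]

def similarity(a, b):
    # Toy similarity: the two reset chunks are near-duplicates of each other.
    return 0.95 if {a, b} == {"reset-steps", "reset-steps-copy"} else 0.1

diverse = mmr_select(top_by_relevance(candidates, 50), similarity, diversity_bias=0.3, limit=2)
relevance_only = mmr_select(top_by_relevance(candidates, 50), similarity, diversity_bias=0.0, limit=2)
```

With a bias of 0.3 the near-duplicate is passed over in favor of a different topic; with a bias of 0 the selection collapses to plain relevance order.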

For tuning, see Tune retrieval and Tune retrieval for agents.

Multilingual by default

Boomerang is a single multilingual encoder, not a fork-per-language deployment. The same corpus serves English, Japanese, German, and low-resource languages with the same configuration. Cross-lingual queries work out of the box: a Japanese query can match an English document if the meaning aligns.

On XQuAD-R, the cross-lingual retrieval benchmark, Boomerang + Slingshot scores 76.2% on the cross-lingual average — ahead of jina-m0, mxbai, and Qwen3 — with the lead widening on low-resource language pairs (for example, +21.6 points over the next platform on the English↔Telugu pair).

For applications that span markets, this is one configuration choice instead of a deployment fork per locale.

Filtering at scale

Metadata filters narrow the candidate set before dense scoring. That's how Vectara survives corpora of billions of tokens.

"metadata_filter": "doc.region = 'EU' AND doc.indexed_at > 1700000000 AND doc.classification = 'public'"

Filter attributes are declared on the corpus. Choose them deliberately — too few and you over-fetch, too many and you fragment storage.

| Common filter | Why |
| --- | --- |
| doc_type | Separate manuals from KB from forum posts |
| product, version | Multi-product, multi-version retrieval |
| region, locale | Region-specific compliance and language |
| classification | Public vs internal vs confidential |
| indexed_at | Recency boost or freshness filter |
| customer_id / tenant_id | Hard tenant boundary |
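When filter strings are assembled from application values, it helps to build them in one place. The helper below is a hypothetical sketch that produces expressions in the same shape as the example above (`doc.<name> = '<value>'` joined with `AND`). It only covers equality and greater-than, and because the escaping rules for quotes are not given on this page, it refuses values containing a single quote instead of guessing.

```python
import re
from typing import Optional

_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def _checked(name: str) -> str:
    """Allow only plain identifier-style attribute names."""
    if not _NAME.match(name):
        raise ValueError(f"invalid filter attribute name: {name!r}")
    return name

def _literal(value) -> str:
    """Render a Python value as a filter literal (numbers bare, strings quoted)."""
    if isinstance(value, bool):
        raise TypeError("boolean literals are not covered by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("refusing to quote a value containing a single quote")
        return f"'{value}'"
    raise TypeError(f"unsupported filter value type: {type(value).__name__}")

def build_filter(equals: dict, greater_than: Optional[dict] = None) -> str:
    """Join equality and greater-than clauses with AND, in insertion order."""
    clauses = [f"doc.{_checked(n)} = {_literal(v)}" for n, v in equals.items()]
    clauses += [f"doc.{_checked(n)} > {_literal(v)}" for n, v in (greater_than or {}).items()]
    return " AND ".join(clauses)

eu_public_recent = build_filter(
    {"region": "EU", "classification": "public"},
    greater_than={"indexed_at": 1700000000},
)
```

Rejecting suspicious input outright is a deliberate choice here: a filter that doubles as a tenant boundary should fail closed rather than produce a malformed expression.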

See Metadata filters.

RBAC is enforced at retrieval

Permissions filter at retrieval time. A user only ever sees the chunks their identity is entitled to see — same engine, no extra plumbing.

Architecturally: any retrieval is filtered by tenant, RBAC, and metadata before reaching the LLM. There is no path for a model to see data the requesting identity cannot.
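The guarantee is easiest to see as an ordering constraint: entitlement checks run before any text is assembled for the model. The sketch below is a conceptual illustration only, not the platform's implementation; `Chunk`, `Identity`, `visible_chunks`, and `build_context` are hypothetical, and a role-set intersection stands in for whatever policy model is actually configured.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str
    allowed_roles: frozenset

@dataclass(frozen=True)
class Identity:
    tenant_id: str
    roles: frozenset

def visible_chunks(chunks, identity: Identity):
    """Keep chunks from the caller's tenant that share at least one role with the caller."""
    return [
        c for c in chunks
        if c.tenant_id == identity.tenant_id and c.allowed_roles & identity.roles
    ]

def build_context(chunks, identity: Identity) -> str:
    """Only entitled chunks ever reach the text handed to the LLM."""
    return "\n".join(c.text for c in visible_chunks(chunks, identity))

chunks = [
    Chunk("Public reset steps", "acme", frozenset({"support", "admin"})),
    Chunk("Admin-only pricing notes", "acme", frozenset({"admin"})),
    Chunk("Another tenant's runbook", "globex", frozenset({"support"})),
]
support_agent = Identity(tenant_id="acme", roles=frozenset({"support"}))
context = build_context(chunks, support_agent)
```

Because the context builder has no way to receive unfiltered chunks, a prompt cannot talk the model into revealing text it was never given.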

See Role-based access control and Attribute-based access control.

Why this beats a roll-your-own RAG

| What you don't have to build | Why it matters |
| --- | --- |
| A chunker that respects semantic boundaries | Naive chunking destroys retrieval quality. |
| A multilingual embedding model | Most open models are English-first. Boomerang is multilingual-first. |
| A hybrid (lexical + dense) search blend | Either alone misses cases the other catches. |
| A chain reranker with knee, MMR, UDF support | Single rerankers rarely fit all workloads. |
| Citations on every chunk | LLMs that cannot show their work are not auditable. |
| RBAC enforcement at retrieval | Adding RBAC after the fact is structurally unsafe. |
| Pipelines with watermarks, judges, DLQ | Production ingestion needs all three. |
| HHEM grading on every answer | Without grading, you cannot tune retrieval against quality. |

This is the platform doing the AI heavy lifting underneath. Your application gets to focus on the user experience, the workflow, and the brand.