Knowledge
The agent's knowledge is everything you want it to know up front: manuals, KB articles, support tickets, datasheets, policies, code repos, forum posts, ticketing history, the long tail of company documents. The classic RAG corpus. Pre-loaded by ingestion pipelines, queried at every turn, cited at generation time.
Vectara is RAG at the foundation. Boomerang is the multilingual embedding model. SmartChunk is Vectara's semantic chunker — splits documents on sentence and section boundaries rather than fixed windows, so retrieved chunks preserve meaning. Slingshot reranks candidates. HHEM grades every answer for factual consistency in under 50ms. The whole pipeline is tunable, measurable, and replaceable — and it is the same retrieval engine your agent uses for memory and scratchpad.
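To make the contrast with fixed windows concrete, here is a toy Python sketch. It is not SmartChunk's algorithm, which this page does not specify. `fixed_window_chunks` and `sentence_chunks` are hypothetical helpers, and the regex sentence splitter is a deliberate simplification.

```python
import re


def fixed_window_chunks(text: str, size: int) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring meaning."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own oversized chunk.
    """
    # Simplified splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Error E42 means the pump is dry. Refill the tank. Then restart the unit."
fixed = fixed_window_chunks(doc, 30)
semantic = sentence_chunks(doc, 50)
```

In this example the fixed window cuts the first sentence in the middle of a word, while every sentence-aware chunk ends on a sentence boundary.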
What "knowledge" means
The knowledge corpus is shared, static-ish, and read by every agent session. Three properties separate it from memory and scratchpad:
| Property | Knowledge corpus | Memory corpus | Scratchpad corpus |
|---|---|---|---|
| Who writes it | Ingestion pipelines | The agent (on user signals) | The agent (during tool execution) |
| Who reads it | Every session | Only the user it was indexed for | The agent in later turns |
| Update cadence | Scheduled (cron / interval / webhook) | On agent decision | Per-call |
Pick the mode per corpus. Most production deployments end up with a few of each.
The retrieval pipeline
Vectara is not a one-shot vector lookup. Retrieval runs through a five-stage pipeline, and each stage is tunable and observable.
Documents → SmartChunk → Boomerang → Hybrid (BM25 + dense + filters)
→ Slingshot reranker → Citations → Generation-ready context
| Stage | What it does | Why it matters |
|---|---|---|
| SmartChunk | Semantic chunking that respects breakpoint boundaries — sentences, sections, code blocks. | Naive fixed-window chunking splits ideas across chunks and tanks retrieval quality. SmartChunk preserves meaning. |
| Boomerang | Vectara's multilingual embedding model. Leads XQuAD-R cross-lingual at 76.2% — ahead of jina-m0, mxbai, and Qwen3, with a substantial lead on low-resource languages. | Out-of-the-box quality in 100+ languages. No fine-tuning required. |
| Hybrid search | BM25 + dense retrieval, blended at query time, with metadata filters applied before scoring. | Lexical catches exact terms (model numbers, error codes). Dense catches paraphrase. Filters narrow the search space before scoring. |
| Slingshot reranker | Chained rerankers (knee, MMR, multilingual cross-encoder, UDF), composed per workload. | One reranker rarely fits all workloads. Chains let you balance recall, diversity, and recency. |
| Citations | Every retrieved chunk travels with its source document, offset, and score. | The generated answer points back at the document the LLM grounded on. No "trust me" hallucinations. |
For deep configuration, see Hybrid search, Reranking, Citations, and Chunking strategies.
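The hybrid search row can be illustrated with a small Python sketch. It assumes the blend is a linear interpolation in which a lexical weight of 0 means dense-only and 1 means lexical-only, and that both scores are already normalized to [0, 1]. The actual scoring formula is not stated on this page, and `blend` and `hybrid_rank` are hypothetical names.

```python
def blend(dense: float, lexical: float, lexical_weight: float) -> float:
    """Linear interpolation between a dense score and a lexical score.

    Assumption: lexical_weight=0.0 is dense-only, 1.0 is lexical-only.
    """
    if not 0.0 <= lexical_weight <= 1.0:
        raise ValueError("lexical_weight must be between 0 and 1")
    return (1.0 - lexical_weight) * dense + lexical_weight * lexical


def hybrid_rank(
    candidates: list[tuple[str, float, float]], lexical_weight: float
) -> list[tuple[str, float]]:
    """Rank (doc_id, dense_score, lexical_score) triples by blended score."""
    scored = [
        (doc_id, blend(dense, lexical, lexical_weight))
        for doc_id, dense, lexical in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two made-up candidates: one matches by meaning, one contains the exact code.
candidates = [
    ("paraphrase-match", 0.72, 0.10),
    ("exact-error-code", 0.70, 1.00),
]
dense_only = hybrid_rank(candidates, lexical_weight=0.0)
hybrid = hybrid_rank(candidates, lexical_weight=0.05)
```

With these invented scores, even a 5% lexical weight is enough to lift the document containing the exact error code above the paraphrase match.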
Loading knowledge
Three paths get documents into a corpus.
Pipelines
A pipeline pulls records from a source on a schedule and fans every record into a fresh agent session for verification and indexing. The agent runtime is the unit of execution — same primitives as your live agents, used to index.
Built-in sources include S3, SharePoint, web crawl, Salesforce, Slack, Notion, Google Drive, GitHub, and custom HTTP. Triggers can be cron, interval, manual, or webhook. Watermarked incremental refresh and a dead-letter queue come with the runtime.
POST /v2/pipelines
{
  "name": "company-kb-nightly",
  "source": { "type": "google_drive", "folder_id": "..." },
  "trigger": { "type": "cron", "expression": "0 2 * * *" },
  "judge_agent_key": "kb-judge",
  "target_corpus_key": "company-kb",
  "refresh_mode": "incremental"
}
A judge agent can verify each record before commit (does it have a title? Is it in scope? Should it be redacted?). Failed records land in the DLQ for inspection, replay, or manual fix.
See Pipelines quickstart and Pipeline concepts.
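The interplay of watermark, judge, and dead-letter queue can be sketched in plain Python. This is an illustration of the pattern, not the pipeline runtime: `Record`, `RunResult`, `run_incremental`, and `require_title` are hypothetical, and advancing the watermark past rejected records (because the DLQ holds them for replay) is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    id: str
    title: str
    modified_at: int  # epoch seconds


@dataclass
class RunResult:
    indexed: list[str] = field(default_factory=list)
    dead_letter: list[tuple[str, str]] = field(default_factory=list)
    watermark: int = 0


def run_incremental(
    records: list[Record],
    watermark: int,
    judge: Callable[[Record], tuple[bool, str]],
) -> RunResult:
    """Process only records newer than the watermark, oldest first."""
    result = RunResult(watermark=watermark)
    for record in sorted(records, key=lambda r: r.modified_at):
        if record.modified_at <= watermark:
            continue  # already handled by an earlier run
        accepted, reason = judge(record)
        if accepted:
            result.indexed.append(record.id)
        else:
            # Rejected records are parked for inspection or replay.
            result.dead_letter.append((record.id, reason))
        result.watermark = max(result.watermark, record.modified_at)
    return result


def require_title(record: Record) -> tuple[bool, str]:
    """Toy judge: accept a record only if it has a non-empty title."""
    if record.title.strip():
        return True, ""
    return False, "missing title"


records = [
    Record("a", "Onboarding guide", 100),
    Record("b", "", 200),
    Record("c", "FAQ", 300),
]
first_run = run_incremental(records, watermark=100, judge=require_title)
second_run = run_incremental(records, first_run.watermark, require_title)
```

The second run sees the same source records but does no work, which is the point of a watermarked incremental refresh.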
Direct upload
For one-off ingestion or testing, upload a file straight to a corpus:
curl -X POST "https://api.vectara.io/v2/corpora/{corpus_key}/upload_file" \
  -H "x-api-key: $VECTARA_API_KEY" \
  -F "file=@./onboarding-guide.pdf"
PDF, HTML, DOCX, Excel, Markdown, plain text — all extracted, chunked, embedded. See Data ingestion.
API indexing
For programmatic ingestion (Kafka consumer, custom crawler, in-process hook), call core_document_index directly from your code. The agent can also call it as a tool, which is how scratchpad caching works. See Resource addressing.
Querying knowledge from an agent
The simplest agent tool that queries the corpus:
"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "corpus_key": "company-kb"
    }
  }
}
A more typical production wiring layers filters, reranking, and multi-corpus search:
"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "search": {
        "corpora": [
          { "corpus_key": "company-kb", "lexical_interpolation": 0.05 },
          { "corpus_key": "support-tickets", "lexical_interpolation": 0.1 }
        ],
        "metadata_filter": "doc.product = '$session.metadata.product$'",
        "reranker": {
          "type": "chain",
          "rerankers": [
            {
              "type": "customer_reranker",
              "reranker_name": "Rerank_Multilingual_v1",
              "limit": 50
            },
            { "type": "mmr", "diversity_bias": 0.3, "limit": 10 }
          ]
        },
        "limit": 10
      }
    }
  }
}
This single tool gives the agent:
- Hybrid search blended at 5% lexical for docs, 10% for tickets.
- Filtering by the session's current product (multi-tenant).
- A multilingual reranking pass that keeps the top 50 chunks.
- An MMR pass that cuts those to 10 diverse final chunks.
- Citations on every result.
Drop the filter to do company-wide search. Swap the corpus list to add sources. Same shape every time.
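The two-step chain in that configuration (keep the top candidates by relevance, then diversify with MMR) can be sketched as follows. This uses the textbook maximal marginal relevance formula and assumes `diversity_bias` weights the redundancy penalty, with 0 meaning no diversification. How the platform computes it is not stated here, and `top_k`, `mmr`, and the toy `similarity` function are hypothetical.

```python
from typing import Callable

Scored = tuple[str, float]  # (chunk id, relevance score)


def top_k(candidates: list[Scored], limit: int) -> list[Scored]:
    """First chain stage: keep the `limit` most relevant candidates."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:limit]


def mmr(
    candidates: list[Scored],
    similarity: Callable[[str, str], float],
    diversity_bias: float,
    limit: int,
) -> list[Scored]:
    """Second chain stage: greedily pick relevant but non-redundant chunks."""
    remaining = list(candidates)
    selected: list[Scored] = []

    def mmr_score(item: Scored) -> float:
        chunk_id, relevance = item
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max(
            (similarity(chunk_id, chosen_id) for chosen_id, _ in selected),
            default=0.0,
        )
        return (1 - diversity_bias) * relevance - diversity_bias * redundancy

    while remaining and len(selected) < limit:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy similarity: chunks about the same topic count as duplicates.
topics = {
    "reset-guide": "reset",
    "reset-guide-copy": "reset",
    "warranty-faq": "warranty",
    "old-forum-post": "misc",
}


def similarity(a: str, b: str) -> float:
    return 1.0 if topics[a] == topics[b] else 0.0


candidates = [
    ("reset-guide", 0.90),
    ("reset-guide-copy", 0.89),
    ("warranty-faq", 0.70),
    ("old-forum-post", 0.20),
]
shortlist = top_k(candidates, limit=3)
diverse = mmr(shortlist, similarity, diversity_bias=0.3, limit=2)
relevance_only = mmr(shortlist, similarity, diversity_bias=0.0, limit=2)
```

With a diversity bias of 0.3 the near-duplicate is skipped in favor of a chunk on a different topic. With a bias of 0 the chain degenerates to plain relevance order.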
For tuning, see Tune retrieval and Tune retrieval for agents.
Multilingual by default
Boomerang is a single multilingual encoder, not a fork-per-language deployment. The same corpus serves English, Japanese, German, and low-resource languages with the same configuration. Cross-lingual queries work out of the box: a Japanese query can match an English document if the meaning aligns.
On XQuAD-R, the cross-lingual retrieval benchmark, Boomerang + Slingshot scores 76.2% on the cross-lingual average — ahead of jina-m0, mxbai, and Qwen3 — with the lead widening on low-resource language pairs (for example, +21.6 points over the next platform on the English↔Telugu pair).
For applications that span markets, this is one configuration choice instead of a deployment fork per locale.
Filtering at scale
Metadata filters narrow the candidate set before dense scoring, which keeps queries tractable on corpora of billions of tokens.
"metadata_filter": "doc.region = 'EU' AND doc.indexed_at > 1700000000 AND doc.classification = 'public'"
Filter attributes are declared on the corpus. Choose them deliberately — too few and you over-fetch, too many and you fragment storage.
| Common filter | Why |
|---|---|
| doc_type | Separate manuals from KB from forum posts |
| product, version | Multi-product, multi-version retrieval |
| region, locale | Region-specific compliance and language |
| classification | Public vs internal vs confidential |
| indexed_at | Recency boost or freshness filter |
| customer_id / tenant_id | Hard tenant boundary |
See Metadata filters.
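When filter strings are assembled in application code, a small builder avoids hand-concatenation mistakes. The Python sketch below reproduces the example expression above. `filter_clause` and `build_filter` are hypothetical helpers, only the `=` and `>` operators seen in that example are accepted, and values containing a single quote are rejected rather than escaped, because the escaping rules are not covered on this page.

```python
import re

ALLOWED_OPERATORS = {"=", ">"}  # only the operators shown in the example
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def filter_clause(attribute: str, operator: str, value: object) -> str:
    """Render one document-level comparison such as doc.region = 'EU'."""
    if not IDENTIFIER.fullmatch(attribute):
        raise ValueError(f"invalid attribute name: {attribute!r}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator in this sketch: {operator!r}")
    if isinstance(value, bool):
        raise TypeError("boolean literals are not handled in this sketch")
    if isinstance(value, (int, float)):
        literal = str(value)
    elif isinstance(value, str):
        if "'" in value:
            # Assumption: refuse rather than guess the quoting rules.
            raise ValueError("values containing a single quote are rejected")
        literal = f"'{value}'"
    else:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return f"doc.{attribute} {operator} {literal}"


def build_filter(clauses: list[tuple[str, str, object]]) -> str:
    """Join clauses with AND, in the order given."""
    return " AND ".join(filter_clause(*clause) for clause in clauses)


expression = build_filter([
    ("region", "=", "EU"),
    ("indexed_at", ">", 1700000000),
    ("classification", "=", "public"),
])
```

Rejecting unexpected input up front keeps user-supplied values from changing the meaning of a filter, which matters when the filter is the tenant boundary.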
RBAC is enforced at retrieval
Permissions filter at retrieval time. A user only ever sees the chunks their identity is entitled to see — same engine, no extra plumbing.
Architecturally: any retrieval is filtered by tenant, RBAC, and metadata before reaching the LLM. There is no path for a model to see data the requesting identity cannot.
See Role-based access control and Attribute-based access control.
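The ordering guarantee described above, filter first and build the model's context second, can be shown with a minimal Python sketch. The `Chunk` and `Identity` types and the role model are hypothetical and far simpler than a real RBAC or ABAC policy. The point is only that nothing the identity cannot read is ever placed in the context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str
    allowed_roles: frozenset[str]


@dataclass(frozen=True)
class Identity:
    tenant_id: str
    roles: frozenset[str]


def visible_chunks(chunks: list[Chunk], identity: Identity) -> list[Chunk]:
    """Keep chunks in the caller's tenant that share at least one role."""
    return [
        chunk
        for chunk in chunks
        if chunk.tenant_id == identity.tenant_id
        and chunk.allowed_roles & identity.roles
    ]


def build_context(chunks: list[Chunk], identity: Identity, limit: int) -> list[str]:
    """Filter by entitlement first, then truncate to the context budget."""
    return [chunk.text for chunk in visible_chunks(chunks, identity)][:limit]


chunks = [
    Chunk("Public pricing sheet", "acme", frozenset({"support", "sales"})),
    Chunk("Internal margin report", "acme", frozenset({"finance"})),
    Chunk("Other tenant's runbook", "globex", frozenset({"support"})),
]
support_user = Identity("acme", frozenset({"support"}))
context = build_context(chunks, support_user, limit=5)
```

Because truncation happens after the entitlement filter, a restricted chunk can never take a slot in the context, regardless of its relevance score.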
Why this beats a roll-your-own RAG
| What you don't have to build | Why it matters |
|---|---|
| A chunker that respects semantic boundaries | Naive chunking destroys retrieval quality. |
| A multilingual embedding model | Most open models are English-first. Boomerang is multilingual-first. |
| A hybrid (lexical + dense) search blend | Either alone misses cases the other catches. |
| A chain reranker with knee, MMR, UDF support | Single rerankers rarely fit all workloads. |
| Citations on every chunk | LLMs that cannot show their work are not auditable. |
| RBAC enforcement at retrieval | Adding RBAC after the fact is structurally unsafe. |
| Pipelines with watermarks, judges, DLQ | Production ingestion needs all three. |
| HHEM grading on every answer | Without grading, you cannot tune retrieval against quality. |
This is the platform doing the AI heavy lifting underneath. Your application gets to focus on the user experience, the workflow, and the brand.
Related
- Context & memory — the same primitive serving memory and scratchpad.
- Tune retrieval for agents — per-tool retrieval configuration.
- Hybrid search — the lexical + dense blend.
- Reranking — chain rerankers including MMR and recency UDFs.
- Pipelines quickstart — scheduled ingestion from S3, SharePoint, web, Salesforce, Slack, Notion, Google Drive, GitHub, and more.
- Hallucination evaluation — HHEM grading on every answer.