Version: 2.0

Knowledge

The agent's knowledge is everything you want it to know up front: manuals, KB articles, support tickets, datasheets, policies, code repos, forum posts, ticketing history, the long tail of company documents. The classic RAG corpus. Pre-loaded by ingestion pipelines, queried at every turn, cited at generation time.

Vectara has RAG at its foundation. Boomerang is its multilingual embedding model. Chunking splits documents into parts at sentence boundaries, or accumulates sentences up to a character limit, and never crosses section boundaries, so retrieved parts preserve meaning. Slingshot reranks candidates. HHEM grades every answer for factual consistency in under 50ms. The whole pipeline is tunable, measurable, and replaceable, and it is the same retrieval engine your agent uses for memory and scratchpad.

What "knowledge" means

The knowledge corpus is shared, static-ish, and read by every agent session.

Use a knowledge corpus for shared source material that should be available across sessions and users. Use memory for user- or tenant-specific facts learned over time. Use scratchpad storage for derived facts, cached tool results, or intermediate outputs that future turns may reuse.

Three properties separate a knowledge corpus from memory and scratchpad corpora:

Property	Knowledge corpus	Memory corpus	Scratchpad corpus
Who writes it	Ingestion pipelines	The agent (on user signals)	The agent (during tool execution)
Who reads it	Every session	Only sessions for the user it was indexed for	The agent in later turns
Update cadence	Scheduled (cron / interval / webhook)	On agent decision	Per-call

Choose the role for each corpus. Most production deployments end up with a few of each.

The retrieval pipeline

Vectara is not a one-shot vector lookup. Retrieval is backed by a five-stage pipeline, and each stage is tunable and observable.

Documents → Chunking → Boomerang → Hybrid (BM25 + dense + filters)
          → Slingshot reranker → Citations → Generation-ready context

Stage	What it does	Why it matters
Chunking	Splits documents into parts at sentence boundaries, or accumulates sentences up to a character limit; never crosses section boundaries.	Chunk size sets the precision–context tradeoff: parts that mix several ideas dilute the embedding, while parts that are too short lose surrounding signal.
Boomerang	Vectara's multilingual embedding model. Leads XQuAD-R cross-lingual at 76.2%, ahead of jina-m0, mxbai, and Qwen3, with a substantial lead on low-resource languages.	Out-of-the-box quality in 100+ languages. No fine-tuning required.
Hybrid search	BM25 + dense retrieval, blended at query time, with metadata filters applied before scoring.	Lexical catches exact terms (model numbers, error codes). Dense catches paraphrase. Filters narrow the search space before scoring.
Slingshot reranker	Chain rerankers (knee, MMR, multilingual cross-encoder, UDF) composed per workload.	One reranker rarely fits all workloads. Chains let you balance relevance, diversity, and recency.
Citations	Every retrieved chunk travels with its source document, offset, and score.	The generated answer points back to the documents the LLM was grounded on, so users can inspect the retrieved sources behind it.

For deep configuration, see Hybrid search, Reranking, Citations, and Chunking strategies.
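The chunking rule in the table above can be sketched in a few lines. This is a minimal illustration, not Vectara's implementation: the function names `chunk_section` and `chunk_document`, the regex sentence splitter, and the `max_chars` parameter are all assumptions made for the example.

```python
import re


def chunk_section(text, max_chars=None):
    """Split one section into parts.

    With max_chars=None each sentence becomes its own part. Otherwise
    sentences accumulate until adding the next one would exceed max_chars.
    A single sentence longer than max_chars is kept whole.
    """
    # Naive sentence splitter (assumption): break after ., ! or ? plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if max_chars is None:
        return sentences

    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            parts.append(current)   # close the current part
            current = sentence      # start a new one with this sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


def chunk_document(sections, max_chars=None):
    """Chunk each section independently so no part crosses a section boundary."""
    parts = []
    for index, section in enumerate(sections):
        for part in chunk_section(section, max_chars):
            parts.append({"section": index, "text": part})
    return parts


sections = [
    "Reset the router. Wait ten seconds. Power it back on.",
    "Error E42 means the fan failed. Replace the fan.",
]
for part in chunk_document(sections, max_chars=40):
    print(part)
```

Because each section is chunked on its own, the last sentence of the first section never merges with the first sentence of the second, even when both would fit under the character limit.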

Loading knowledge

Three paths get documents into a corpus.

Pipelines

A pipeline pulls records from a source on a schedule and fans each record out to a fresh agent session for verification and indexing. The agent runtime is the unit of execution: the same primitives as your live agents, applied to indexing.

Built-in sources include S3, SharePoint, web crawl, Salesforce, Slack, Notion, Google Drive, GitHub, and custom HTTP. Triggers can be cron, interval, manual, or webhook. Watermarked incremental refresh and a dead-letter queue come with the runtime.

POST /v2/pipelines
{
  "name": "company-kb-nightly",
  "source": { "type": "google_drive", "folder_id": "..." },
  "trigger": { "type": "cron", "expression": "0 2 * * *" },
  "judge_agent_key": "kb-judge",
  "target_corpus_key": "company-kb",
  "refresh_mode": "incremental"
}

A judge agent can verify each record before it is committed (Does it have a title? Is it in scope? Should it be redacted?). Failed records land in the DLQ for inspection, replay, or manual fix.
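The interaction between the watermark, the judge, and the dead-letter queue can be pictured with a small in-memory model. Everything here is hypothetical: `PipelineState`, `judge`, `run_incremental`, and the record fields `title`, `scope`, and `updated_at` are invented for illustration and do not reflect the runtime's actual data model. Redaction is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    watermark: int = 0  # newest updated_at value already processed
    indexed: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)


def judge(record):
    """Hypothetical stand-in for a judge agent: return a rejection reason or None."""
    if not record.get("title"):
        return "missing title"
    if record.get("scope") != "kb":
        return "out of scope"
    return None


def run_incremental(records, state):
    """Process only records newer than the watermark, then advance it."""
    fresh = sorted(
        (r for r in records if r["updated_at"] > state.watermark),
        key=lambda r: r["updated_at"],
    )
    for record in fresh:
        reason = judge(record)
        if reason is None:
            state.indexed.append(record["id"])
        else:
            # Rejected records are kept for inspection or replay, not dropped.
            state.dead_letters.append({"id": record["id"], "reason": reason})
        state.watermark = record["updated_at"]
    return state


records = [
    {"id": "a", "title": "Reset guide", "scope": "kb", "updated_at": 100},
    {"id": "b", "title": "", "scope": "kb", "updated_at": 110},
    {"id": "c", "title": "Lunch menu", "scope": "hr", "updated_at": 120},
]
state = run_incremental(records, PipelineState())
print(state.indexed, state.dead_letters, state.watermark)
```

A second run over the same records does nothing, because every `updated_at` is at or below the watermark. Only records added or changed since the last run are judged again.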

See Pipelines quickstart and Pipeline concepts.

Direct upload

For one-off ingestion or testing, upload a file straight to a corpus:

curl -X POST "https://api.vectara.io/v2/corpora/{corpus_key}/upload_file" \
  -H "x-api-key: $VECTARA_API_KEY" \
  -F "file=@./onboarding-guide.pdf"

PDF, HTML, DOCX, Excel, Markdown, plain text: all extracted, chunked, embedded. See Data ingestion.

API indexing

For programmatic ingestion (Kafka consumer, custom crawler, in-process hook), call core_document_index directly from your code. The agent can also call it as a tool, which is how scratchpad caching works. See Resource addressing.

Querying knowledge from an agent

The simplest agent tool that queries the corpus:

"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "corpus_key": "company-kb"
    }
  }
}

A more typical production wiring layers filters, reranking, and multi-corpus search:

"tool_configurations": {
  "kb_search": {
    "type": "corpora_search",
    "query_configuration": {
      "search": {
        "corpora": [
          { "corpus_key": "company-kb", "lexical_interpolation": 0.05 },
          { "corpus_key": "support-tickets", "lexical_interpolation": 0.1 }
        ],
        "metadata_filter": "doc.product = '$session.metadata.product$'",
        "reranker": {
          "type": "chain",
          "rerankers": [
            { "type": "customer_reranker",
              "reranker_name": "Rerank_Multilingual_v1",
              "limit": 50 },
            { "type": "mmr", "diversity_bias": 0.3, "limit": 10 }
          ]
        },
        "limit": 10
      }
    }
  }
}

This single tool gives the agent:

Hybrid search blended at 5% lexical for docs and 10% for tickets.
Filtering on the session's current product, resolved from session metadata.
A multilingual reranking pass that keeps the top 50 chunks.
An MMR rerank that selects 10 diverse final chunks.
Citations on every result.

Drop the filter to do company-wide search. Swap the corpus list to add sources. Same shape every time.
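To build intuition for what a `lexical_interpolation` of 0.05 versus a higher value does, here is a sketch that assumes the blend is a simple linear interpolation between a lexical score and a dense score, both already normalized to the 0 to 1 range. That formula, and the helper names `blend` and `rank`, are assumptions for illustration. The platform's exact scoring is not described in this page.

```python
def blend(lexical_score, dense_score, lexical_interpolation):
    """Assumed blend: weight * lexical + (1 - weight) * dense."""
    if not 0.0 <= lexical_interpolation <= 1.0:
        raise ValueError("lexical_interpolation must be between 0 and 1")
    return (lexical_interpolation * lexical_score
            + (1.0 - lexical_interpolation) * dense_score)


def rank(candidates, lexical_interpolation):
    """Return candidate ids ordered by blended score, best first."""
    ordered = sorted(
        candidates,
        key=lambda c: blend(c["lexical"], c["dense"], lexical_interpolation),
        reverse=True,
    )
    return [c["id"] for c in ordered]


candidates = [
    {"id": "manual-12", "lexical": 0.90, "dense": 0.40},  # exact error-code match
    {"id": "kb-7", "lexical": 0.10, "dense": 0.80},       # paraphrase match
]
for weight in (0.05, 0.5):
    print(weight, rank(candidates, weight))
```

Under this assumed formula, a low weight lets the paraphrase match win, while a higher weight promotes the exact-term match. That is the tradeoff to keep in mind when choosing different values for documentation and ticket corpora.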

For tuning, see Tune retrieval and Tune retrieval for agents.
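As one illustration of what the `mmr` step in the chain above is doing, the following sketch implements textbook maximal marginal relevance over toy two-dimensional vectors. It is not the platform's reranker: the scoring formula, the `vector` and `score` fields, and the way `diversity_bias` is applied are assumptions chosen to match the usual MMR definition.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(candidates, diversity_bias, limit):
    """Greedy MMR: trade relevance against similarity to already-picked items."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < limit:

        def mmr_score(candidate):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidate["vector"], s["vector"]) for s in selected),
                default=0.0,
            )
            return ((1 - diversity_bias) * candidate["score"]
                    - diversity_bias * redundancy)

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [c["id"] for c in selected]


candidates = [
    {"id": "a", "score": 0.90, "vector": [1.0, 0.0]},
    {"id": "b", "score": 0.85, "vector": [0.99, 0.1]},  # near-duplicate of "a"
    {"id": "c", "score": 0.60, "vector": [0.0, 1.0]},   # different topic
]
print(mmr(candidates, diversity_bias=0.3, limit=2))
print(mmr(candidates, diversity_bias=0.0, limit=2))
```

With a bias of 0.3 the near-duplicate is skipped in favor of the lower-scoring but distinct chunk. With a bias of 0 the selection falls back to pure relevance order.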

Multilingual by default

Boomerang is a single multilingual encoder, not a fork-per-language deployment. The same corpus serves English, Japanese, German, and low-resource languages with the same configuration. Cross-lingual queries work out of the box: a Japanese query can match an English document if the meaning aligns.

On XQuAD-R, the cross-lingual retrieval benchmark, Boomerang + Slingshot scores 76.2% on the cross-lingual average, ahead of jina-m0, mxbai, and Qwen3, with the lead widening on low-resource language pairs (for example, +21.6 points over the next platform on the English↔Telugu pair).

For applications that span markets, this is one configuration choice instead of a deployment fork per locale.

Filtering at scale

Metadata filters narrow the candidate set before dense scoring. That keeps retrieval tractable on corpora of billions of tokens.

"metadata_filter": "doc.region = 'EU' AND doc.indexed_at > 1700000000 AND doc.classification = 'public'"

Filter attributes are declared on the corpus. Choose them deliberately: too few and you over-fetch, too many and you fragment storage.

Common filter	Why
`doc_type`	Separate manuals from KB from forum posts
`product`, `version`	Multi-product, multi-version retrieval
`region`, `locale`	Region-specific compliance and language
`classification`	Public vs internal vs confidential
`indexed_at`	Recency boost or freshness filter
`customer_id` / `tenant_id`	Hard tenant boundary

See Metadata filters.
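The benefit of filtering before scoring is that the expensive step only ever sees the documents that survive the filter. The sketch below makes that visible by recording which documents get scored. It is a toy model: `filter_then_score`, the Python predicate standing in for the filter expression, and the precomputed `similarity` field are all assumptions for illustration.

```python
def filter_then_score(documents, predicate, score):
    """Apply the metadata predicate first, then score only the survivors."""
    survivors = [doc for doc in documents if predicate(doc["metadata"])]
    ranked = sorted(survivors, key=score, reverse=True)
    return [doc["id"] for doc in ranked]


def eu_public_recent(metadata):
    # Python mirror of: doc.region = 'EU' AND doc.indexed_at > 1700000000
    # AND doc.classification = 'public'
    return (metadata["region"] == "EU"
            and metadata["indexed_at"] > 1700000000
            and metadata["classification"] == "public")


scored_ids = []


def similarity(doc):
    scored_ids.append(doc["id"])  # record which documents reach the scorer
    return doc["similarity"]


documents = [
    {"id": "d1", "similarity": 0.95,
     "metadata": {"region": "US", "indexed_at": 1710000000, "classification": "public"}},
    {"id": "d2", "similarity": 0.70,
     "metadata": {"region": "EU", "indexed_at": 1710000000, "classification": "public"}},
    {"id": "d3", "similarity": 0.80,
     "metadata": {"region": "EU", "indexed_at": 1690000000, "classification": "public"}},
    {"id": "d4", "similarity": 0.60,
     "metadata": {"region": "EU", "indexed_at": 1720000000, "classification": "public"}},
    {"id": "d5", "similarity": 0.99,
     "metadata": {"region": "EU", "indexed_at": 1720000000, "classification": "internal"}},
]
print(filter_then_score(documents, eu_public_recent, similarity))
print(sorted(scored_ids))
```

Only two of the five documents are scored, and the highest-similarity document overall never appears in the results because its classification does not match.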

Access control at retrieval

Access control works at two levels: corpus access and result-level filtering. The distinction matters because indexed data is retrieved from Vectara, not from the original source system.

Corpus-level RBAC is enforced by Vectara. A query is rejected unless the requesting identity has permission to query every corpus targeted by the request. This controls which corpora an identity can access.
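The all-or-nothing rule can be pictured as a set check. This is only a conceptual model of the behavior described above: the function `authorize_query` and the idea of representing grants as a set of corpus keys are assumptions, not the platform's API.

```python
def authorize_query(granted_corpora, targeted_corpora):
    """Allow the query only if every targeted corpus is in the identity's grants."""
    missing = sorted(set(targeted_corpora) - set(granted_corpora))
    if missing:
        raise PermissionError(f"no query permission on: {', '.join(missing)}")
    return True


grants = {"company-kb"}
print(authorize_query(grants, ["company-kb"]))
try:
    authorize_query(grants, ["company-kb", "support-tickets"])
except PermissionError as error:
    print(error)
```

A request that targets one permitted and one unpermitted corpus is rejected as a whole rather than silently narrowed to the permitted corpus.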

Finer-grained access inside a corpus is enforced through filters you provide. Vectara does not natively inherit per-document ACLs from the source system. After data is indexed, any identity with query permission on the corpus can retrieve matching documents unless the request narrows the result set. To preserve user-specific access boundaries, attach access metadata to each document at index time, then pass a metadata_filter on every query. Build this filter server-side from the user’s verified attributes, not from client-supplied values. If you omit the filter, the full corpus is in scope for retrieval.

Vectara applies the supplied filter before dense scoring, reranking, and generation. Passages that do not match the filter are excluded before they can reach the reranker or the LLM. The platform enforces the filter you provide; the calling application is responsible for constructing the correct filter and keeping indexed access metadata synchronized with the source system.
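One way to honor the "build the filter server-side" rule is to derive the filter string only from attributes your backend has already verified, and to refuse any value that could alter the expression. The sketch below is an assumption-heavy example: `build_access_filter`, the `verified_user` dictionary, and the choice of `tenant_id` and `region` as access attributes are hypothetical. It uses only the `=` and `AND` operators that appear in the filter example earlier on this page.

```python
import re

# Allow only simple identifiers as values, so a value can never
# break out of its quotes and change the filter expression.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_-]+")


def build_access_filter(verified_user):
    """Build a metadata filter from server-verified attributes.

    verified_user must come from your own authentication layer (for example
    a validated session), never from fields supplied in the client request.
    """
    clauses = []
    for attribute in ("tenant_id", "region"):
        value = verified_user[attribute]  # KeyError if the attribute is missing
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsafe value for {attribute}: {value!r}")
        clauses.append(f"doc.{attribute} = '{value}'")
    return " AND ".join(clauses)


user = {"tenant_id": "acme", "region": "EU"}
search = {"metadata_filter": build_access_filter(user), "limit": 10}
print(search["metadata_filter"])
```

Failing closed matters here. A missing attribute raises an error instead of producing a shorter filter, because an omitted clause would widen what the user can retrieve.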

See role-based access control and attribute-based access control.

Why use Vectara instead of building RAG yourself

Vectara gives you production retrieval, grounding, evaluation, and governance primitives without requiring your team to assemble and maintain each layer independently.

What Vectara provides	Why it matters
Sentence and max-chars chunking	Splits content at sentence boundaries or up to a character limit, and never crosses section boundaries, keeping each part within a single section's context.
Multilingual retrieval	Supports retrieval across languages without requiring you to maintain separate embedding infrastructure.
Hybrid lexical and dense search	Combines keyword precision with semantic recall, covering cases either method can miss on its own.
Reranking controls	Lets you tune retrieval quality with reranking strategies such as knee, MMR, and UDF-based controls.
Citations	Gives users and reviewers source-level evidence for generated answers.
Corpus-level RBAC and pre-retrieval metadata filters	Enforces corpus access at the platform layer and applies supplied ABAC filters before scoring, reranking, and generation.
Production ingestion pipelines	Supports operational ingestion patterns such as watermarks, judges, and dead-letter queues.
HHEM answer grading	Provides a signal for evaluating and tuning answer quality.

This lets your application focus on the user experience, workflow design, identity handoff, and product surface while Vectara handles the retrieval, grounding, orchestration, evaluation, and governance layers underneath.

Related

Context and memory: the same primitive serving memory and scratchpad.
Tune retrieval for agents: per-tool retrieval configuration.
Hybrid search: the lexical + dense blend.
Reranking: chain rerankers including MMR and recency UDFs.
Pipelines quickstart: scheduled ingestion from S3, SharePoint, web, Salesforce, Slack, Notion, Google Drive, GitHub, and more.
Hallucination evaluation: HHEM grading on every answer.
