
Tune retrieval for agents

This guide walks through how to get high-quality retrieval out of an agent that calls the corpora_search tool. It is opinionated and ordered by impact: each section builds on the previous one. If your agent is returning weak or irrelevant results, work through the sections in order — the earliest fixes pay back the most.

The agent chooses the query string and, with the corpora_search tool, also the corpus (from a list you allow). Filter attribute statistics are reachable through the separate corpus_filter_attribute_stats tool. Today everything else — context configuration, reranker chain, which filter applies to which corpus, lexical weight, and the metadata filter itself — is fixed by you when you configure the tool. (We may expose more of these to the agent in the future.) So "tuning retrieval for an agent" mostly comes down to three things, in priority order:

  1. Shaping the corpus during ingest.
  2. Telling the agent — through tool descriptions and instructions — what the corpus contains and when to use it.
  3. Configuring the tool so the small set of choices the agent does make are all good ones.

1. Ingest is the heart of retrieval quality

Nothing on the query side can recover information that was lost or mangled during ingest. The file upload endpoint is a fine starting point — it handles sentence-level chunking and common file formats — but heavier use cases benefit from an agentic ingest pipeline in which an agent walks each document, enriches metadata, and decides how to chunk.

For retrieval quality specifically, four things matter most:

  • Parts carry meaning, not just text. A document is broken into parts; each part is what gets embedded, reranked, and shown to the agent. Sentence-level chunks are a fine default, but very short or fragmentary parts (FAQ items, captions) need extra surrounding text at query time — see section 4. For pre-structured source documents, use semantic indexing so parts follow the document's natural sections.

  • Tables and images need real descriptions. Tables become their own parts when extract_tables is set; the description is what gets embedded, so tune TableGenerationSpec until the description uses the columns and entities a user would search for. Images are retrievable through their summary — generate that summary with a domain-specific prompt, and feel free to attach multiple parts to the same image (an overall summary plus per-region summaries) so queries from different angles can match. See Working with tables.

  • Declare filter attributes early. Document and part metadata travels with results, but only fields declared as filter_attributes on the corpus can appear in a metadata_filter. Adding them later requires reindexing. A declaration sketch follows this list; common picks:

    Attribute        Level      Why it pays off
    doc.source       document   Scope by repo, drive, tenant.
    doc.created_at   document   Drives time-decay UDF reranking.
    doc.lang         document   Avoids cross-language noise.
    part.section     part       Lets agents prefer "summary" or "intro".
    doc.acl          document   Combine with auth — see access control.
  • Agent-generated metadata is fair game. Use an agentic pipeline to add summaries, topic tags, or "who/what/when" annotations the source didn't carry. These show up in results and as filter attributes if you declare them.
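
A minimal sketch of declaring those attributes when the corpus is created. The filter_attributes name comes from the text above, but the per-attribute fields (name, level, indexed, type) and the overall request shape are assumptions; check the corpus-creation and metadata filters references before copying:

{
  "key": "support_kb",
  "filter_attributes": [
    { "name": "source",     "level": "document", "indexed": true, "type": "text" },
    { "name": "created_at", "level": "document", "indexed": true, "type": "text" },
    { "name": "lang",       "level": "document", "indexed": true, "type": "text" },
    { "name": "section",    "level": "part",     "indexed": true, "type": "text" }
  ]
}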

See Data ingestion and metadata filters overview for the full mechanics.

2. Tell the agent what the corpus contains

Context engineering covers the general case for writing tool names, descriptions, and schemas. This section is about the retrieval-specific content those descriptions need to carry — the things the agent has to know to choose a good query and a good corpus.

For a corpora_search tool, the description should answer:

  • What's in this corpus at a level a model can use to decide. "Customer support articles for SaaS product X, written 2022–present, including troubleshooting, billing, and security topics" beats "support docs".
  • What's not in this corpus. "Does not contain real-time billing data — use the billing_lookup tool for that." Negative scoping is often more useful than positive.
  • The corpus's own vocabulary. Embeddings reward phrasing that matches the corpus. Tell the agent: "Use 'flow', 'recipe', 'workspace' rather than 'pipeline', 'workflow', 'tenant'." A glossary is a good place for this when the list is long.
  • Which filter attributes exist and when consulting corpus_filter_attribute_stats first is worth the extra tool call.

When the tool is configured with a corpus_filters map (see section 3 Option B), the agent sees the corpus keys directly — the description needs to make the trade-off between corpora obvious from those keys alone.

Worked example

Below is a complete instruction block for an agent that has both a search_support_kb tool and a corpus_filter_attribute_stats tool. The pattern: tool descriptions give the agent enough to decide whether to call; agent instructions tell it how to call them together.

TOOL DESCRIPTIONS AND AGENT INSTRUCTIONS

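A sketch of what this block could look like, using the illustrative corpus, vocabulary, and filter attributes from earlier in this section; adapt the wording to your own corpus and tools.

Tool: search_support_kb
  Searches customer support articles for SaaS product X (2022–present):
  troubleshooting, billing, and security topics. Does not contain real-time
  billing data; use the billing_lookup tool for that. Phrase queries in the
  product's own vocabulary: "flow", "recipe", "workspace" rather than
  "pipeline", "workflow", "tenant". Filter attributes: doc.lang,
  doc.created_at, part.section.

Tool: corpus_filter_attribute_stats
  Lists the distinct values of a filter attribute on the support corpus.
  Call it before searching when the question names a plan, product area, or
  time range and you are not sure which filter values actually exist.

Agent instructions
  When the question names a specific plan, product area, or time period,
  first call corpus_filter_attribute_stats to confirm the filter values,
  then call search_support_kb with a metadata filter built from them.
  Otherwise search directly, phrasing the query in the corpus vocabulary.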

3. Filter attribute selection for agents

Filters are the highest-precision lever you have, and the most under-used. There are four ways an agent ends up with a filter applied. They compose, but for any given tool configuration pick the simplest one that works.

Option A: Eager reference from session metadata

The corpora_search tool's query_configuration accepts an EagerReference for both corpus_key and metadata_filter. At the start of each turn, the platform resolves the reference from the session context. The agent sees nothing — the filter just applies.

Use this whenever the filter is determined by who is asking rather than what they're asking: tenant scoping, ACLs, locale, subscription tier. Wire session.metadata.tenant_filter once at session creation and every search the agent makes is scoped.

EAGER REFERENCE FILTER FROM SESSION METADATA

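A sketch of the tool configuration, assuming an EagerReference is written as a $ref path into the session; the wrapper fields around query_configuration are assumptions, and the corpus key is illustrative:

{
  "type": "corpora_search",
  "query_configuration": {
    "corpus_key": "support_kb",
    "metadata_filter": { "$ref": "session.metadata.tenant_filter" }
  }
}

At session creation, set session.metadata.tenant_filter to something like "doc.tenant = 'acme'"; every search the agent issues is then scoped to that tenant without the agent ever seeing the filter.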

Option B: Let the agent pick the corpus, with per-corpus filters

The corpora_search tool lets the agent choose the corpus itself. In the tool's argument_override you set a corpus_filters map: keys are the allowed corpora, values are the metadata filter to apply when the agent picks that key. The agent is told only about the keys; the right filter is attached automatically. Filter values can themselves be $ref references resolved from the session.

This is the right shape for agents that span multiple knowledge domains — public docs vs. internal runbooks vs. customer tickets — and each domain needs its own scoping.

AGENT PICKS CORPUS; FILTER IS ATTACHED AUTOMATICALLY

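A sketch of the argument_override with illustrative corpus keys. The filter strings and the $ref spelling are assumptions; match them to your own attributes and session metadata:

{
  "type": "corpora_search",
  "argument_override": {
    "corpus_filters": {
      "public_docs": "doc.is_public = true",
      "internal_runbooks": "doc.team = 'sre'",
      "customer_tickets": { "$ref": "session.metadata.tenant_filter" }
    }
  }
}

The agent only learns that public_docs, internal_runbooks, and customer_tickets exist; whichever key it picks, the matching filter is attached before the query runs.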

Option C: Let the agent look up filter values before searching

Pair the search tool with the corpus_filter_attribute_stats tool. The agent can call it to enumerate the distinct values of a filter attribute on a chosen corpus, then construct a precise filter for the follow-up search. Scope what it can inspect with allowed_corpus_keys or by reusing the same corpus_filters map from Option B.

This is the highest-precision option for filters that depend on what the user asked ("invoices from Q3" → look up the actual quarter values present, then filter). It costs an extra tool call per ambiguous query, so reserve it for attributes the agent can't reasonably guess. Make sure the agent's instructions tell it when this lookup is worth doing.
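
A minimal sketch of pairing the two tools on the same corpus, assuming the stats tool is scoped with the same allowed_corpus_keys field mentioned above (the list shape is an assumption):

[
  { "type": "corpora_search", "allowed_corpus_keys": ["customer_tickets"] },
  { "type": "corpus_filter_attribute_stats", "allowed_corpus_keys": ["customer_tickets"] }
]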

Option D: Argument override with a static filter

If the filter is the same for every call (e.g. a tool dedicated to a single product), set metadata_filter in argument_override and forget about it.
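
For example, a tool dedicated to a single product might carry a static filter like this (the attribute names and values are hypothetical; use your own declared filter attributes):

{
  "type": "corpora_search",
  "argument_override": {
    "metadata_filter": "doc.product = 'acme-crm' and doc.lang = 'eng'"
  }
}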

4. Match context_configuration to your chunking

Once parts are clean and filters are right, decide how much surrounding text to send to the agent with each result. The agent reads the result text, so this choice directly shapes what it can reason about.

The relevant fields on the search call are:

  • sentences_before / sentences_after (preferred)
  • characters_before / characters_after (fallback)
  • start_tag / end_tag to mark the matching part inside the context

A practical default for agent retrieval is 2 sentences before and 2 after with <em> markers around the match. This gives the agent enough surrounding text to disambiguate what the chunk is about without bloating the context window.

Tune from there:

  • If your parts are already long (whole sections), drop the surrounding context to 0 — duplicating it is just tokens.
  • If your parts are tight (one sentence, FAQ entries), bump sentences_before/after to 3–5 so the agent sees the surrounding question or list.
  • For code or structured content, prefer characters_before/after so you don't break on missing sentence terminators.

SENSIBLE DEFAULT CONTEXT CONFIGURATION

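The default described above, written out. The field names come straight from the search call; any wrapper object around context_configuration is omitted here:

{
  "context_configuration": {
    "sentences_before": 2,
    "sentences_after": 2,
    "start_tag": "<em>",
    "end_tag": "</em>"
  }
}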

5. Build a reranker chain

A single reranker is rarely the right answer. The strongest setups use a chain:

  1. Neural reranker with a calibrated cutoff — does the heavy lifting on relevance.
  2. MMR — reduces redundancy so the agent doesn't see five paraphrases of the same paragraph.
  3. Optional UDF — applies business signal (recency, popularity, custom dimensions).

Step 1: Neural reranker with a cutoff

Start with the multilingual neural reranker (customer_reranker / Rerank_Multilingual_v1). Its scores are normalized to roughly 0.0–1.0, which makes a cutoff meaningful — that's not true of raw hybrid scores.

Start at cutoff: 0.5 — that's the value Vectara recommends as a default for the multilingual reranker. Tune from there:

  • Below 0.3: you're letting noise through.
  • Above 0.7: you'll often return zero results on real queries.

Combine cutoff with a generous limit — let the cutoff drop junk, and let the limit cap the absolute count.

If the reranker supports instructions (instruction-following rerankers do), use the instructions field to bias toward the agent's task — e.g. "prefer policy documents over marketing material".

Step 2: MMR for the agent's reading list

Neural rerankers reward relevance, not diversity. Without MMR you routinely send the agent five near-duplicates of the highest-scoring paragraph. Apply MMR after the neural reranker with a modest diversity_bias (start at 0.3) and a limit matching what you actually want to send to generation (typically 5–15).

Step 3: UDF for time and business signal

A userfn reranker can apply arbitrary scoring on top of metadata. The two cases that come up constantly:

  • Recency: down-weight stale documents using doc.created_at.
  • Authority: boost documents with high part.upvotes, doc.is_official, or a custom dimension.

Run this last in the chain so the neural reranker isn't fighting your business rules.

NEURAL + MMR + RECENCY UDF RERANKER CHAIN

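A sketch of the chain that matches the UDF described just below. The type values, cutoff, limit, diversity_bias, and the reranker name all appear in this section; the rerankers, reranker_name, and user_function field names, the if/then/else form, the hours() helper, and the '$.document_metadata.created_at' path are assumptions to verify against the reranker references below:

{
  "type": "chain",
  "rerankers": [
    {
      "type": "customer_reranker",
      "reranker_name": "Rerank_Multilingual_v1",
      "cutoff": 0.5,
      "limit": 50
    },
    {
      "type": "mmr",
      "diversity_bias": 0.3,
      "limit": 10
    },
    {
      "type": "userfn",
      "user_function": "if (now() - iso_datetime_parse(get('$.document_metadata.created_at')) > hours(8760)) then get('$.score') * 0.5 else get('$.score')"
    }
  ]
}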

The UDF reads as: if the document is more than 8760 hours (≈1 year) old, halve its score; otherwise leave it alone. The UDF reranker reference covers the full expression language — now(), iso_datetime_parse, duration arithmetic, and the get('$.field') accessor.

See Reranking overview, Limits and cutoffs, and Chain reranker.

6. Lexical signal (lexical_interpolation)

lexical_interpolation (often called lambda) blends keyword scoring into the embedding score. 0.0 is pure semantic, 1.0 is pure lexical. The agent default is 0.025 — a light keyword sprinkle.

Tune by query type, not by gut feel:

Query style                                     Suggested λ
Conceptual questions ("how does X work")        0.0–0.025
Mixed natural-language with key terms           0.025–0.1
Identifier or codename heavy ("error E_42")     0.1–0.3
Exact-string lookup                             0.3–0.6

If you find yourself wanting λ > 0.5, the underlying problem is often that the term you care about isn't being parsed as one token by the embedder — fix it at ingest (e.g. ensure E_42 survives tokenization) before reaching for more lexical weight.
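
As a sketch, raising λ for an identifier-heavy corpus might look like this, assuming lexical_interpolation sits on the per-corpus entry of the search configuration as in the standard query API (the nesting inside the tool configuration may differ):

{
  "corpora": [
    {
      "corpus_key": "error_codes_kb",
      "lexical_interpolation": 0.2
    }
  ]
}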

7. After the search: follow-up tools for big results

A corpora_search result that fits in the agent's context is the common case, but two situations push results past the tool-output offloading threshold:

  • Tables. Vectara doesn't split tables across parts — a single table part can be many KB on its own. Hit one and the search response routinely trips offloading.
  • Generous context_configuration combined with a high per-corpus limit. Five long sections × 4 sentences of surrounding context adds up faster than you expect.

When that happens, the platform writes the full response to a session artifact and hands the agent a small reference instead. If the agent has no artifact tools configured, the platform won't offload at all — and the result blows up the prompt. Configure the artifact toolset alongside corpora_search:

  • artifact_read — read full content or a line range.
  • artifact_grep — regex search inside the artifact.
  • artifact_jq — JSON queries against structured outputs (search results are JSON, so artifact_jq is the right primary follow-up).

Configuring any one of these flips offloading on by default; configure all three so the agent can pick the right access pattern. See Tool-output offloading for thresholds and the reference shape the agent receives.

get_document_text for full-document follow-ups

Search returns parts, not whole documents. When the agent finds a hit but needs more of the surrounding document than context_configuration gave it, get_document_text fetches the full document text from the corpus and stores it as an artifact. The agent then explores it with the same artifact tools.

This is the right pattern for:

  • "Quote the entire policy section, not just the matching paragraph."
  • "Find every mention of a term across the document the hit came from."
  • Long-form documents where the relevant answer spans more than the reranked part.

What to put in agent instructions

The agent has to know to follow up. Add to the agent's instructions something like:

When 'corpora_search' returns an artifact reference instead of inline
results (because the response was large), use 'artifact_jq' to extract
the fields you need from the search results — typically:

'.search_results[] | {text, document_id, score}'

If a search hit references a document and you need more text around
the hit than the snippet contains, call 'get_document_text' on its
document_id and grep or read the resulting artifact.

8. How to iterate

A tuning loop that works in practice:

  1. Build a small eval set. 20–50 real queries with the answer document(s) labeled. Most teams skip this and regret it.
  2. Run with reranking off (type: "none") and look at the pre-rerank results. If the right document isn't in the top 50, no reranker will save you — go back to ingest, filters, or lexical_interpolation.
  3. Turn the neural reranker on and check that the right document is now in the top 5. If not, tune the reranker's instructions or your context_configuration (the reranker scores the text including its surrounding context when include_context: true).
  4. Add MMR and verify you're not just shipping duplicates.
  5. Add the UDF and check that recency/authority hasn't pushed a genuinely best answer off the list.
  6. Inspect query history — every query keeps spans with pre-rerank, post-rerank, and rewritten-query data. This is the single most useful tuning signal you have.

Putting it together: a strong default agent toolset

A corpora_search tool on its own isn't the full retrieval setup — pair it with the artifact tools (so big results can offload safely) and get_document_text (so the agent can pull a full document when a hit isn't enough).

RECOMMENDED BASELINE RETRIEVAL TOOLSET

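A sketch of the combined toolset. The tool names are the ones used throughout this guide; the list shape and the allowed_corpus_keys field are assumptions, so adapt them to your tool configuration schema:

[
  { "type": "corpora_search", "allowed_corpus_keys": ["support_kb"] },
  { "type": "corpus_filter_attribute_stats", "allowed_corpus_keys": ["support_kb"] },
  { "type": "get_document_text" },
  { "type": "artifact_read" },
  { "type": "artifact_grep" },
  { "type": "artifact_jq" }
]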

Treat this as a starting point, not a destination. Run your eval set against it, then walk back through sections 1–7 and adjust the one parameter that moves the metric.