Tune retrieval for agents
This guide walks through how to get high-quality retrieval out of an agent
that calls the corpora_search tool. It is opinionated and ordered by
impact: each section builds on the previous one. If your agent is returning
weak or irrelevant results, work through the sections in order — the earliest
fixes pay back the most.
The agent chooses the query string and, on the corpora_search tool,
also the corpus (from a list you allow). Filter attribute statistics
are reachable through the separate corpus_filter_attribute_stats tool.
Today everything else — context configuration, reranker chain, which
filter applies to which corpus, lexical weight, and the metadata filter
itself — is fixed by you when you configure the tool. (We may expose
more of these to the agent in the future.) So "tuning retrieval for an
agent" is mostly about three things, in priority order:
- Shaping the corpus during ingest.
- Telling the agent — through tool descriptions and instructions — what the corpus contains and when to use it.
- Configuring the tool so the small set of choices the agent does make are all good ones.
1. Ingest is the heart of retrieval quality
Nothing the query side can do will recover information that was lost or mangled during ingest. The file upload endpoint is a fine starting point — it handles sentence-level chunking and common file formats — but heavier use cases benefit from an agentic ingest pipeline where an agent walks each document, enriches metadata, and decides how to chunk.
For retrieval quality specifically, four things matter most:
- Parts carry meaning, not just text. A document is broken into parts; each part is what gets embedded, reranked, and shown to the agent. Sentence-level chunks are a fine default, but very short or fragmentary parts (FAQ items, captions) need extra surrounding text at query time — see section 4. For pre-structured source documents, use semantic indexing so parts follow the document's natural sections.
- Tables and images need real descriptions. Tables become their own parts when extract_tables is set; the description is what gets embedded, so tune TableGenerationSpec until the description uses the columns and entities a user would search for. Images are retrievable through their summary — generate that summary with a domain-specific prompt, and feel free to attach multiple parts to the same image (overall summary + per-region summaries) so multiple query angles hit. See Working with tables.
- Declare filter attributes early. Document and part metadata travels with results, but only fields declared as filter_attributes on the corpus can appear in a metadata_filter. Adding them later requires reindexing. Common picks:

| Attribute | Level | Why it pays off |
|---|---|---|
| doc.source | document | Scope by repo, drive, tenant. |
| doc.created_at | document | Drives time-decay UDF reranking. |
| doc.lang | document | Avoids cross-language noise. |
| part.section | part | Lets agents prefer "summary" or "intro". |
| doc.acl | document | Combine with auth — see access control. |

- Agent-generated metadata is fair game. Use an agentic pipeline to add summaries, topic tags, or "who/what/when" annotations the source didn't carry. These show up in results and as filter attributes if you declare them.
See Data ingestion and metadata filters overview for the full mechanics.
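As a sketch, a corpus created with filter attributes declared up front might look like the fragment below. Only the attribute names and levels come from the guidance above; the request shape itself is illustrative, so check the corpus-creation reference for the exact schema:

```json
{
  "key": "support-kb",
  "filter_attributes": [
    { "name": "doc.source",     "level": "document", "indexed": true },
    { "name": "doc.created_at", "level": "document", "indexed": true },
    { "name": "doc.lang",       "level": "document", "indexed": true },
    { "name": "part.section",   "level": "part",     "indexed": true }
  ]
}
```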
2. Tell the agent what the corpus contains
Context engineering covers the general case for writing tool names, descriptions, and schemas. This section is about the retrieval-specific content those descriptions need to carry — the things the agent has to know to choose a good query and a good corpus.
For a corpora_search tool, the description should answer:
- What's in this corpus at a level a model can use to decide. "Customer support articles for SaaS product X, written 2022–present, including troubleshooting, billing, and security topics" beats "support docs".
- What's not in this corpus. "Does not contain real-time billing data — use the billing_lookup tool for that." Negative scoping is often more useful than positive.
- The corpus's own vocabulary. Embeddings reward phrasing that matches the corpus. Tell the agent: "Use 'flow', 'recipe', 'workspace' rather than 'pipeline', 'workflow', 'tenant'." A glossary is a good place for this when the list is long.
- Which filter attributes exist, and when consulting corpus_filter_attribute_stats first is worth the extra tool call.
When the tool is configured with a corpus_filters map (see section 3
Option B), the agent sees the corpus keys directly — the description
needs to make the trade-off between corpora obvious from those keys
alone.
Worked example
Below is a complete instruction block for an agent that has both a
search_support_kb tool and a corpus_filter_attribute_stats tool.
The pattern: tool descriptions give the agent enough to decide
whether to call; agent instructions tell it how to call them
together.
TOOL DESCRIPTIONS AND AGENT INSTRUCTIONS
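One illustrative way to write that pairing is sketched below; the corpus details, attribute names, and the search_support_kb tool's scope are invented for the example:

```text
# Tool description: search_support_kb
Searches the customer support knowledge base for SaaS product X:
troubleshooting, billing policy, and security articles, 2022-present.
Does not contain real-time billing data; use billing_lookup for that.
Phrase queries in the corpus's vocabulary: "flow", "recipe", and
"workspace" rather than "pipeline", "workflow", "tenant".
Filter attributes: doc.product_area, doc.created_at, part.section.

# Tool description: corpus_filter_attribute_stats
Returns the distinct values of a filter attribute on an allowed
corpus. Call it before searching when the user names a product area
or time range and you are unsure which filter values actually exist.

# Agent instructions
When a question names a product area, first call
corpus_filter_attribute_stats on doc.product_area and build a
metadata_filter from a value it returns, then call search_support_kb.
Otherwise search directly, rephrasing the user's terms into the
corpus vocabulary listed in the tool description.
```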
3. Filter attribute selection for agents
Filters are the highest-precision lever you have, and the most under-used. There are four ways an agent ends up with a filter applied. They compose, but for any given tool configuration pick the simplest one that works.
Option A: Eager reference from session metadata
The corpora_search tool's query_configuration accepts an
EagerReference for both corpus_key and metadata_filter. At the
start of each turn, the platform resolves the reference from the
session context. The agent sees nothing — the filter just applies.
Use this whenever the filter is determined by who is asking rather
than what they're asking: tenant scoping, ACLs, locale,
subscription tier. Wire session.metadata.tenant_filter once at
session creation and every search the agent makes is scoped.
EAGER REFERENCE FILTER FROM SESSION METADATA
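A minimal sketch of that wiring, assuming the filter expression was stored as session.metadata.tenant_filter at session creation (the nesting here is illustrative; see the tool reference for the exact schema):

```json
{
  "type": "corpora_search",
  "query_configuration": {
    "corpus_key": "support-kb",
    "metadata_filter": { "$ref": "session.metadata.tenant_filter" }
  }
}
```

With session.metadata.tenant_filter set to, say, "doc.tenant = 'acme'", every search the agent issues is scoped to that tenant without the agent ever seeing the filter.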
Option B: Let the agent pick the corpus, with per-corpus filters
The corpora_search tool lets the agent choose the corpus itself.
In the tool's argument_override you set a corpus_filters map: keys
are the allowed corpora, values are the metadata filter to apply when
the agent picks that key. The agent is told only about the keys; the
right filter is attached automatically. Filter values can themselves
be $ref references resolved from the session.
This is the right shape for agents that span multiple knowledge domains — public docs vs. internal runbooks vs. customer tickets — and each domain needs its own scoping.
AGENT PICKS CORPUS; FILTER IS ATTACHED AUTOMATICALLY
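A sketch of that shape; the corpus keys and filter expressions are invented for the example:

```json
{
  "type": "corpora_search",
  "argument_override": {
    "corpus_filters": {
      "public-docs": "doc.is_public = true",
      "internal-runbooks": "doc.team = 'sre'",
      "customer-tickets": { "$ref": "session.metadata.tenant_filter" }
    }
  }
}
```

The agent is prompted with the three keys only; picking customer-tickets automatically applies the tenant filter resolved from the session.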
Option C: Let the agent look up filter values before searching
Pair the search tool with the corpus_filter_attribute_stats tool.
The agent can call it to enumerate the distinct values of a filter
attribute on a chosen corpus, then construct a precise filter for the
follow-up search. Scope what it can inspect with allowed_corpus_keys
or by reusing the same corpus_filters map from Option B.
This is the highest-precision option for filters that depend on what the user asked ("invoices from Q3" → look up the actual quarter values present, then filter). It costs an extra tool call per ambiguous query, so reserve it for attributes the agent can't reasonably guess. Make sure the agent's instructions tell it when this lookup is worth doing.
Option D: Argument override with a static filter
If the filter is the same for every call (e.g. a tool dedicated to a
single product), set metadata_filter in argument_override and
forget about it.
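For instance, a tool dedicated to one product's English docs might be pinned like this (the corpus key and filter expression are illustrative):

```json
{
  "type": "corpora_search",
  "argument_override": {
    "corpus_key": "product-x-docs",
    "metadata_filter": "doc.product = 'x' AND doc.lang = 'en'"
  }
}
```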
4. Match context_configuration to your chunking
Once parts are clean and filters are right, decide how much surrounding text to send to the agent with each result. The agent reads the result text, so this choice directly shapes what it can reason about.
The relevant fields on the search call are:
- sentences_before / sentences_after (preferred)
- characters_before / characters_after (fallback)
- start_tag / end_tag to mark the matching part inside the context
A practical default for agent retrieval is 2 sentences before and 2
after with <em> markers around the match. This gives the agent
enough surrounding text to disambiguate what the chunk is about
without bloating the context window.
Tune from there:
- If your parts are already long (whole sections), drop the surrounding context to 0 — duplicating it is just tokens.
- If your parts are tight (one sentence, FAQ entries), bump sentences_before/after to 3–5 so the agent sees the surrounding question or list.
- For code or structured content, prefer characters_before/after so you don't break on missing sentence terminators.
SENSIBLE DEFAULT CONTEXT CONFIGURATION
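That default, expressed as a configuration fragment (the field names are the ones listed above; where the object nests depends on the tool schema):

```json
{
  "context_configuration": {
    "sentences_before": 2,
    "sentences_after": 2,
    "start_tag": "<em>",
    "end_tag": "</em>"
  }
}
```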
5. Build a reranker chain
A single reranker is rarely the right answer. The strongest setups use a chain:
- Neural reranker with a calibrated cutoff — does the heavy lifting on relevance.
- MMR — reduces redundancy so the agent doesn't see five paraphrases of the same paragraph.
- Optional UDF — applies business signal (recency, popularity, custom dimensions).
Step 1: Neural reranker with a cutoff
Start with the multilingual neural reranker (customer_reranker /
Rerank_Multilingual_v1). Its scores are normalized to roughly
0.0–1.0, which makes a cutoff meaningful — that's not true of raw
hybrid scores.
Start at cutoff: 0.5 — that's the value Vectara recommends as a
default for the multilingual reranker. Tune from there:
- Below 0.3: you're letting noise through.
- Above 0.7: you'll often return zero results on real queries.
Combine cutoff with a generous limit — let the cutoff drop junk,
and let the limit cap the absolute count.
If the reranker supports instructions (instruction-following
rerankers do), use the instructions field to bias toward the
agent's task — e.g. "prefer policy documents over marketing
material".
Step 2: MMR for the agent's reading list
Neural rerankers reward relevance, not diversity. Without MMR you
routinely send the agent five near-duplicates of the highest-scoring
paragraph. Apply MMR after the neural reranker with a modest
diversity_bias (start at 0.3) and a limit matching what you
actually want to send to generation (typically 5–15).
Step 3: UDF for time and business signal
A userfn reranker can apply arbitrary scoring on top of metadata.
The two cases that come up constantly:
- Recency: down-weight stale documents using doc.created_at.
- Authority: boost documents with high part.upvotes, doc.is_official, or a custom dimension.
Run this last in the chain so the neural reranker isn't fighting your business rules.
NEURAL + MMR + RECENCY UDF RERANKER CHAIN
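A sketch of the three-step chain. The structure and field names follow the ones used in this guide (customer_reranker, mmr, userfn, cutoff, limit, diversity_bias); treat the user_function string in particular as pseudocode for "halve the score of documents older than 8760 hours" and check the UDF reranker reference for the exact expression syntax:

```json
{
  "reranker": {
    "type": "chain",
    "rerankers": [
      {
        "type": "customer_reranker",
        "reranker_name": "Rerank_Multilingual_v1",
        "cutoff": 0.5,
        "limit": 50
      },
      {
        "type": "mmr",
        "diversity_bias": 0.3,
        "limit": 10
      },
      {
        "type": "userfn",
        "user_function": "if ((now() - iso_datetime_parse(get('$.doc.created_at'))) > hours(8760)) get('$.score') * 0.5 else get('$.score')"
      }
    ]
  }
}
```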
The UDF reads as: if the document is more than 8760 hours (≈1 year)
old, halve its score; otherwise leave it alone. The
UDF reranker reference
covers the full expression language — now(), iso_datetime_parse,
duration arithmetic, and the get('$.field') accessor.
See Reranking overview, Limits and cutoffs, and Chain reranker.
6. Lexical signal (lexical_interpolation)
lexical_interpolation (often called lambda) blends keyword scoring
into the embedding score. 0.0 is pure semantic, 1.0 is pure
lexical. The agent default is 0.025 — a light keyword sprinkle.
Tune by query type, not by gut feel:
| Query style | Suggested λ |
|---|---|
| Conceptual questions ("how does X work") | 0.0–0.025 |
| Mixed natural-language with key terms | 0.025–0.1 |
| Identifier or codename heavy ("error E_42") | 0.1–0.3 |
| Exact-string lookup | 0.3–0.6 |
If you find yourself wanting λ > 0.5, the underlying problem is
often that the term you care about isn't being parsed as one token
by the embedder — fix it at ingest (e.g. ensure E_42 survives
tokenization) before reaching for more lexical weight.
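The knob itself is a single field in the tool's query configuration. For an identifier-heavy corpus you might pin it like this (the nesting is illustrative):

```json
{
  "query_configuration": {
    "lexical_interpolation": 0.2
  }
}
```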
7. After the search: follow-up tools for big results
A corpora_search result that fits in the agent's context is the
common case, but two situations push results past the
tool-output offloading threshold:
- Tables. Vectara doesn't split tables across parts — a single table part can be many KB on its own. Hit one and the search response routinely trips offloading.
- Generous context_configuration combined with a high per-corpus limit. Five long sections × 4 sentences of surrounding context adds up faster than you expect.
When that happens, the platform writes the full response to a
session artifact and hands the agent a small reference instead. If
the agent has no artifact tools configured, the platform won't
offload at all — and the result blows up the prompt. Configure the
artifact toolset alongside corpora_search:
- artifact_read — read full content or a line range.
- artifact_grep — regex search inside the artifact.
- artifact_jq — JSON queries against structured outputs (search results are JSON, so artifact_jq is the right primary follow-up).
Configuring any one of these flips offloading on by default; configure all three so the agent can pick the right access pattern. See Tool-output offloading for thresholds and the reference shape the agent receives.
get_document_text for full-document follow-ups
Search returns parts, not whole documents. When the agent finds a hit
but needs more of the surrounding document than context_configuration
gave it, get_document_text fetches the full document text from
the corpus and stores it as an artifact. The agent then explores it
with the same artifact tools.
This is the right pattern for:
- "Quote the entire policy section, not just the matching paragraph."
- "Find every mention of a term across the document the hit came from."
- Long-form documents where the relevant answer spans more than the reranked part.
What to put in agent instructions
The agent has to know to follow up. Add to the agent's instructions something like:
When 'corpora_search' returns an artifact reference instead of inline
results (because the response was large), use 'artifact_jq' to extract
the fields you need from the search results — typically:
'.search_results[] | {text, document_id, score}'
If a search hit references a document and you need more text around
the hit than the snippet contains, call 'get_document_text' on its
document_id and grep or read the resulting artifact.
8. How to iterate
A tuning loop that works in practice:
- Build a small eval set. 20–50 real queries with the answer document(s) labeled. Most teams skip this and regret it.
- Run with reranking off (type: "none") and look at the pre-rerank results. If the right document isn't in the top 50, no reranker will save you — go back to ingest, filters, or lexical_interpolation.
- Turn the neural reranker on and check that the right document is now in the top 5. If not, tune instructions or your context_configuration (the reranker scores text including surrounding context when include_context: true).
- Add MMR and verify you're not just shipping duplicates.
- Add the UDF and check that recency/authority hasn't pushed a genuinely best answer off the list.
- Inspect query history — every query keeps spans with pre-rerank, post-rerank, and rewritten-query data. This is the single most useful tuning signal you have.
Putting it together: a strong default agent toolset
A corpora_search tool on its own isn't the full retrieval setup —
pair it with the artifact tools (so big results can offload safely)
and get_document_text (so the agent can pull a full document when a
hit isn't enough).
RECOMMENDED BASELINE RETRIEVAL TOOLSET
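An illustrative sketch of that baseline. The tool type names are the ones used throughout this guide; the surrounding structure, the corpus key, and the session reference are assumptions to be checked against the toolset reference:

```json
{
  "tools": [
    {
      "type": "corpora_search",
      "argument_override": {
        "corpus_filters": {
          "support-kb": { "$ref": "session.metadata.tenant_filter" }
        }
      },
      "query_configuration": {
        "context_configuration": {
          "sentences_before": 2,
          "sentences_after": 2,
          "start_tag": "<em>",
          "end_tag": "</em>"
        },
        "lexical_interpolation": 0.025
      }
    },
    { "type": "corpus_filter_attribute_stats", "allowed_corpus_keys": ["support-kb"] },
    { "type": "get_document_text" },
    { "type": "artifact_read" },
    { "type": "artifact_grep" },
    { "type": "artifact_jq" }
  ]
}
```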
Treat this as a starting point, not a destination. Run your eval set against it, then walk back through sections 1–7 and adjust the one parameter that moves the metric.