Fuzzy Metadata Search
Metadata is rarely uniform across different document sources. Titles, categories, and headings can vary and change over time. When users only know part of a value, strict equality filters miss relevant items.
The tech preview of Fuzzy Metadata Search combines exact pre‑filtering with fuzzy, weighted matching across specific metadata fields. First, narrow the candidate set precisely, such as by status, region, or date. Then, rank what remains using field‑aware fuzzy matching so users find what they mean, and not just what they type.
- Supports document-level and part-level metadata searches.
- Returns relevance‑scored results with pagination (limit, offset, total_count).
- Lets you weight fields (title^2.0, category^1.0) to tune ranking.
- Works alongside existing metadata filters for access control and faceted narrowing.
Use document-level metadata when you want unique documents. Use part-level metadata when you need to surface matching sections within documents.
Fuzzy Metadata Search is a tech preview, so future releases may introduce breaking changes.
How fuzzy search works
- Applies fuzzy matching automatically to all field queries
- Handles common typos, character transpositions, and missing characters
- Field weights influence the final relevance score
- Applies exact metadata_filter to narrow results
- Performs fuzzy matching on remaining documents
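The two stages listed above (exact narrowing, then weighted fuzzy scoring) can be illustrated locally. The following is a minimal sketch and not Vectara's implementation: it uses Python's difflib.SequenceMatcher as a stand-in for the fuzzy matcher, and the sample records, the search function, and the weighted-average scoring formula are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; field names are illustrative only.
DOCS = [
    {"id": "d1", "status": "Active", "title": "Liability Agreement", "category": "contract"},
    {"id": "d2", "status": "Archived", "title": "Liability Agreement", "category": "contract"},
    {"id": "d3", "status": "Active", "title": "Travel Policy", "category": "policy"},
]

def similarity(a: str, b: str) -> float:
    """Stand-in fuzzy score in [0, 1] that tolerates small typos."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def search(docs, exact, queries):
    """exact: {field: required value}; queries: [(field, text, weight)]."""
    # Stage 1: the exact pre-filter narrows the candidate set.
    candidates = [d for d in docs if all(d.get(k) == v for k, v in exact.items())]
    # Stage 2: weighted fuzzy score over the remaining candidates.
    total_weight = sum(w for _, _, w in queries)
    scored = []
    for d in candidates:
        score = sum(w * similarity(text, d.get(field, "")) for field, text, w in queries)
        scored.append((round(score / total_weight, 3), d["id"]))
    return sorted(scored, reverse=True)

# The misspelled title still ranks d1 first; d2 is excluded by the exact filter.
results = search(
    DOCS,
    {"status": "Active"},
    [("title", "Liabilty Agrement", 2.0), ("category", "contract", 1.0)],
)
print(results)
```

Dividing by the total weight keeps scores in the 0 to 1 range regardless of how many fields are queried; that normalization is a design choice of this sketch, not a documented behavior of the service.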
Field weighting strategy
Adjust field weights to control search relevance:
- Higher weights (2.0-3.0): Critical fields like title or primary identifier
- Medium weights (1.0-1.5): Important supporting fields
- Lower weights (0.5-1.0): Additional context fields
Example weighting strategy
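One way to keep a weighting strategy reviewable is to hold it in a single mapping and render it in the field^weight notation used on this page. The sketch below assumes three illustrative fields (summary in particular is a hypothetical field name) and a hypothetical helper, to_weighted_fields; the specific weights are examples picked from the tiers above, not recommended values.

```python
# Hypothetical field names; weights are picked from the tiers described above.
FIELD_WEIGHTS = {
    "title": 2.5,     # critical field (2.0-3.0)
    "category": 1.2,  # important supporting field (1.0-1.5)
    "summary": 0.5,   # additional context field (0.5-1.0)
}

def to_weighted_fields(weights):
    """Render weights in field^weight notation, highest weight first."""
    ordered = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{field}^{weight:.1f}" for field, weight in ordered]

weighted = to_weighted_fields(FIELD_WEIGHTS)
print(weighted)
```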
Example request (document level)
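A document-level request might be assembled as in the sketch below. The names metadata_filter, limit, and offset come from this page; the keys level, queries, field, query, and weight, and the build_request helper itself, are assumptions for illustration and should be checked against the API reference before use.

```python
import json

def build_request(level, queries, metadata_filter=None, limit=10, offset=0):
    """Assemble a request body.

    Key names other than metadata_filter, limit, and offset are assumptions.
    queries is a list of (field, text, weight) tuples.
    """
    if level not in ("document", "part"):
        raise ValueError("level must be 'document' or 'part'")
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    body = {
        "level": level,
        "queries": [{"field": f, "query": q, "weight": w} for f, q, w in queries],
        "limit": limit,
        "offset": offset,
    }
    if metadata_filter:
        body["metadata_filter"] = metadata_filter
    return body

# The query text contains typos on purpose; fuzzy matching is meant to absorb them.
request_body = build_request("document", [("title", "liabilty agrement", 2.0)])
print(json.dumps(request_body, indent=2))
```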
Example response
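Because results are paginated with limit, offset, and total_count, a client can derive the offset of each following page from those three values. The sketch below is illustrative only: the response dictionary is hypothetical (including the documents, id, and score keys, and the assumption that limit and offset are echoed back), and next_offset is a helper introduced here.

```python
def next_offset(limit, offset, total_count):
    """Return the offset of the next page, or None after the last page."""
    nxt = offset + limit
    return nxt if nxt < total_count else None

# Hypothetical response; only limit, offset, and total_count are named on this page.
response = {
    "documents": [
        {"id": "d1", "score": 0.93},
        {"id": "d3", "score": 0.41},
    ],
    "limit": 2,
    "offset": 0,
    "total_count": 5,
}

# Walk the offsets a client would request to read all five results.
offsets = []
offset = response["offset"]
while offset is not None:
    offsets.append(offset)
    offset = next_offset(response["limit"], offset, response["total_count"])
print(offsets)
```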
Filter syntax
metadata_filter uses Vectara’s metadata filter expression syntax. Prefix every field with its scope: doc. (document-level) or part. (part-level).
Supported operators
- Arithmetic: + - * / %
- Comparisons: < <= > >= = == != <>
- Null tests: IS NULL, IS NOT NULL
- Membership: IN (...)
- Logical: NOT, AND, OR
Examples
doc.status = 'Active'
doc.pageCount > 10
doc.publish_date >= '2025-08-01'
doc.category IN ('contract', 'policy')
doc.status = 'Active' AND part.clause_type = 'Liability'
The filter language does not support SQL LIKE. Use fuzzy queries to handle approximate text.
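Filter strings like the examples above can be built programmatically so that the scope prefix and operator are checked before a request is sent. This is a minimal sketch: the helpers literal, compare, is_in, and all_of are hypothetical, and because this page does not describe how to escape a single quote inside a string value, the sketch rejects such values rather than guessing.

```python
COMPARISONS = {"<", "<=", ">", ">=", "=", "==", "!=", "<>"}

def literal(value):
    """Render a Python value as a filter literal; strings get single quotes."""
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("quote escaping is not covered by this sketch")
        return f"'{value}'"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    raise TypeError(f"unsupported literal type: {type(value).__name__}")

def check_scope(field):
    """Every field must carry a doc. or part. scope prefix."""
    if not field.startswith(("doc.", "part.")):
        raise ValueError(f"field {field!r} must start with 'doc.' or 'part.'")
    return field

def compare(field, op, value):
    if op not in COMPARISONS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    return f"{check_scope(field)} {op} {literal(value)}"

def is_in(field, values):
    return f"{check_scope(field)} IN ({', '.join(literal(v) for v in values)})"

def all_of(*clauses):
    """Join simple clauses with AND (no nesting or parentheses in this sketch)."""
    return " AND ".join(clauses)

combined = all_of(
    compare("doc.status", "=", "Active"),
    compare("part.clause_type", "=", "Liability"),
)
print(combined)
print(is_in("doc.category", ["contract", "policy"]))
```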
Weighted multi‑field search
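For a search across several weighted fields, the field^weight notation shown earlier (title^2.0, category^1.0) can be parsed into per-field query entries. The sketch below is hedged accordingly: parse_weighted_field is a hypothetical helper, defaulting a bare field name to weight 1.0 is an assumption, and the field, query, and weight keys in the resulting entries are not confirmed API names.

```python
def parse_weighted_field(spec, default_weight=1.0):
    """Split 'title^2.0' into ('title', 2.0).

    A bare field name gets default_weight, which is an assumption of this sketch.
    """
    field, sep, raw_weight = spec.partition("^")
    if not field:
        raise ValueError(f"missing field name in {spec!r}")
    weight = float(raw_weight) if sep else default_weight
    if weight <= 0:
        raise ValueError(f"weight must be positive in {spec!r}")
    return field, weight

specs = ["title^2.0", "category^1.0", "summary"]
parsed = [parse_weighted_field(s) for s in specs]

# Hypothetical query entries: the same user text is matched against every field.
queries = [{"field": f, "query": "liabilty agrement", "weight": w} for f, w in parsed]
print(queries)
```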
Exact filtering plus fuzzy ranking
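Combining the two mechanisms means sending exact conditions in metadata_filter and approximate text in the fuzzy query. The sketch below reuses filter expressions from the examples on this page; the body keys level and queries, and the keys inside each query entry, are assumptions for illustration.

```python
import json

# Exact conditions, written in the filter syntax shown on this page.
exact_conditions = [
    "doc.status = 'Active'",
    "doc.publish_date >= '2025-08-01'",
    "doc.category IN ('contract', 'policy')",
]

# Hypothetical request body: "level", "queries", and the keys inside each query
# are assumptions; metadata_filter, limit, and offset are named on this page.
body = {
    "level": "document",
    "queries": [{"field": "title", "query": "liabilty agrement", "weight": 2.0}],
    "metadata_filter": " AND ".join(exact_conditions),
    "limit": 10,
    "offset": 0,
}
print(json.dumps(body, indent=2))
```

Keeping the exact conditions in a list makes it easy to add or drop a facet without editing a long filter string by hand.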
Part‑level search
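A part-level search surfaces matching sections rather than unique documents, so a client may want to regroup the hits under their parent documents. The sketch below is illustrative only: the request keys level and queries, the hit keys document_id, part_id, and score, and the group_by_document helper are all assumptions rather than confirmed API names.

```python
from collections import defaultdict

# Hypothetical part-level request; the filter uses the part. scope prefix.
part_request = {
    "level": "part",
    "queries": [{"field": "heading", "query": "limitaton of liability", "weight": 2.0}],
    "metadata_filter": "part.clause_type = 'Liability'",
    "limit": 10,
    "offset": 0,
}

# Hypothetical part-level hits, as a client might receive them.
part_hits = [
    {"document_id": "d1", "part_id": "p4", "score": 0.91},
    {"document_id": "d2", "part_id": "p1", "score": 0.78},
    {"document_id": "d1", "part_id": "p9", "score": 0.66},
]

def group_by_document(hits):
    """Group matching parts under their document, best-scoring document first."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["document_id"]].append(hit)
    for parts in grouped.values():
        parts.sort(key=lambda h: h["score"], reverse=True)
    # Order documents by the score of their best part.
    return sorted(grouped.items(), key=lambda kv: kv[1][0]["score"], reverse=True)

by_document = group_by_document(part_hits)
for document_id, parts in by_document:
    print(document_id, [p["part_id"] for p in parts])
```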