Fuzzy Metadata Search
Metadata is rarely uniform across different document sources. Titles, categories, and headings can vary and change over time. When users only know part of a value, strict equality filters miss relevant items.
The tech preview of Fuzzy Metadata Search combines exact pre‑filtering with fuzzy, weighted matching across specific metadata fields. First, narrow the candidate set precisely, such as by status, region, or date. Then, rank what remains using field‑aware fuzzy matching so users find what they mean, and not just what they type.
- Supports document-level and part-level metadata searches.
- Returns relevance‑scored results with pagination (limit, offset, total_count).
- Lets you weight fields (title^2.0, category^1.0) to tune ranking.
- Works alongside existing metadata filters for access control and faceted narrowing.
Use document-level metadata when you want each document to appear once in the results. Use part-level
metadata when you need to surface the matching sections within documents.
Fuzzy Metadata Search is a tech preview, so future releases may introduce breaking changes.
How fuzzy search works
- Applies fuzzy matching automatically to all field queries
- Handles common typos, character transpositions, and missing characters
- Field weights influence the final relevance score
- Applies exact metadata_filter to narrow results
- Performs fuzzy matching on remaining documents
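The filter-then-rank behavior described above can be sketched locally. This is a minimal illustration under stated assumptions, not the service's implementation: it uses Python's difflib.SequenceMatcher as a stand-in for the fuzzy matcher, and the function names (similarity, fuzzy_search) and the sample documents are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(query, value):
    """Case-insensitive similarity in [0, 1]; a stand-in for the real fuzzy matcher."""
    return SequenceMatcher(None, query.lower(), value.lower()).ratio()


def fuzzy_search(docs, field_queries, weights, exact_filter=None):
    """Apply an exact pre-filter, then rank the survivors by weighted fuzzy score."""
    # Stage 1: the exact filter narrows the candidate set.
    candidates = [d for d in docs if exact_filter is None or exact_filter(d)]

    # Stage 2: each field contributes its similarity multiplied by its weight.
    scored = []
    for doc in candidates:
        score = sum(
            weights.get(field, 1.0) * similarity(query, str(doc.get(field, "")))
            for field, query in field_queries.items()
        )
        scored.append((score, doc))

    # Highest score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical sample data.
docs = [
    {"title": "Liability Agreement", "category": "contract", "status": "Active"},
    {"title": "Privacy Policy", "category": "policy", "status": "Active"},
    {"title": "Liability Agreement (old)", "category": "contract", "status": "Archived"},
]

results = fuzzy_search(
    docs,
    field_queries={"title": "Liabilty Agrement"},  # two missing characters
    weights={"title": 2.0},
    exact_filter=lambda d: d["status"] == "Active",
)

for score, doc in results:
    print(f"{score:.2f}  {doc['title']}")
```

The archived document never reaches the scoring stage, and the misspelled query still ranks the intended title first.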
Field weighting strategy
Adjust field weights to control search relevance:
- Higher weights (2.0-3.0): Critical fields like title or primary identifier
- Medium weights (1.0-1.5): Important supporting fields
- Lower weights (0.5-1.0): Additional context fields
Example weighting strategy
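One way to record a weighting strategy is a plain mapping from field name to weight. The sketch below assumes hypothetical field names (title, category, summary) and a hypothetical helper, to_boost_notation, that renders the mapping in the field^weight notation used earlier on this page; only the weight ranges come from the guidance above.

```python
# Hypothetical fields; the weights follow the ranges suggested above.
FIELD_WEIGHTS = {
    "title": 2.5,     # critical field (2.0-3.0)
    "category": 1.2,  # supporting field (1.0-1.5)
    "summary": 0.5,   # additional context (0.5-1.0)
}


def to_boost_notation(weights):
    """Render a weight mapping as 'field^weight' pairs joined by commas."""
    return ",".join(f"{field}^{weight}" for field, weight in weights.items())


print(to_boost_notation(FIELD_WEIGHTS))
```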
Example request (document level)
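A request body for a document-level search might be assembled as follows. This is a sketch, not the confirmed schema: the key names level, queries, field, query, and weight are assumptions made for illustration, and build_metadata_query is a hypothetical helper. Only metadata_filter, limit, and offset are named on this page, so check the API reference for the actual request shape.

```python
import json


def build_metadata_query(field_queries, level="document",
                         metadata_filter=None, limit=10, offset=0):
    """Assemble a request body from (field, query, weight) tuples.

    Key names other than metadata_filter, limit, and offset are assumed.
    """
    if level not in ("document", "part"):
        raise ValueError("level must be 'document' or 'part'")

    payload = {
        "level": level,
        "queries": [
            {"field": field, "query": query, "weight": weight}
            for field, query, weight in field_queries
        ],
        "limit": limit,
        "offset": offset,
    }
    # The exact filter is optional; omit the key when no filter is given.
    if metadata_filter:
        payload["metadata_filter"] = metadata_filter
    return payload


request_body = build_metadata_query(
    [("title", "anual report", 2.0), ("category", "finance", 1.0)],
    metadata_filter="doc.status = 'Active'",
)
print(json.dumps(request_body, indent=2))
```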
Example response
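Because results are paginated with limit, offset, and total_count, a client typically advances offset until it has seen total_count items. The sketch below assumes a hypothetical response shape (a results list next to total_count, with id and score on each item) and replaces the network call with an in-memory stand-in, fake_fetch_page.

```python
def iter_all_results(fetch_page, limit=2):
    """Yield every result by advancing offset until total_count is reached."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results = page["results"]  # assumed key name
        yield from results
        offset += len(results)
        # Stop on an empty page or once every item has been seen.
        if not results or offset >= page["total_count"]:
            return


# In-memory stand-in for the service: five scored items.
_ALL_RESULTS = [{"id": f"doc-{i}", "score": round(1.0 - i * 0.1, 2)} for i in range(5)]


def fake_fetch_page(limit, offset):
    """Return one page in the assumed response shape."""
    return {
        "results": _ALL_RESULTS[offset:offset + limit],
        "total_count": len(_ALL_RESULTS),
    }


ids = [item["id"] for item in iter_all_results(fake_fetch_page)]
print(ids)
```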
Filter syntax
metadata_filter uses Vectara’s metadata filter expression syntax. Prefix every field with its scope: doc. (document-level) or part. (part-level).
Supported operators
- Arithmetic: + - * / %
- Comparisons: < <= > >= = == != <>
- Null tests: IS NULL, IS NOT NULL
- Membership: IN (...)
- Logical: NOT, AND, OR
Examples
- doc.status = 'Active'
- doc.pageCount > 10
- doc.publish_date >= '2025-08-01'
- doc.category IN ('contract', 'policy')
- doc.status = 'Active' AND part.clause_type = 'Liability'
The filter language does not support SQL LIKE. Use fuzzy queries to handle approximate text.
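Filter expressions are plain strings, so small helpers can keep scoping and quoting consistent. The helpers below (literal, scoped, compare, is_in, all_of) are hypothetical and cover only the forms shown in the examples above. This page does not describe how to escape a quote inside a string value, so the sketch rejects such values rather than guess.

```python
COMPARISONS = {"<", "<=", ">", ">=", "=", "==", "!=", "<>"}


def literal(value):
    """Render a Python value as a filter literal (strings quoted, numbers bare)."""
    if isinstance(value, bool):
        raise TypeError("boolean literals are not covered by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("quote escaping is not covered by this sketch")
        return f"'{value}'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def scoped(field):
    """Require the doc. or part. scope prefix on every field."""
    if not field.startswith(("doc.", "part.")):
        raise ValueError("field must start with 'doc.' or 'part.'")
    return field


def compare(field, op, value):
    """Build a single comparison such as doc.pageCount > 10."""
    if op not in COMPARISONS:
        raise ValueError(f"unsupported comparison operator: {op}")
    return f"{scoped(field)} {op} {literal(value)}"


def is_in(field, values):
    """Build a membership test such as doc.category IN ('a', 'b')."""
    items = ", ".join(literal(v) for v in values)
    return f"{scoped(field)} IN ({items})"


def all_of(*clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


expression = all_of(
    compare("doc.status", "=", "Active"),
    compare("part.clause_type", "=", "Liability"),
)
print(expression)
```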
Weighted multi‑field search
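A weighted multi-field search sends the same text to several fields, each with its own weight. The sketch below parses the field^weight notation shown earlier into per-field query entries. The helpers parse_boosts and weighted_queries are hypothetical, the default weight of 1.0 for a field without a ^ suffix is an assumption, and the payload key names other than limit and offset are illustrative.

```python
def parse_boosts(spec):
    """Parse 'title^2.0,category^1.0' into {'title': 2.0, 'category': 1.0}."""
    weights = {}
    for item in spec.split(","):
        field, _, weight = item.strip().partition("^")
        # Assumption: a field written without '^weight' gets weight 1.0.
        weights[field] = float(weight) if weight else 1.0
    return weights


def weighted_queries(text, spec):
    """Build one query entry per field, all searching for the same text."""
    return [
        {"field": field, "query": text, "weight": weight}
        for field, weight in parse_boosts(spec).items()
    ]


payload = {
    "level": "document",
    "queries": weighted_queries("liabilty", "title^2.0,category^1.0"),
    "limit": 10,
    "offset": 0,
}
print(payload["queries"])
```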
Exact filtering plus fuzzy ranking
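Combining the two stages in one request means the filter values must match exactly while the query text may be approximate. The payload below is a sketch: metadata_filter, limit, and offset are named on this page, while level, queries, field, query, and weight are assumed key names, and the field values are hypothetical.

```python
import json

payload = {
    "level": "document",
    # Exact stage: only active documents published on or after the given date.
    "metadata_filter": "doc.status = 'Active' AND doc.publish_date >= '2025-08-01'",
    # Fuzzy stage: the misspelled query is ranked against the title field.
    "queries": [
        {"field": "title", "query": "servce agreemnt", "weight": 2.0},
    ],
    "limit": 10,
    "offset": 0,
}

print(json.dumps(payload, indent=2))
```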
Part‑level search
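A part-level search surfaces matching sections rather than whole documents, and its filter can mix doc. and part. scopes. The payload below is a sketch under the same assumptions as the other request examples: the level value "part", the queries key names, and the section_title field are all illustrative, so confirm them against the API reference.

```python
import json

payload = {
    "level": "part",  # assumed value for part-level search
    # The filter combines a document-level and a part-level condition.
    "metadata_filter": "doc.status = 'Active' AND part.clause_type = 'Liability'",
    "queries": [
        # Hypothetical part-level field; the query contains a typo on purpose.
        {"field": "section_title", "query": "indemnifcation", "weight": 2.0},
    ],
    "limit": 5,
    "offset": 0,
}

print(json.dumps(payload, indent=2))
```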