Metadata Query API Definition
When metadata is inconsistent across sources, relying only on exact filters forces
users to guess the right value. Take "MSA" as an example: is it "Master
Services Agreement" or "Master Service Agreement"? And what if a user misspells
a search term as "employement" or "agrrement"?
The Metadata Query API searches document metadata fields using fuzzy matching, so users can find documents even when search terms contain typos, variations, or approximate matches. It is suited to structured metadata such as titles, categories, authors, or custom fields, where exact matching may be too restrictive.
This API supports both document-level and part-level metadata. Use it to power legal ops, support, and compliance workflows where titles, categories, or headings often vary and users expect relevant results despite typos or naming drift.
Use cases
- Contract discovery: Find agreements despite variations in naming
- Author search: Match documents by author with name variations
- Category navigation: Discover documents in related categories
- Multi-field search: Combine multiple metadata fields for comprehensive matching
- Data quality: Find documents with inconsistent metadata
Metadata Query request and response details
To perform a fuzzy metadata search, send a POST request to /v2/corpora/:corpus_key/metadata_query, where corpus_key specifies the unique identifier for the corpus. A fuzzy search request takes the following parameters:
- level: Specify document to return unique documents or part for multiple parts from the same document. Default: document
- queries: One or more field-specific fuzzy queries.
  - field: Metadata field without the "doc." or "part." prefix; must be a TEXT filter attribute.
  - query: Text to match approximately.
  - weight: Increases the influence of a field on ranking. Default: 1.0
- metadata_filter: Exact filter applied before fuzzy matching. This uses Vectara filter syntax with "doc." / "part." scoping.
- limit: The maximum number of results. Default: 10. Range: 1-100
- offset: The start value of the pagination. Default: 0
The response contains a documents object with results ordered by relevance score with full metadata.
- doc_id: Document ID.
- score: Relevance score.
- metadata: Returned metadata for the match.
- total_count: Total matches across all pages.
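Because limit is capped at 100 and total_count reports matches across all pages, retrieving every match means issuing several requests with increasing offset values. The sketch below only computes those offsets; the helper name page_offsets is hypothetical and not part of the API.

```python
def page_offsets(total_count: int, limit: int = 10) -> list[int]:
    """Return the offset to send for each page needed to cover total_count results."""
    # The documented range for limit is 1-100.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if total_count < 0:
        raise ValueError("total_count must not be negative")
    # One offset per page: 0, limit, 2 * limit, ...
    return list(range(0, total_count, limit))


offsets = page_offsets(total_count=25, limit=10)
print(offsets)
```

With 25 matches and a limit of 10, three requests are needed, the last of which returns a partial page.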
Example request: Basic fuzzy search
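As an illustration, a basic request could be assembled as in the following sketch. The base URL, the x-api-key header name, the corpus key, and the title field are assumptions made for this example; only the path and the body parameters come from the description above. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions for illustration: base URL, header name, corpus key, API key.
BASE_URL = "https://api.vectara.io"
CORPUS_KEY = "my-corpus"    # hypothetical corpus key
API_KEY = "YOUR_API_KEY"    # placeholder


def build_metadata_query(corpus_key: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the metadata query endpoint."""
    url = f"{BASE_URL}/v2/corpora/{corpus_key}/metadata_query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


basic_body = {
    "level": "document",
    "queries": [
        # "title" is a hypothetical TEXT filter attribute; note the typos in the query.
        {"field": "title", "query": "Master Servce Agrrement"},
    ],
    "limit": 10,
    "offset": 0,
}

request = build_metadata_query(CORPUS_KEY, basic_body, API_KEY)
# To send it: urllib.request.urlopen(request)
```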
Example request: Fuzzy search with pre-filtering
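A request body that combines an exact pre-filter with fuzzy queries might look like the sketch below. The field names (title, category) and the filter attribute (doc.status) are hypothetical. The filter uses "doc." scoping, while the field values in queries omit the prefix.

```python
import json

filtered_body = {
    "level": "document",
    "queries": [
        # Fuzzy queries name the field without the "doc." / "part." prefix.
        {"field": "title", "query": "employement agrrement", "weight": 2.0},
        {"field": "category", "query": "contracts", "weight": 1.0},
    ],
    # Exact filter, applied before fuzzy matching; uses "doc." scoping.
    "metadata_filter": "doc.status = 'active'",
    "limit": 20,
    "offset": 0,
}

# Serialize to the JSON that would be sent as the POST body.
payload = json.dumps(filtered_body, indent=2)
print(payload)
```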
Example Response
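The following sketch shows one way a client might read a response. The payload is invented for illustration: the values, the exact nesting of documents and total_count, and the shape of metadata are assumptions, not output captured from the API.

```python
import json

# Hypothetical response payload (illustrative values only).
sample_response = json.loads("""
{
  "documents": [
    {"doc_id": "doc-001", "score": 0.92,
     "metadata": {"title": "Master Services Agreement"}},
    {"doc_id": "doc-002", "score": 0.81,
     "metadata": {"title": "Master Service Agreement"}}
  ],
  "total_count": 2
}
""")


def summarize(response: dict) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs in the order the response lists them."""
    return [(doc["doc_id"], doc["score"]) for doc in response.get("documents", [])]


results = summarize(sample_response)
for doc_id, score in results:
    print(doc_id, score)
```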
How Fuzzy Matching Works
- Automatic Application: Fuzzy matching is applied automatically to all field queries
- Typo Tolerance: Handles common typos, character transpositions, and missing characters
- Weighted Scoring: Field weights influence the final relevance score
- Two-Stage Process:
  - First: Apply exact metadata_filter to narrow results
  - Second: Perform fuzzy matching on remaining documents
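The two-stage process can be sketched locally as follows. This is a toy model and not the service's implementation: Python's difflib.SequenceMatcher stands in for the actual fuzzy matcher, the weighted average is an assumed way of combining field scores, and all documents and field names are hypothetical.

```python
from difflib import SequenceMatcher

# Toy in-memory documents; all field names and values are hypothetical.
DOCS = [
    {"doc_id": "a", "status": "active", "title": "Master Services Agreement"},
    {"doc_id": "b", "status": "active", "title": "Employment Agreement"},
    {"doc_id": "c", "status": "archived", "title": "Master Service Agreement"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for the real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(docs, queries, exact_filter=None):
    # Stage 1: the exact filter narrows the candidate set.
    candidates = [d for d in docs if exact_filter is None or exact_filter(d)]
    # Stage 2: fuzzy-match each queried field and combine the weighted scores.
    total_weight = sum(q.get("weight", 1.0) for q in queries)
    scored = []
    for doc in candidates:
        weighted = sum(
            q.get("weight", 1.0) * similarity(q["query"], doc.get(q["field"], ""))
            for q in queries
        )
        scored.append((doc["doc_id"], weighted / total_weight))
    # Highest score first, mirroring results ordered by relevance.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


hits = fuzzy_search(
    DOCS,
    queries=[{"field": "title", "query": "employement agrrement", "weight": 2.0}],
    exact_filter=lambda d: d["status"] == "active",
)
print(hits)
```

The archived document never reaches the fuzzy stage, and the misspelled query still ranks the employment agreement first among the remaining candidates.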
Field weighting strategy
Adjust field weights to control search relevance:
- Higher weights (2.0-3.0): Critical fields like title or primary identifier
- Medium weights (1.0-1.5): Important supporting fields
- Lower weights (0.5-1.0): Additional context fields
Example Weighting Strategy
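One possible weighting, following the tiers listed above, is sketched below. The field names and query strings are hypothetical, and the share calculation is only a local sanity check of relative emphasis; how the service combines weights into a final score is not specified here.

```python
# Hypothetical fields, one per weighting tier.
weighted_queries = [
    {"field": "title", "query": "master services agreement", "weight": 3.0},  # critical field
    {"field": "category", "query": "legal contracts", "weight": 1.5},         # supporting field
    {"field": "author", "query": "jon smith", "weight": 0.5},                 # context field
]

body = {"level": "document", "queries": weighted_queries, "limit": 10}

# Share of the total weight assigned to each field.
total = sum(q["weight"] for q in weighted_queries)
shares = {q["field"]: q["weight"] / total for q in weighted_queries}
print(shares)
```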
Error Responses
The API returns standard HTTP error codes with detailed error information:
HTTP Code | Error Code | Description |
---|---|---|
400 | invalid_request | Malformed query or invalid field names |
400 | invalid_metadata_filter | Filter expression syntax error |
401 | unauthorized | Invalid or missing API key |
403 | forbidden | Insufficient permissions for corpus |
404 | corpus_not_found | Specified corpus does not exist |
429 | rate_limit_exceeded | Request rate limit exceeded |
Limitations
- Currently limited to metadata fields (not full-text content)
- Maximum 100 results per request
- Fuzzy matching parameters are not user-configurable
- Internal API status may limit availability