Version: 2.0

Metadata Query API Definition

When metadata is inconsistent across sources, relying only on exact filters forces users to guess the right value. Take "MSA": is it "Master Services Agreement" or "Master Service Agreement"? And what if a user misspells a term in a search, typing "employement" or "agrrement"?

The Metadata Query API searches document metadata fields using fuzzy matching, so users can find documents even when search terms contain typos, variations, or approximate matches. It is suited to structured metadata such as titles, categories, authors, or custom fields, where exact matching may be too restrictive.

This API supports both document-level and part-level metadata. Use it to power legal ops, support, and compliance workflows where titles, categories, or headings often vary and users expect relevant results despite typos or naming drift.

Use cases

  • Contract discovery: Find agreements despite variations in naming
  • Author search: Match documents by author with name variations
  • Category navigation: Discover documents in related categories
  • Multi-field search: Combine multiple metadata fields for comprehensive matching
  • Data quality: Find documents with inconsistent metadata

Metadata Query request and response details

To perform a fuzzy metadata search, send a POST request to /v2/corpora/:corpus_key/metadata_query, where corpus_key is the unique identifier of the corpus. The request accepts the following parameters:

  • level: Set to document to return unique documents, or to part to allow multiple parts from the same document.
    Default: document
  • queries: One or more field-specific fuzzy queries.
    • field: Metadata field name without the doc. or part. prefix; must be a TEXT filter attribute.
    • query: Text to match approximately.
    • weight: Increases the influence of a field on ranking.
      Default: 1.0
  • metadata_filter: Exact filter applied before fuzzy matching. This uses Vectara filter syntax with doc. / part. scoping.
  • limit: The maximum number of results.
    Default: 10
    Range: 1-100
  • offset: The starting position for pagination.
    Default: 0

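A basic fuzzy search request could be sketched as follows. This is a minimal illustration, not the reference example: the host in BASE_URL, the x-api-key header name, the corpus key my-corpus, and the title field are all assumptions made for the sketch. Only the endpoint path and the body parameters come from the parameter list above. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: the API host is not stated in this section.
BASE_URL = "https://api.vectara.io"


def build_request(corpus_key: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the metadata_query endpoint."""
    url = f"{BASE_URL}/v2/corpora/{corpus_key}/metadata_query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumption: header name is not given here
        },
        method="POST",
    )


# Misspelled query text against a hypothetical "title" metadata field.
body = {
    "level": "document",
    "queries": [
        {"field": "title", "query": "employement agrrement", "weight": 1.0},
    ],
    "limit": 10,
    "offset": 0,
}

request = build_request("my-corpus", "YOUR_API_KEY", body)
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the sketch runs without credentials.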

Example request: Fuzzy search with pre-filtering

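One way to assemble a request body that combines an exact pre-filter with a fuzzy query is sketched below. The helper make_body, the doc.status attribute, and the title field are hypothetical; the checks simply mirror the documented constraints (level values, limit range, and field names without a doc. or part. prefix).

```python
def make_body(queries, metadata_filter=None, level="document", limit=10, offset=0):
    """Assemble a metadata_query body, checking the documented constraints."""
    if level not in ("document", "part"):
        raise ValueError("level must be 'document' or 'part'")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must not be negative")
    for q in queries:
        # Query fields are given without scoping; only the filter uses doc./part.
        if q["field"].startswith(("doc.", "part.")):
            raise ValueError("field must not include the doc. or part. prefix")

    body = {"level": level, "queries": queries, "limit": limit, "offset": offset}
    if metadata_filter:
        body["metadata_filter"] = metadata_filter
    return body


# Hypothetical attributes: narrow to active documents, then fuzzy-match titles.
filtered_body = make_body(
    queries=[{"field": "title", "query": "Master Service Agrement", "weight": 1.0}],
    metadata_filter="doc.status = 'active'",
)
```

Note the asymmetry: the filter expression uses the doc. prefix, while the field inside queries does not.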

Example Response

The response contains a documents object with results ordered by relevance score with full metadata.

  • doc_id: Document ID.
  • score: Relevance score.
  • metadata: Returned metadata array for the match.
  • total_count: Total matches across all pages.
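Reading a response and paging through results might look like the sketch below. The payload is hypothetical: its nesting (a documents list plus a top-level total_count), the shape of metadata, and the score values are assumptions based only on the field names listed above, so check them against a real response.

```python
# Hypothetical payload; structure and values are placeholders, not real output.
sample_response = {
    "documents": [
        {"doc_id": "doc-1", "score": 0.92, "metadata": {"title": "Master Services Agreement"}},
        {"doc_id": "doc-2", "score": 0.81, "metadata": {"title": "Master Service Agreement"}},
    ],
    "total_count": 23,
}


def summarize(response):
    """Return (doc_id, score) pairs in the order the API returned them."""
    return [(d["doc_id"], d["score"]) for d in response["documents"]]


def next_offset(offset, limit, total_count):
    """Return the offset of the next page, or None when no results remain."""
    candidate = offset + limit
    return candidate if candidate < total_count else None


pairs = summarize(sample_response)
following = next_offset(0, 10, sample_response["total_count"])
```

With total_count of 23 and a limit of 10, the pages start at offsets 0, 10, and 20, after which next_offset returns None.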

How Fuzzy Matching Works

  1. Automatic Application: Fuzzy matching is applied automatically to all field queries
  2. Typo Tolerance: Handles common typos, character transpositions, and missing characters
  3. Weighted Scoring: Field weights influence the final relevance score
  4. Two-Stage Process:
    • First: Apply exact metadata_filter to narrow results
    • Second: Perform fuzzy matching on remaining documents
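The two-stage process above can be illustrated locally. This sketch is not the service's algorithm, which is not described here: it swaps in difflib.SequenceMatcher from the Python standard library as a stand-in similarity measure, and the documents and the type and title fields are made up for illustration.

```python
from difflib import SequenceMatcher

# Made-up documents for illustration only.
docs = [
    {"doc_id": "a", "type": "contract", "title": "Master Services Agreement"},
    {"doc_id": "b", "type": "contract", "title": "Employment Agreement"},
    {"doc_id": "c", "type": "memo", "title": "Master Service Agreement notes"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for fuzzy matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def two_stage(documents, exact, field, query):
    # Stage 1: the exact filter narrows the candidate set.
    candidates = [d for d in documents if all(d.get(k) == v for k, v in exact.items())]
    # Stage 2: fuzzy matching ranks only the remaining candidates.
    scored = [(similarity(query, d[field]), d["doc_id"]) for d in candidates]
    return sorted(scored, reverse=True)


# "Agrement" is misspelled on purpose.
results = two_stage(docs, {"type": "contract"}, "title", "Master Service Agrement")
ranked_ids = [doc_id for _, doc_id in results]
```

Document c never reaches the fuzzy stage because the exact filter excludes it, even though its title is a close match. That ordering is the practical consequence of filtering first.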

Field weighting strategy

Adjust field weights to control search relevance:

  • Higher weights (2.0-3.0): Critical fields like title or primary identifier
  • Medium weights (1.0-1.5): Important supporting fields
  • Lower weights (0.5-1.0): Additional context fields

Example Weighting Strategy

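The tiers above could be turned into a queries array as in this sketch. The tier names, the specific weights chosen within each range, and the title, category, and author fields are assumptions for illustration; how the service combines weights into the final score is not specified here.

```python
# One illustrative weight per tier, picked from the ranges listed above.
WEIGHT_TIERS = {"critical": 2.5, "supporting": 1.2, "context": 0.7}


def weighted_queries(terms):
    """Turn (field, query, tier) triples into entries for the queries array."""
    return [
        {"field": field, "query": query, "weight": WEIGHT_TIERS[tier]}
        for field, query, tier in terms
    ]


# Hypothetical fields: title matters most, author least.
queries = weighted_queries([
    ("title", "Master Service Agrement", "critical"),
    ("category", "legal", "supporting"),
    ("author", "J. Smith", "context"),
])
```

Keeping the tiers in one mapping makes it easier to tune relevance later without touching every call site.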

Error Responses

The API returns standard HTTP error codes with detailed error information:

HTTP Code   Error Code                Description
400         invalid_request           Malformed query or invalid field names
400         invalid_metadata_filter   Filter expression syntax error
401         unauthorized              Invalid or missing API key
403         forbidden                 Insufficient permissions for corpus
404         corpus_not_found          Specified corpus does not exist
429         rate_limit_exceeded       Request rate limit exceeded
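A client might map these codes to actions as sketched below. The lookup table repeats the codes listed above; the retry policy (retry only on 429, with exponential backoff) is an assumption rather than documented behavior, and the layout of the error response body is not shown here, so the helpers take the HTTP code and error code as plain arguments.

```python
ERROR_DESCRIPTIONS = {
    (400, "invalid_request"): "Malformed query or invalid field names",
    (400, "invalid_metadata_filter"): "Filter expression syntax error",
    (401, "unauthorized"): "Invalid or missing API key",
    (403, "forbidden"): "Insufficient permissions for corpus",
    (404, "corpus_not_found"): "Specified corpus does not exist",
    (429, "rate_limit_exceeded"): "Request rate limit exceeded",
}


def describe(http_code: int, error_code: str) -> str:
    """Look up the documented description for an error, if there is one."""
    return ERROR_DESCRIPTIONS.get((http_code, error_code), "Unknown error")


def is_retryable(http_code: int) -> bool:
    # Assumption: only rate limiting is worth retrying; the other errors
    # need a corrected request, key, or corpus first.
    return http_code == 429


def backoff_delays(attempts: int, base: float = 1.0) -> list:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ..."""
    return [base * 2 ** i for i in range(attempts)]
```

For example, backoff_delays(3) yields waits of 1, 2, and 4 seconds between retries of a rate-limited request.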

Limitations

  • Currently limited to metadata fields (not full-text content)
  • Maximum 100 results per request
  • Fuzzy matching parameters are not user-configurable
  • Internal API status may limit availability