Version: 2.0

Filters

Metadata filters restrict search results to only the document chunks that match a specified logical expression. They act as a WHERE clause for your semantic search, ensuring only relevant, pre-qualified content is retrieved before summarization.

Using filters helps with:

  • Precision: Limit results based on attributes like status, author, or category.
  • Scope: Target specific document parts, such as titles only (part.is_title = true).

Filter syntax

Filters are defined using a simple, SQL-like syntax in the metadata_filter field within the corpora object of your query.

  • Prefix each field with doc. or part. to indicate whether the metadata is at the document or part level.
    Example: doc.status, part.lang
  • Combine conditions with logical and comparison operators.
    Example: AND, OR, doc.year > 2025, part.type IN ('A', 'B')
  • Ensure that each literal matches the data type of its field.
    Example: doc.status = 'Published', doc.price <= 50.0, part.is_title = true
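
When filter strings are assembled in application code, the quoting rules above are easy to get wrong. The following is a minimal sketch, not part of the platform: the helper names format_literal, compare, is_in, and all_of are hypothetical, only the operators shown on this page are accepted, and strings containing single quotes are rejected because escaping rules are not covered here.

```python
def format_literal(value):
    """Render a Python value as a filter literal (boolean, number, or string)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Escaping of embedded quotes is not documented here, so this sketch refuses them.
        if "'" in value:
            raise ValueError("single quotes inside string literals are not handled")
        return f"'{value}'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def compare(field, op, value):
    """Build one comparison, for example doc.status = 'Published'."""
    if not field.startswith(("doc.", "part.")):
        raise ValueError("field must start with 'doc.' or 'part.'")
    if op not in {"=", "!=", "<", "<=", ">"}:
        raise ValueError(f"unsupported operator: {op}")
    return f"{field} {op} {format_literal(value)}"


def is_in(field, values):
    """Build an IN clause, for example part.type IN ('A', 'B')."""
    if not field.startswith(("doc.", "part.")):
        raise ValueError("field must start with 'doc.' or 'part.'")
    rendered = ", ".join(format_literal(v) for v in values)
    return f"{field} IN ({rendered})"


def all_of(*clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


expression = all_of(
    compare("doc.status", "=", "Published"),
    compare("doc.price", "<=", 50.0),
    compare("part.is_title", "=", True),
)
print(expression)
```

Keeping the literal formatting in one place means a boolean is never sent as a quoted string and a string is never sent unquoted.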

Metadata use case examples

Metadata filters give you fine-grained control over query results. The following examples show how filters apply to common business and technical use cases.

  • Language-specific filtering - In multilingual documents, different sections may be in different languages. Use part-level metadata to target specific language segments.

    doc.rating > 3 AND part.lang = 'de'

    The lang metadata tag used in this example is detected and set automatically by the platform at indexing time. It's set at the part level for accuracy, because a single document may contain content in multiple languages.

  • Date-specific document retrieval - More complicated expressions are possible, such as the one below, which checks for documents with a publication date in 2021.

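    1609459200 <= doc.pub_epoch AND doc.pub_epoch < 1640995200

    Here, pub_epoch stores the publication date as a Unix epoch timestamp in seconds. The bounds correspond to 2021-01-01 and 2022-01-01 at 00:00:00 UTC, and the lower bound is inclusive so that a document published exactly at the start of the year still matches.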

    You can find a full list of supported operations on the Functions and Operators page, and details on how to specify literals on the Data Types page.

  • Filter by document status - For auditing purposes, you may want to limit results to documents marked as Published instead of Draft:

    doc.status = 'Published'

  • Filter by custom tag - Custom metadata fields enable filtering based on business-specific criteria, such as priority, category, or internal tags.

    doc.priority = 'High' AND doc.category = 'Technology'

  • Filter by publication year - Find documents published during a specific year (assuming pub_year is an integer field).

    doc.pub_year = 2023

  • Exclude drafts and authors - Find content that is not a Draft and was not written by a specific author.

    doc.status != 'Draft' AND NOT (doc.author = 'John Doe')

  • Part-level filtering (title) - Retrieve only chunks that are titles. To exclude titles instead, use part.is_title = false.

    part.is_title = true

  • Multiple tags/values - Find documents that are tagged as either Science or History.

    doc.category IN ('Science', 'History')
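
The epoch bounds in the date-specific example above do not need to be computed by hand. The following is a minimal sketch, assuming the field holds seconds since the Unix epoch in UTC; the helper name year_epoch_filter is hypothetical. It uses an inclusive lower bound so that midnight on January 1 is counted as part of the year.

```python
from datetime import datetime, timezone


def year_epoch_filter(field, year):
    """Build a filter matching timestamps within one calendar year (UTC)."""
    # Start of the requested year and start of the following year, in epoch seconds.
    start = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    # Inclusive lower bound, exclusive upper bound.
    return f"{start} <= {field} AND {field} < {end}"


print(year_epoch_filter("doc.pub_epoch", 2021))
# prints: 1609459200 <= doc.pub_epoch AND doc.pub_epoch < 1640995200
```

Deriving the bounds from calendar dates avoids transcription errors and makes the time zone assumption explicit.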

Example query with a document-level filter

This example sends the question "What are the key benefits of cloud computing?" to the Cloud Computing References corpus. Within the corpora object, the metadata_filter field restricts results to published documents: "metadata_filter": "doc.status = 'Published'".

METADATA EXAMPLE

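As a rough sketch of such a request, the Python snippet below builds a query body and prints it as JSON. Only corpora and metadata_filter are named on this page; the query and search field names, the corpus_key field, the corpus key value, and the helper build_query are assumptions made for illustration and should be checked against the API reference.

```python
import json


def build_query(question, corpus_key, metadata_filter):
    """Assemble a query body with a metadata filter on one corpus."""
    return {
        "query": question,  # assumed field name
        "search": {  # assumed wrapper object
            "corpora": [
                {
                    "corpus_key": corpus_key,  # assumed field name
                    "metadata_filter": metadata_filter,
                }
            ]
        },
    }


body = build_query(
    "What are the key benefits of cloud computing?",
    "cloud-computing-references",  # hypothetical key for the Cloud Computing References corpus
    "doc.status = 'Published'",
)
print(json.dumps(body, indent=2))
```

Because the filter is an ordinary string value, the single quotes around Published need no escaping inside the JSON double quotes.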

Example response with a document-level filter

The example response returns only documents that have "status": "Published" in their document metadata. The response also shows the other metadata associated with each document_id.

RESPONSE EXAMPLE

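A client can confirm that a document-level filter took effect by inspecting the metadata returned with each result. The sketch below is illustrative only: sample_response is a hand-written stand-in rather than actual platform output, and the search_results and document_metadata field names, like the helper names, are assumptions.

```python
# Hand-written stand-in for a response; field names and values are assumptions.
sample_response = {
    "search_results": [
        {
            "document_id": "doc-001",
            "document_metadata": {"status": "Published", "category": "Technology"},
        },
        {
            "document_id": "doc-002",
            "document_metadata": {"status": "Published", "category": "Science"},
        },
    ]
}


def statuses_by_document(response):
    """Map each document_id to the status value in its document metadata."""
    return {
        result["document_id"]: result.get("document_metadata", {}).get("status")
        for result in response.get("search_results", [])
    }


def unexpected_documents(response, expected_status="Published"):
    """Return the document_ids whose status differs from the expected value."""
    return sorted(
        doc_id
        for doc_id, status in statuses_by_document(response).items()
        if status != expected_status
    )


print(statuses_by_document(sample_response))
print(unexpected_documents(sample_response))
```

An empty list from unexpected_documents indicates that every returned document carries the status the filter asked for.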

Example query with part-level metadata

Now let's send a query that filters on part-level metadata, using part.concept = 'Overview'.

Only the metadata_filter value changes from the previous example, so that it matches this part-level metadata:

METADATA EXAMPLE

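Since only the filter string changes, a part-level request can be derived from a document-level one. The following sketch makes the same assumptions about the request layout (the query, search, and corpus_key names are not confirmed by this page), and the helper with_filter is hypothetical.

```python
import copy


def with_filter(request_body, metadata_filter):
    """Return a copy of the request with a new metadata_filter on every corpus."""
    updated = copy.deepcopy(request_body)  # leave the original request untouched
    for corpus in updated["search"]["corpora"]:
        corpus["metadata_filter"] = metadata_filter
    return updated


doc_level = {
    "query": "What are the key benefits of cloud computing?",
    "search": {
        "corpora": [
            {
                "corpus_key": "cloud-computing-references",  # hypothetical corpus key
                "metadata_filter": "doc.status = 'Published'",
            }
        ]
    },
}

part_level = with_filter(doc_level, "part.concept = 'Overview'")
print(part_level["search"]["corpora"][0]["metadata_filter"])
```

Copying the request before modifying it keeps the document-level version available for comparison.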

Example response with part-level metadata

PART-LEVEL METADATA EXAMPLE
