Filters
Metadata filters restrict search results to only the document chunks that match a specified logical expression. They act as a WHERE clause for your semantic search, ensuring only relevant, pre-qualified content is retrieved before summarization.
Using filters helps with:
- Precision: Limit results based on attributes like status, author, or category.
- Scope: Target specific document parts, such as only titles (part.is_title = true).
Filter syntax
Filters are defined using a simple, SQL-like syntax in the metadata_filter
field within the corpora object of your query.
- Specify whether the metadata is at the document or part level.
  Example: doc., part.
- Use logical and comparison operators.
  Example: AND, OR, doc.year > 2025, part.type IN ('A', 'B')
- Ensure that the data type values match.
  Example: doc.status = 'Published', doc.price <= 50.0, part.is_title = true
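The syntax rules above can be illustrated with a small Python sketch that assembles a filter string from typed values. The helper names (literal, compare, all_of) are hypothetical and not part of any SDK; the sketch assumes string literals are single-quoted, booleans are written as true/false, and clauses are joined with AND, as in the examples above. It rejects strings that contain a single quote, because escaping rules are not covered here.

```python
def literal(value):
    """Render a Python value as a filter literal (quoting rules are assumed)."""
    # Check bool before int/float: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value)
    if "'" in text:
        # Escaping rules are not described in this section, so refuse to guess.
        raise ValueError("single quotes in string literals are not handled here")
    return "'" + text + "'"


def compare(field, op, value):
    """Build one comparison clause, e.g. doc.price <= 50.0."""
    return f"{field} {op} {literal(value)}"


def all_of(*clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


expr = all_of(
    compare("doc.status", "=", "Published"),
    compare("doc.price", "<=", 50.0),
    compare("part.is_title", "=", True),
)
print(expr)
# prints: doc.status = 'Published' AND doc.price <= 50.0 AND part.is_title = true
```

Building the string from typed values keeps the literal form consistent with the field's data type, which is the mismatch the last rule warns about.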
Metadata use case examples
Metadata filters give granular control over query results. The examples below show how they apply to common business and technical use cases.
- Language-specific filtering - In multilingual documents, different sections may be in different languages. Use part-level metadata to target specific language segments.
  doc.rating > 3 AND part.lang = 'de'
  The lang metadata tag used in this example is detected and set automatically by the platform at indexing time. It's set at the part level for accuracy, because a single document may contain content in multiple languages.
- Date-specific document retrieval - More complicated expressions are possible, such as the one below, which checks for documents with a publication date in 2021.
  1609459200 < doc.pub_epoch AND doc.pub_epoch < 1640995200
  Here, pub_epoch stores the date in epoch time. You can find a full list of supported operations on the Functions and Operators page, and a full list of how to specify literals on Data Types.
- Filter by document status - For auditing purposes, you may want to limit results to documents marked as Published instead of Draft:
  doc.status = 'Published'
- Filter by custom tag - Custom metadata fields enable filtering based on business-specific criteria, such as priority, category, or internal tags.
  doc.priority = 'High' AND doc.category = 'Technology'
- Filter by date range - Find documents published during a specific year (assuming pub_year is an Integer).
  doc.pub_year = 2023
- Exclude drafts and authors - Find content that is not a Draft and was not written by a specific author.
  doc.status != 'Draft' AND NOT (doc.author = 'John Doe')
- Part-level filtering (title) - Only retrieve chunks that are titles, or never retrieve titles.
  part.is_title = true
- Multiple tags/values - Find documents that are tagged as either Science or History.
  doc.category IN ('Science', 'History')
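The epoch bounds in the date-specific example above can be derived rather than hard-coded. The following sketch assumes pub_epoch holds seconds since the Unix epoch in UTC; the function name year_epoch_bounds is hypothetical.

```python
from datetime import datetime, timezone


def year_epoch_bounds(year):
    """Return (start, end) epoch seconds for a calendar year, in UTC."""
    start = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    return start, end


start, end = year_epoch_bounds(2021)
# Same shape as the example filter: strict comparisons on both sides.
date_filter = f"{start} < doc.pub_epoch AND doc.pub_epoch < {end}"
print(date_filter)
# prints: 1609459200 < doc.pub_epoch AND doc.pub_epoch < 1640995200
```

Because both comparisons are strict, a document stamped exactly at midnight on January 1, 2021 (UTC) falls outside the range.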
Example query with a document-level filter
This example asks the question "What are the key benefits of cloud computing?"
of the Cloud Computing References corpus. Within the corpora object, the
metadata_filter field restricts results to published documents:
"metadata_filter": "doc.status = 'Published'",
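As a rough sketch, a request body carrying this filter might be assembled in Python as shown below. Only the corpora object and its metadata_filter field come from this page; the surrounding field names (query, search, corpus_key), the list form of corpora, and the corpus key value are assumptions for illustration and should be checked against the API reference. The sketch only builds and prints the JSON; it does not send a request.

```python
import json


def build_query_body(question, corpus_key, metadata_filter):
    """Build a query body with a metadata filter (request shape is assumed)."""
    return {
        "query": question,  # assumed field name
        "search": {  # assumed wrapper object
            "corpora": [
                {
                    "corpus_key": corpus_key,  # assumed field name
                    "metadata_filter": metadata_filter,
                }
            ]
        },
    }


body = build_query_body(
    "What are the key benefits of cloud computing?",
    "cloud-computing-references",  # hypothetical corpus key
    "doc.status = 'Published'",
)
print(json.dumps(body, indent=2))
```

Keeping the filter as a plain string argument means a part-level filter can be swapped in without touching the rest of the body.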
Example response with a document-level filter
The example response returns documents with "status": "Published" in the document
metadata. The response also shows other metadata associated with each document_id.
Example query with part-level metadata
Now send a query with the part-level filter part.concept = 'Overview'.
Only the metadata_filter value changes from the previous example:
"metadata_filter": "part.concept = 'Overview'",
Example response with part-level metadata
The response contains only the parts whose part-level metadata includes "concept": "Overview".