Version: 2.0

Metadata and Filtering

Metadata lets you tag documents and document parts with structured information, such as type, department, creation date, or custom business attributes.

With Vectara, metadata powers precise search, filtering, and vertical-specific retrieval, enabling smarter RAG and analytics use cases.

Metadata is a dictionary of key-value pairs associated with each document or document part. You use metadata to:

  • Enable fast filtering (doc.department = 'finance')
  • Control vertical-specific queries (doc.type = 'contract')
  • Add business context (part.customer_id, doc.location)
  • Support structured retrieval for complex applications

Prerequisites

This guide assumes you have a corpus called my-docs. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus.

Create corpus with filter attributes

Before you can filter by metadata, you must define filter attributes when creating your corpus. These attributes tell Vectara which metadata fields should be indexed for fast filtering.

CREATE CORPUS WITH FILTER ATTRIBUTES
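A minimal sketch of a corpus definition that declares filter attributes, written as a plain Python dictionary rather than a specific SDK call. The request field names "key" and "filter_attributes", the helper filter_attribute, and the choice of which attributes are document-level or part-level are assumptions made for illustration; the allowed level and type values come from the parameter list in this section.

```python
from typing import Any

VALID_LEVELS = {"document", "part"}
VALID_TYPES = {"text", "integer", "real", "boolean"}


def filter_attribute(name: str, level: str, type_: str, indexed: bool = True) -> dict[str, Any]:
    """Build one filter attribute definition, rejecting unsupported values."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}, got {level!r}")
    if type_ not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}, got {type_!r}")
    return {"name": name, "level": level, "type": type_, "indexed": indexed}


# Hypothetical request body for creating the corpus.
# "key" and "filter_attributes" are assumed field names.
corpus_request = {
    "key": "my-docs",
    "filter_attributes": [
        filter_attribute("department", "document", "text"),
        filter_attribute("year", "document", "integer"),
        filter_attribute("doc_type", "document", "text"),
        filter_attribute("section_type", "part", "text"),
    ],
}
```

Validating level and type before sending the request surfaces mistakes early, which matters because the attributes are fixed once the corpus exists.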

Critical: Filter attributes must be defined at corpus creation time. You cannot add filter attributes to an existing corpus later.

Filter Attribute Parameters:

  • name (string): The metadata field name to make filterable
  • level (string): Either "document" or "part" depending on where metadata is attached
  • type (string): Data type, one of "text", "integer", "real", or "boolean"
  • indexed (boolean): Set to true for fast filtering performance

Add metadata at ingestion

Add metadata when indexing documents using the Python SDK. You can set metadata at:

  • Document level (applies to the whole doc)
  • Part/Section level (applies to a section/part)

Example: Ingest a document with metadata

INDEX A DOCUMENT WITH METADATA
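A minimal sketch of a document that carries metadata at both levels, shown as a plain dictionary instead of a specific SDK call. The field names "id", "type", and "document_parts", the document ID, and the sample text are assumptions for illustration only; the metadata keys are the ones used throughout this guide.

```python
import json

document = {
    "id": "finance-report-2024",   # hypothetical document ID
    "type": "core",                # assumed document type field
    "metadata": {                  # document-level: applies to the whole document
        "department": "finance",
        "year": 2024,
        "doc_type": "report",
    },
    "document_parts": [            # assumed name for the list of parts
        {
            "text": "Placeholder summary text.",
            "metadata": {"section_type": "summary"},   # part-level
        },
        {
            "text": "Placeholder appendix text.",
            "metadata": {"section_type": "appendix"},  # part-level
        },
    ],
}

# Serialize to confirm every metadata value is JSON-compatible.
payload = json.dumps(document, indent=2)
print(payload)
```

Note that year is stored as an integer, not the string "2024", so that numeric comparisons in filters behave as expected.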

Important: The metadata field names (department, year, doc_type, section_type) must exactly match the filter attribute names defined in your corpus.

Querying with metadata filters

Filter your queries using metadata fields to target only relevant documents or parts.

  • Document-level filter: Applies to whole documents.
  • Part-level filter: Targets individual sections/parts based on their metadata.

Example: Query with a metadata filter

QUERY BY METADATA
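A minimal sketch of a query restricted by a document-level filter. The helper build_query, the body layout with "query" and "search.metadata_filter", the "limit" field, and the question text are all assumptions for illustration; only the filter expression itself follows the syntax described in this guide.

```python
def build_query(text: str, metadata_filter: str, limit: int = 10) -> dict:
    """Assemble a hypothetical query body with a metadata filter attached."""
    return {
        "query": text,
        "search": {
            "metadata_filter": metadata_filter,  # assumed field name
            "limit": limit,                      # assumed field name
        },
    }


# Only documents whose department metadata equals 'finance' are searched.
query_request = build_query(
    "What were the main budget changes?",
    "doc.department = 'finance'",
)
```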

Example: Part-level metadata filtering

FILTER BY SECTION METADATA
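To make the difference between the two levels concrete, here is a small in-memory illustration. It does not call Vectara and is not how the platform evaluates filters; it only mimics the selection semantics with hypothetical sample data and a hypothetical parts_matching helper.

```python
docs = [
    {
        "metadata": {"department": "finance"},
        "parts": [
            {"text": "Summary of results", "metadata": {"section_type": "summary"}},
            {"text": "Detailed tables", "metadata": {"section_type": "appendix"}},
        ],
    },
    {
        "metadata": {"department": "legal"},
        "parts": [
            {"text": "Summary of terms", "metadata": {"section_type": "summary"}},
        ],
    },
]


def parts_matching(docs, doc_pred=lambda m: True, part_pred=lambda m: True):
    """Return the text of every part whose document and part metadata both pass."""
    return [
        part["text"]
        for doc in docs
        if doc_pred(doc["metadata"])
        for part in doc["parts"]
        if part_pred(part["metadata"])
    ]


# Equivalent in spirit to: doc.department = 'finance'
by_document = parts_matching(docs, doc_pred=lambda m: m.get("department") == "finance")

# Equivalent in spirit to: part.section_type = 'summary'
by_part = parts_matching(docs, part_pred=lambda m: m.get("section_type") == "summary")

print(by_document)
print(by_part)
```

The document-level condition keeps every part of the finance document, while the part-level condition keeps summary parts from any document.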

Example: Complex metadata filtering

ADVANCED METADATA FILTERING

Filter Syntax
  • Filter syntax is similar to SQL. Use single quotes for strings.
  • Combine multiple conditions with AND or OR.
  • Use comparison operators: =, !=, >, >=, <, <=
  • Use IN for multiple values: doc.type IN ('policy', 'procedure')
  • You can only filter on metadata fields defined as filter attributes in your corpus.
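The syntax rules above can be sketched as a few string-building helpers. The helper names (literal, compare, in_list, all_of, any_of) are hypothetical, and wrapping each condition in parentheses when combining is an assumption carried over from SQL. Strings that contain a single quote are rejected rather than escaped, because this guide does not describe an escaping rule.

```python
COMPARISON_OPERATORS = {"=", "!=", ">", ">=", "<", "<="}


def literal(value) -> str:
    """Render a value as a filter literal: numbers as-is, strings in single quotes."""
    if isinstance(value, bool):
        raise TypeError("boolean literals are not covered by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if "'" in value:
        raise ValueError("single quotes inside string values are not handled here")
    return f"'{value}'"


def compare(ref: str, op: str, value) -> str:
    """Build a single comparison such as doc.year >= 2023."""
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"{ref} {op} {literal(value)}"


def in_list(ref: str, values) -> str:
    """Build an IN condition such as doc.type IN ('policy', 'procedure')."""
    return f"{ref} IN ({', '.join(literal(v) for v in values)})"


def all_of(*conditions: str) -> str:
    return " AND ".join(f"({c})" for c in conditions)


def any_of(*conditions: str) -> str:
    return " OR ".join(f"({c})" for c in conditions)


expression = all_of(
    compare("doc.department", "=", "finance"),
    compare("doc.year", ">=", 2023),
    in_list("doc.doc_type", ["policy", "procedure"]),
)
print(expression)
# (doc.department = 'finance') AND (doc.year >= 2023) AND (doc.doc_type IN ('policy', 'procedure'))
```

Building expressions through helpers like these keeps quoting consistent and avoids hand-typed mistakes such as double quotes around string values.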

Metadata best practices

  • Plan filter fields: When creating a corpus, define which metadata keys should be indexed for filtering.
  • Use consistent types: Stick to text, integer, real, or boolean values, and make sure each value matches the type declared for its filter attribute.
  • Be explicit: Set metadata at both document and section level if your queries require fine-grained filtering.
  • Keep keys lowercase: Avoid spaces and special characters in metadata keys.
  • Match filter attributes: Ensure metadata field names exactly match the filter attribute names defined in your corpus.

Troubleshooting metadata filters

The error INVALID_ARGUMENT: The filter expression contains an error. Unrecognized references: doc.department, doc.year occurs when:

  1. Filter attributes not defined: The corpus doesn't have filter attributes for the metadata fields you're trying to filter on.
  2. Name mismatch: The metadata field names don't exactly match the filter attribute names.
  3. Wrong level: Using the doc. prefix for part-level attributes, or the part. prefix for document-level attributes.

Solutions:

  • Ensure filter attributes are defined when creating the corpus (cannot be added later)
  • Verify metadata field names exactly match filter attribute names
  • Use the doc. prefix for document-level filters and the part. prefix for part-level filters
  • Check for typos and use single quotes for string values
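One way to catch these problems before sending a query is to compare the references in a filter expression against the attributes you defined. The sketch below is a local, hypothetical pre-check (the function name and the name-to-level dictionary are assumptions); it is not the validation Vectara performs, and it assumes each attribute name is used at only one level.

```python
import re

REFERENCE = re.compile(r"\b(doc|part)\.([A-Za-z_][A-Za-z0-9_]*)")
STRING_LITERAL = re.compile(r"'[^']*'")
PREFIX_FOR_LEVEL = {"document": "doc", "part": "part"}


def unrecognized_references(expression: str, attributes: dict[str, str]) -> list[str]:
    """Return references in the expression that match no defined filter attribute.

    attributes maps each attribute name to its level ("document" or "part").
    """
    known = {f"{PREFIX_FOR_LEVEL[level]}.{name}" for name, level in attributes.items()}
    # Blank out quoted strings so text inside values is not mistaken for a reference.
    without_strings = STRING_LITERAL.sub("''", expression)
    found = {f"{prefix}.{name}" for prefix, name in REFERENCE.findall(without_strings)}
    return sorted(found - known)


attributes = {"department": "document", "year": "document", "section_type": "part"}

print(unrecognized_references("doc.department = 'finance' AND doc.year > 2020", attributes))
print(unrecognized_references("doc.section_type = 'summary'", attributes))  # wrong level
print(unrecognized_references("doc.dept = 'finance'", attributes))          # name mismatch
```

An empty list means every reference lines up with a defined attribute at the right level; anything else points at a likely cause of the Unrecognized references error.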

Complete working example

END-TO-END METADATA EXAMPLE
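A compact sketch of the full workflow using only the Python standard library instead of a specific SDK. The base URL, endpoint paths, x-api-key header, request field names, the VECTARA_API_KEY environment variable, and all sample values are assumptions for illustration; check them against the API reference before relying on them. Without an API key the script only prints the requests it would send.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.vectara.io/v2"  # assumed REST base URL
CORPUS_KEY = "my-docs"


def create_corpus_body() -> dict:
    """Step 1: declare the filter attributes when the corpus is created."""
    return {
        "key": CORPUS_KEY,
        "filter_attributes": [
            {"name": "department", "level": "document", "type": "text", "indexed": True},
            {"name": "year", "level": "document", "type": "integer", "indexed": True},
            {"name": "section_type", "level": "part", "type": "text", "indexed": True},
        ],
    }


def document_body() -> dict:
    """Step 2: index a document whose metadata keys match the attribute names."""
    return {
        "id": "finance-report-2024",  # hypothetical document ID
        "type": "core",
        "metadata": {"department": "finance", "year": 2024},
        "document_parts": [
            {"text": "Placeholder summary text.", "metadata": {"section_type": "summary"}},
        ],
    }


def query_body() -> dict:
    """Step 3: query with a filter that references the defined attributes."""
    return {
        "query": "What changed in the budget?",
        "search": {"metadata_filter": "doc.department = 'finance' AND doc.year >= 2023"},
    }


def steps() -> list[tuple[str, dict]]:
    return [
        ("/corpora", create_corpus_body()),
        (f"/corpora/{CORPUS_KEY}/documents", document_body()),
        (f"/corpora/{CORPUS_KEY}/query", query_body()),
    ]


def post(path: str, body: dict, api_key: str) -> dict:
    """Send one JSON POST request and return the decoded JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    api_key = os.environ.get("VECTARA_API_KEY")  # hypothetical variable name
    for path, body in steps():
        if api_key:
            print(post(path, body, api_key))
        else:
            print("POST", path, json.dumps(body))  # dry run
```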

This complete example shows the proper workflow:

  1. Create corpus with filter attributes
  2. Index documents with matching metadata
  3. Query with metadata filters

The key is ensuring metadata field names exactly match the filter attribute names defined in your corpus.