
Default Metadata Filters

A few metadata fields are filterable out of the box because they are useful in a wide variety of situations.

You can make additional fields filterable by defining filter attributes on a corpus.

doc.id field

Each document is assigned a unique identifier at indexing time. You can use the doc.id field to retrieve or filter specific documents in your corpus.

Valid filter expressions include:

  • doc.id = 'my-document-2023.pdf'
  • doc.id = 'my-document-2022.pdf' OR doc.id = 'my-document-2023.pdf'
  • doc.id = 'my-document-2023.pdf' AND part.lang = 'eng'
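Writing a long `doc.id = ... OR doc.id = ...` chain by hand gets tedious when you need to match many documents. The Python sketch below builds that expression from a list of IDs. The helper name `doc_id_filter` is hypothetical. Because the quote-escaping rules of the filter language are not covered here, the helper rejects IDs that contain a single quote instead of guessing how to escape them.

```python
def doc_id_filter(doc_ids: list[str]) -> str:
    """Build a filter that matches any of the given document IDs (hypothetical helper)."""
    if not doc_ids:
        raise ValueError("at least one document ID is required")
    clauses = []
    for doc_id in doc_ids:
        # Quote escaping is not covered here, so refuse such IDs rather than guess.
        if "'" in doc_id:
            raise ValueError(f"unsupported quote character in ID: {doc_id!r}")
        clauses.append(f"doc.id = '{doc_id}'")
    # Each document has exactly one ID, so several IDs must be combined with OR.
    return " OR ".join(clauses)


print(doc_id_filter(["my-document-2022.pdf", "my-document-2023.pdf"]))
# prints: doc.id = 'my-document-2022.pdf' OR doc.id = 'my-document-2023.pdf'
```

Joining with OR is deliberate. An AND between two different IDs can never match, because a document has only one ID.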

part.lang field

Each section of a document is evaluated for its language at index time, and a part.lang field is added containing a three-character, lower-case ISO 639-2 language code. For example, if a section is detected as English, part.lang contains eng; if it is detected as German, part.lang contains deu.

Valid filter expressions for this field include:

  • part.lang = 'eng'
  • part.lang = 'deu'
  • part.lang = 'eng' OR part.lang = 'deu'
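Applications often carry two-letter language codes such as en or de, while part.lang stores three-letter ISO 639-2 codes. The Python sketch below normalizes codes and builds an OR filter from them. The helper `lang_filter` and its tiny mapping table are hypothetical and include only the two languages this page mentions. Extend the table for your own data.

```python
import re

# Illustrative mapping from ISO 639-1 (two-letter) codes to the three-letter
# ISO 639-2 codes stored in part.lang. Only the languages named above are listed.
ISO_639_1_TO_2 = {"en": "eng", "de": "deu"}

_THREE_LOWER_LETTERS = re.compile(r"[a-z]{3}")


def lang_filter(*codes: str) -> str:
    """Build a part.lang filter for one or more languages (hypothetical helper)."""
    if not codes:
        raise ValueError("at least one language code is required")
    clauses = []
    for raw in codes:
        code = raw.lower()
        # Translate a known two-letter code; otherwise keep the code as given.
        code = ISO_639_1_TO_2.get(code, code)
        if not _THREE_LOWER_LETTERS.fullmatch(code):
            raise ValueError(f"expected a 3-letter ISO 639-2 code, got {raw!r}")
        clauses.append(f"part.lang = '{code}'")
    # A section has a single detected language, so alternatives are ORed.
    return " OR ".join(clauses)


print(lang_filter("en", "deu"))
# prints: part.lang = 'eng' OR part.lang = 'deu'
```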

part.is_title field

When you add content, Vectara adds a special Boolean field, part.is_title, that indicates whether a section is a title. This is useful in several cases, depending on how you model your data. For example, some users want to match only on titles, or never match on titles, and this field lets them filter accordingly.

To return only title sections, filter on part.is_title = true. Conversely, part.is_title = false returns only non-title sections.
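The title filter is most useful when combined with the other default fields, for example to search only non-title sections written in English or German. The Python sketch below ANDs clauses together. The helpers `title_filter` and `and_filters` are hypothetical. The sketch also assumes the filter language accepts parentheses for grouping, which this page does not state, so check that before relying on it.

```python
def title_filter(is_title: bool) -> str:
    """Filter on the Boolean part.is_title field (hypothetical helper)."""
    # Boolean literals are written unquoted: true / false.
    return f"part.is_title = {'true' if is_title else 'false'}"


def and_filters(*clauses: str) -> str:
    """AND filter expressions together, skipping empty ones (hypothetical helper)."""
    # ASSUMPTION: parentheses are supported. They keep an OR inside one clause
    # from binding across the surrounding AND.
    return " AND ".join(f"({clause})" for clause in clauses if clause)


expr = and_filters(title_filter(False), "part.lang = 'eng' OR part.lang = 'deu'")
print(expr)
# prints: (part.is_title = false) AND (part.lang = 'eng' OR part.lang = 'deu')
```

Without the grouping, `part.is_title = false AND part.lang = 'eng' OR part.lang = 'deu'` would usually parse as `(... AND ...) OR ...` under conventional precedence, and so would let German title sections through.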