Version: 2.0

Reading Metadata

When you index a document in Vectara, its type parameter sets the document format: core or structured. A core document contains document_parts and a structured document contains sections. Both can be nested, and both can carry their own metadata, including some metadata that Vectara generates automatically. For example, a document can have global attributes such as a URL or an owner, while each individual section has its own attributes such as section and lang.
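To make the two metadata levels concrete, here is a minimal sketch of a structured document as a Python dictionary. Only type, sections, and document_parts come from the description above; the id, text, and metadata field names and the metadata_levels helper are assumptions made for illustration, so check them against the indexing API reference before relying on them. The helper reads only the top level and ignores nesting.

```python
# Hypothetical structured document. "type" and "sections" come from the text
# above; "id", "text", and "metadata" are assumed field names.
structured_doc = {
    "id": "hitchhikers-guide",
    "type": "structured",
    # Document-level metadata: applies to the whole document.
    "metadata": {"author": "Douglas Adams", "publicationyear": "1979"},
    "sections": [
        {
            "text": "Answer to the Ultimate Question of Life, "
                    "the Universe, and Everything, is 42.",
            # Section-level metadata: applies to this section only.
            "metadata": {"speaker": "Deep Thought"},
        },
    ],
}


def metadata_levels(doc):
    """Return (document-level metadata, list of per-section or per-part metadata).

    Illustrative helper: looks at top-level children only, not nested ones.
    """
    children_key = "sections" if doc["type"] == "structured" else "document_parts"
    children = doc.get(children_key, [])
    return doc.get("metadata", {}), [child.get("metadata", {}) for child in children]


doc_meta, child_meta = metadata_levels(structured_doc)
```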

Here's an example response with different metadata at these different levels:

{
  "search_results": [
    {
      "text": "Answer to the Ultimate Question of Life, the Universe, and Everything, is 42.",
      "score": 0.1401531994342804,
      "part_metadata": {
        "speaker": "Deep Thought",
        "lang": "eng",
        "section": "2",
        "offset": "316"
      },
      "document_metadata": {
        "author": "Douglas Adams",
        "publicationyear": "1979"
      },
      "document_id": "hitchhikers-guide",
      "request_corpora_index": 0
    },
    {
      "text": "Sometimes the questions are complicated and the answers are simple.",
      "score": 0.13511724770069122,
      "part_metadata": {
        "lang": "eng",
        "section": "17",
        "offset": "171"
      },
      "document_metadata": {
        "author": "Dr. Seuss"
      },
      "document_id": "authors-quotes",
      "request_corpora_index": 0
    }
  ]
}

Each item in the search_results array contains, among other fields, a part_metadata object and a document_metadata object. part_metadata holds section-level metadata and document_metadata holds document-level metadata. The two are kept separate because a response may contain multiple sections from the same document, and the split allows document-level metadata to be deduplicated, which can reduce the total response time.
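When several results come from the same document, a client can use this split to keep one copy of the document-level metadata per document. The following is a minimal sketch, assuming the response has already been parsed into a Python dictionary with the shape shown above; group_by_document is a hypothetical helper name, and the second hitchhikers-guide passage is placeholder data invented for the illustration.

```python
def group_by_document(search_results):
    """Group results by document_id, storing document-level metadata once."""
    grouped = {}
    for result in search_results:
        entry = grouped.setdefault(
            result["document_id"],
            {"document_metadata": result.get("document_metadata", {}), "parts": []},
        )
        # Only the section-level data is stored per result.
        entry["parts"].append(
            {"text": result["text"], "part_metadata": result.get("part_metadata", {})}
        )
    return grouped


# Placeholder results: two sections from the same (hypothetical) document.
sample_results = [
    {
        "text": "Answer to the Ultimate Question of Life, "
                "the Universe, and Everything, is 42.",
        "part_metadata": {"speaker": "Deep Thought", "section": "2"},
        "document_metadata": {"author": "Douglas Adams"},
        "document_id": "hitchhikers-guide",
    },
    {
        "text": "(placeholder second passage)",
        "part_metadata": {"section": "3"},
        "document_metadata": {"author": "Douglas Adams"},
        "document_id": "hitchhikers-guide",
    },
]

grouped = group_by_document(sample_results)
```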

Combining document and section metadata

To display the metadata for a particular section, you may want to combine it with the document-level metadata. To do so, look at the document_id value, which tells you which document the section belongs to.

For example, the first result in the search_results array ("Answer to the Ultimate Question of Life, the Universe, and Everything, is 42.") has a document_id value of hitchhikers-guide and has a part_metadata of speaker:Deep Thought, lang:eng, section:2, and offset:316. These are the section-level metadata for this result.

The document-level metadata for the same result is in its document_metadata field. Because the document_id is hitchhikers-guide, this metadata describes the hitchhikers-guide document: author:Douglas Adams and publicationyear:1979.

Depending on your use case, you might want to combine these metadata elements for display purposes.
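One way to combine the two levels is to merge both dictionaries into a single flat one. The sketch below uses the first result from the example response; combined_metadata is a hypothetical helper, and letting section-level values override document-level values when a key appears at both levels is an assumption of this sketch, not documented Vectara behavior.

```python
def combined_metadata(result):
    """Merge document-level and section-level metadata for one search result.

    Assumption: on a key collision, the section-level value wins.
    """
    merged = dict(result.get("document_metadata", {}))
    merged.update(result.get("part_metadata", {}))
    return merged


# First result from the example response above.
first_result = {
    "text": "Answer to the Ultimate Question of Life, "
            "the Universe, and Everything, is 42.",
    "part_metadata": {
        "speaker": "Deep Thought",
        "lang": "eng",
        "section": "2",
        "offset": "316",
    },
    "document_metadata": {"author": "Douglas Adams", "publicationyear": "1979"},
    "document_id": "hitchhikers-guide",
}

display_metadata = combined_metadata(first_result)
```

If you need to keep both values when a key collides, prefix the keys instead (for example doc_author and part_speaker) rather than merging them flat.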

Filtering

You can also use the document- and section-level metadata to filter in a search operation. For more information on how to apply filter expressions at either the document or section/part level, please see the filter expression documentation.
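As an illustration, the sketch below builds a filter string that combines a document-level and a section-level condition. It assumes that filter expressions address document-level attributes with a doc. prefix and section-level attributes with a part. prefix, and that equality conditions can be joined with AND; build_filter is a hypothetical helper, and the exact syntax, including how to escape quotes, should be confirmed in the filter expression documentation.

```python
def build_filter(doc_equals=None, part_equals=None):
    """Build an equality-only filter expression string.

    Assumed syntax: doc.<name> = '<value>' and part.<name> = '<value>',
    joined with AND. Values containing single quotes are rejected because
    this sketch does not guess at the escaping rules.
    """
    clauses = []
    for prefix, fields in (("doc", doc_equals or {}), ("part", part_equals or {})):
        for name, value in fields.items():
            if "'" in str(value):
                raise ValueError(f"unsupported quote in value for {name!r}")
            clauses.append(f"{prefix}.{name} = '{value}'")
    return " AND ".join(clauses)


filter_expression = build_filter(
    doc_equals={"author": "Douglas Adams"},
    part_equals={"lang": "eng"},
)
# filter_expression == "doc.author = 'Douglas Adams' AND part.lang = 'eng'"
```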