Skip to main content

Reading Metadata

In Vectara, when you index a document, it will consist of both a top-level Document object and a series of Sections which can be nested. Both can contain separate metadata, including some metadata that Vectara will auto-generate. A good example of this is that you could have a document which has some global attributes like the URL or owner but individual sections will have a section attribute and a lang.

Here's an example response with different metadata at these different levels:

{
"responseSet": [
{
"response": [
{
"text": "Answer to the Ultimate Question of Life, the Universe, and Everything, is 42.",
"score": 0.1401531994342804,
"documentIndex": 0,
"corpusKey": {
"customerId": 0,
"corpusId": 1234,
"semantics": 0,
"metadataFilter": "",
"dim": []
},
"metadata": [
{
"name": "speaker",
"value": "Deep Thought"
},
{
"name": "lang",
"value": "eng"
},
{
"name": "section",
"value": "2"
},
{
"name": "offset",
"value": "316"
}
]
},
{
"text": "Sometimes the questions are complicated and the answers are simple.",
"score": 0.13511724770069122,
"documentIndex": 1,
"corpusKey": {
"customerId": 0,
"corpusId": 1234,
"semantics": 0,
"metadataFilter": "",
"dim": []
},
"metadata": [
{
"name": "lang",
"value": "eng"
},
{
"name": "section",
"value": "17"
},
{
"name": "offset",
"value": "171"
}
]
},
],
"status": [],
"document": [
{
"id": "hitchhikers-guide",
"metadata": [
{
"name": "author",
"value": "Douglas Adams"
},
{
"name": "publicationyear",
"value": "1979"
}
]
},
{
"id": "authors-quotes",
"metadata": [
{
"name": "author",
"value": "Dr. Seuss"
}
]
}
]
}
],
"status": []
}

Within a given item in the responseSet array, you'll see there's a response and a document section (among others). The response section holds section-level metadata and the document section holds document-level metadata. The reason for this split is that there may be multiple sections from the same document in the response, and this allows for deduplication of document-level metadata, which can reduce the total time for the response.

Combining document and section metadata

In order to display metadata for a particular section, you may want to combine it with the document-level metadata. To do so, look at the documentIndex value. This tells you which index into the document array you should grab associated metadata from.

For example, the first result in the response array ("Answer to the Ultimate Question of Life, the Universe, and Everything, is 42.") has a documentIndex value of 0 and has metadata of speaker:Deep Thought, lang:eng, section:2, and offset:316. These are the section-level metadata for this result.

Because the documentIndex is 0, we look at the first result in the document array to find the document-level metadata and document ID. In this case, the id is hitchhikers-guide and the document-level metadata is author:Douglas Adams and publicationyear:1979.

Depending on your use case, you might want to combine these metadata elements together for display purposes.

Filtering

You can also use the document- and section-level metadata to filter in a search operation. For more information on how to apply filter expressions at either the document or section/part level, please see the filter expression documentation.