Retrieve a document
GET /v2/corpora/:corpus_key/documents/:document_id
Retrieve the content and metadata of a specific document, identified by its unique document_id, from a specific corpus.
Request
Path Parameters
corpus_key
Possible values: <= 50 characters, Value must match regular expression [a-zA-Z0-9_\=\-]+$
The unique key identifying the corpus containing the document to retrieve.
document_id
The ID of the document to retrieve. This document_id must be percent encoded.
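The path parameter rules above can be sketched in Python as follows. This is a minimal illustration, not an official client: the base URL and the helper name build_document_url are assumptions that do not appear in this section, and the corpus_key pattern is interpreted here as a whole-string match.

```python
import re
from urllib.parse import quote

# Assumption: the base URL is not stated in this section.
BASE_URL = "https://api.vectara.io"

# Mirrors the documented corpus_key constraint, read as a whole-string match.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_=\-]+")


def build_document_url(corpus_key: str, document_id: str) -> str:
    """Return the GET URL for one document, percent-encoding the document ID."""
    if len(corpus_key) > 50 or not CORPUS_KEY_PATTERN.fullmatch(corpus_key):
        raise ValueError(f"invalid corpus_key: {corpus_key!r}")
    # safe="" also encodes "/", which would otherwise split the path segment.
    encoded_id = quote(document_id, safe="")
    return f"{BASE_URL}/v2/corpora/{corpus_key}/documents/{encoded_id}"
```

Passing safe="" matters for document IDs that look like file paths: with the default setting, quote leaves "/" untouched and the ID would be read as several path segments.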
Header Parameters
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of seconds; otherwise, the request times out.
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of milliseconds; otherwise, the request times out.
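The two timeout headers can be assembled with a small helper like the one below. The header names are not given in this section, so Request-Timeout and Request-Timeout-Millis are assumptions to verify before use, and timeout_headers is a hypothetical helper name. Only the documented minimum of 1 is enforced.

```python
from typing import Dict, Optional

# Assumption: these header names do not appear in this section; confirm the
# exact spelling before relying on them.
TIMEOUT_SECONDS_HEADER = "Request-Timeout"
TIMEOUT_MILLIS_HEADER = "Request-Timeout-Millis"


def timeout_headers(seconds: Optional[int] = None,
                    millis: Optional[int] = None) -> Dict[str, str]:
    """Build the optional timeout headers, enforcing the documented minimum of 1."""
    headers: Dict[str, str] = {}
    for name, value in ((TIMEOUT_SECONDS_HEADER, seconds),
                        (TIMEOUT_MILLIS_HEADER, millis)):
        if value is None:
            continue  # Header omitted: the caller did not request this timeout.
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
        headers[name] = str(value)
    return headers
```

The returned dictionary can be merged into whatever headers the HTTP client already sends, such as authentication.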
Responses
- 200
- 403
- 404
200: Successfully retrieved the document.
- application/json
Schema
id
The document ID.
metadata object
The document metadata.
tables object[]
The tables that this document contains. Tables are available only when table extraction is enabled.
tables[].id
The unique ID of the table within the document.
tables[].title
The title of the table.
tables[].data object
The data of the table.
tables[].data.headers
The headers of the table.
tables[].data.rows
The rows in the data.
tables[].description
The description of the table.
parts object[]
The parts that make up the document. Parts are not returned when listing documents or when creating a document; this property is available only when retrieving a document by ID.
parts[].text
The text of the document part.
parts[].metadata object
The metadata for a document part. These may be used in metadata filters at query time if filter attributes are configured on the corpus.
parts[].context
The context text for the document part.
parts[].custom_dimensions object
The custom dimensions as additional weights.
storage_usage object
How much storage the document used. This information is currently returned only when indexing a document, not when retrieving it.
storage_usage.bytes_used
The number of bytes used by the document that count toward the maximum corpus size and toward any billing plans.
storage_usage.metadata_bytes_used
The number of metadata bytes used by the document.
extraction_usage object
How much extraction quota the document used. This information is currently returned only when indexing a document, not when retrieving it.
extraction_usage.table_extraction_used
The number of pages from the document that consumed the extraction quota.
Example (from schema)
{
  "id": "my-doc-id",
  "metadata": {},
  "tables": [
    {
      "id": "table_1",
      "title": "string",
      "data": {
        "headers": [
          [
            {
              "text_value": "string",
              "int_value": 0,
              "float_value": 0,
              "bool_value": true,
              "colspan": 0,
              "rowspan": 0
            }
          ]
        ],
        "rows": [
          [
            {
              "text_value": "string",
              "int_value": 0,
              "float_value": 0,
              "bool_value": true,
              "colspan": 0,
              "rowspan": 0
            }
          ]
        ]
      },
      "description": "string"
    }
  ],
  "parts": [
    {
      "text": "I'm a nice document part.",
      "metadata": {
        "nice_rank": 9000
      },
      "context": "string",
      "custom_dimensions": {}
    }
  ],
  "storage_usage": {
    "bytes_used": 0,
    "metadata_bytes_used": 0
  },
  "extraction_usage": {
    "table_extraction_used": 0
  }
}
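Once the 200 response body has been parsed from JSON, the parts and tables can be read with ordinary dictionary access. The sketch below assumes the body is already a Python dict; the helper names part_texts and table_text_grid are hypothetical. It reads only text_value from each cell and ignores the other value fields as well as colspan and rowspan, which is a simplification rather than documented behavior.

```python
from typing import Any, Dict, List


def part_texts(document: Dict[str, Any]) -> List[str]:
    """Return the text of every part; a missing "parts" key yields an empty list."""
    return [part.get("text", "") for part in document.get("parts", [])]


def table_text_grid(table: Dict[str, Any]) -> List[List[str]]:
    """Flatten header rows followed by data rows into rows of text_value strings."""
    data = table.get("data", {})
    grid: List[List[str]] = []
    # Both "headers" and "rows" are lists of rows, and each row is a list of cells.
    for row in data.get("headers", []) + data.get("rows", []):
        grid.append([cell.get("text_value", "") for cell in row])
    return grid
```

Defaulting to empty lists keeps the helpers usable when tables are absent, for example when table extraction is not enabled.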
403: Permissions do not allow retrieving a document from the corpus.
- application/json
Schema
messages
The messages describing why the error occurred.
request_id
The ID of the request that can be used to help Vectara support debug what went wrong.
Example (from schema)
{
  "messages": [
    "Internal server error."
  ],
  "request_id": "string"
}
404: Corpus or document not found.
- application/json
Schema
id
The ID that cannot be found.
request_id
The ID of the request that can be used to help Vectara support debug what went wrong.
Example (from schema)
{
  "id": "string",
  "messages": [
    "string"
  ],
  "request_id": "string"
}
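The 403 and 404 bodies share the messages and request_id fields, so one helper can summarize either for logging or a support ticket. This is a sketch under the assumption that the status code and parsed JSON body are already in hand; describe_error and the wording of its output are hypothetical, not part of the API.

```python
from typing import Any, Dict


def describe_error(status: int, body: Dict[str, Any]) -> str:
    """Turn a 403 or 404 response body into a single log-friendly line."""
    reasons = {403: "permission denied", 404: "corpus or document not found"}
    line = f"{status} {reasons.get(status, 'unexpected error')}"
    # "id" appears only in the 404 body, naming the ID that was not found.
    if body.get("id"):
        line += f" (id={body['id']})"
    messages = "; ".join(body.get("messages", []))
    if messages:
        line += f": {messages}"
    # Keep the request ID so it can be passed on when asking for support.
    if body.get("request_id"):
        line += f" [request_id={body['request_id']}]"
    return line
```

Every field is read with get, so a body that omits id or messages still produces a usable line.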