List the documents in the corpus
GET /v2/corpora/:corpus_key/documents
Retrieve a list of the documents stored in a specific corpus. The response returns document metadata rather than the full content of each document.
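A minimal sketch of building this request with Python's standard library. The base URL `https://api.vectara.io` and the `x-api-key` authentication header are assumptions not stated in this section, so check them against your account's API settings. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL; this section does not state it.
BASE_URL = "https://api.vectara.io"


def list_documents_request(corpus_key: str, api_key: str) -> Request:
    """Build (but do not send) a GET request that lists documents in a corpus."""
    # Percent-encode the corpus key so it is safe to place in the URL path.
    path = f"/v2/corpora/{quote(corpus_key, safe='')}/documents"
    return Request(
        BASE_URL + path,
        method="GET",
        # Assumed auth header name ("x-api-key"); not specified in this section.
        headers={"x-api-key": api_key, "Accept": "application/json"},
    )


req = list_documents_request("my-corpus", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON body.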
Request
Path Parameters
corpus_key string required
Possible values: <= 50 characters. Value must match regular expression [a-zA-Z0-9_\=\-]+$
The unique key identifying the queried corpus.
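The corpus key constraints can be checked client-side before making a call. This sketch assumes the documented regular expression is meant to match the entire key (it is only anchored at the end with `$`), so it uses `re.fullmatch`; `is_valid_corpus_key` is a hypothetical helper name.

```python
import re

# Documented constraints: at most 50 characters, made up only of
# letters, digits, '_', '=' and '-'.
_CORPUS_KEY_RE = re.compile(r"[a-zA-Z0-9_=\-]+")
MAX_CORPUS_KEY_LEN = 50


def is_valid_corpus_key(key: str) -> bool:
    """Return True if `key` satisfies the documented corpus_key constraints."""
    return len(key) <= MAX_CORPUS_KEY_LEN and _CORPUS_KEY_RE.fullmatch(key) is not None


for candidate in ["my-corpus_1", "bad key", "x" * 51]:
    print(candidate[:12], is_valid_corpus_key(candidate))
```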
Query Parameters
limit integer
Possible values: >= 1 and <= 100
Default value: 10
The maximum number of documents to return at one time.
metadata_filter string
Filter documents by metadata. Uses the same expression syntax as a query metadata filter, but only allows filtering on document metadata.
page_key string
Used to retrieve the next page of documents after the limit has been reached. Pass the page_key value returned in the metadata of the previous response.
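A sketch of assembling the query string while enforcing the documented `limit` range. The parameter names `limit`, `metadata_filter`, and `page_key` are assumed here (`page_key` matches the field returned in the response metadata), and the filter `doc.year = 2024` uses a hypothetical metadata field.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def list_documents_query(
    limit: int = 10,
    metadata_filter: Optional[str] = None,
    page_key: Optional[str] = None,
) -> str:
    """Build the query string for the list-documents endpoint."""
    # Documented range for limit: 1..100, default 10.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params: dict = {"limit": limit}
    if metadata_filter is not None:
        params["metadata_filter"] = metadata_filter
    if page_key is not None:
        params["page_key"] = page_key
    # quote (instead of the default quote_plus) encodes spaces as %20.
    return urlencode(params, quote_via=quote)


print(list_documents_query(limit=25, metadata_filter="doc.year = 2024"))
# limit=25&metadata_filter=doc.year%20%3D%202024
```

Omitting `page_key` on the first call and setting it only on follow-up calls keeps the first request identical to an unpaginated one.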
Header Parameters
Request timeout in seconds (integer)
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of seconds, or times out.
Request timeout in milliseconds (integer)
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of milliseconds, or times out.
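A sketch of building the optional timeout headers. This section does not show the header names, so `Request-Timeout` and `Request-Timeout-Millis` are assumptions to verify before use. Because the server-side timeout is only best effort, a client-side socket timeout is still worth setting.

```python
from typing import Dict, Optional


def timeout_headers(
    seconds: Optional[int] = None, millis: Optional[int] = None
) -> Dict[str, str]:
    """Build best-effort timeout headers (header names are assumptions)."""
    headers: Dict[str, str] = {}
    # Both values are documented as ">= 1".
    if seconds is not None:
        if seconds < 1:
            raise ValueError("seconds must be >= 1")
        headers["Request-Timeout"] = str(seconds)
    if millis is not None:
        if millis < 1:
            raise ValueError("millis must be >= 1")
        headers["Request-Timeout-Millis"] = str(millis)
    return headers


print(timeout_headers(millis=1500))  # {'Request-Timeout-Millis': '1500'}
```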
Responses
Status codes: 200, 403, 404.
200: List of documents.
Content type: application/json
Schema
documents object[]
List of documents.
  id string
  The document ID.
  metadata object
  The document metadata.
  tables object[]
  The tables that this document contains. Tables are only present when table extraction is enabled.
    id string
    The unique ID of the table within the document.
    title string
    The title of the table.
    data object
    The data of the table.
      headers object[][]
      The headers of the table.
      rows object[][]
      The rows in the data.
    description string
    The description of the table.
  parts object[]
  The parts that make up the document. Parts are not returned when listing documents or when creating a document; they are only available when retrieving a single document by ID.
    text string
    The text of the document part.
    metadata object
    The metadata for a document part. These fields may be used in metadata filters at query time if filter attributes are configured on the corpus.
    context string
    The context text for the document part.
    custom_dimensions object
    Custom dimensions that act as additional weights.
  storage_usage object
  How much storage the document used. This is currently returned only when indexing a document, not when retrieving it.
    bytes_used integer
    Number of bytes used by the document, counted toward the maximum corpus size and any billing plans.
    metadata_bytes_used integer
    Number of metadata bytes used by the document.
  extraction_usage object
  How much extraction quota the document used. This is currently returned only when indexing a document, not when retrieving it.
    table_extraction_used integer
    The number of pages from the document that consumed the extraction quota.
metadata object
The standard metadata in the response of a list operation.
  page_key string
  The key to pass as the page_key query parameter when requesting the next page of this list.
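A sketch of paging through a whole corpus by feeding each response's `metadata.page_key` back in as the `page_key` query parameter. `fetch_page` is a hypothetical caller-supplied function (for example, one that sends the request and decodes the JSON body). Treating a missing or empty `page_key` as the last page is an assumption, since this section does not say how the final page is signaled.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_documents(
    fetch_page: Callable[[Optional[str]], Page],
) -> Iterator[Dict[str, Any]]:
    """Yield every document, following metadata.page_key across pages.

    fetch_page(page_key) is expected to perform the GET request, passing
    page_key as a query parameter when it is not None, and to return the
    decoded JSON body.
    """
    page_key: Optional[str] = None
    while True:
        body = fetch_page(page_key)
        yield from body.get("documents", [])
        # Assumption: a missing or empty page_key means there are no more pages.
        page_key = body.get("metadata", {}).get("page_key")
        if not page_key:
            return


# Usage with two fake pages in place of real HTTP calls.
pages = {
    None: {"documents": [{"id": "a"}, {"id": "b"}], "metadata": {"page_key": "p2"}},
    "p2": {"documents": [{"id": "c"}], "metadata": {}},
}
print([d["id"] for d in iter_documents(pages.__getitem__)])  # ['a', 'b', 'c']
```

Using a generator keeps memory flat for large corpora: each page is fetched only when the previous one has been consumed.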
Example (from schema)
{
  "documents": [
    {
      "id": "my-doc-id",
      "metadata": {},
      "tables": [
        {
          "id": "table_1",
          "title": "string",
          "data": {
            "headers": [
              [
                {
                  "text_value": "string",
                  "int_value": 0,
                  "float_value": 0,
                  "bool_value": true,
                  "colspan": 0,
                  "rowspan": 0
                }
              ]
            ],
            "rows": [
              [
                {
                  "text_value": "string",
                  "int_value": 0,
                  "float_value": 0,
                  "bool_value": true,
                  "colspan": 0,
                  "rowspan": 0
                }
              ]
            ]
          },
          "description": "string"
        }
      ],
      "parts": [
        {
          "text": "I'm a nice document part.",
          "metadata": {
            "nice_rank": 9000
          },
          "context": "string",
          "custom_dimensions": {}
        }
      ],
      "storage_usage": {
        "bytes_used": 0,
        "metadata_bytes_used": 0
      },
      "extraction_usage": {
        "table_extraction_used": 0
      }
    }
  ],
  "metadata": {
    "page_key": "string"
  }
}
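Table cells in `data.headers` and `data.rows` carry typed value fields (`text_value`, `int_value`, `float_value`, `bool_value`). A sketch of turning rows into plain Python values, assuming a real cell sets only one of those fields (the generated example fills all of them, so here the field order decides which wins). `cell_value` and `table_rows` are hypothetical helpers, `colspan`/`rowspan` are ignored, and the sample table is made up.

```python
from typing import Any, Dict, List

# Value fields a table cell may carry, in the order they are checked.
_VALUE_FIELDS = ("text_value", "int_value", "float_value", "bool_value")


def cell_value(cell: Dict[str, Any]) -> Any:
    """Return the first value field present in a cell, or None."""
    for field in _VALUE_FIELDS:
        if field in cell:
            return cell[field]
    return None


def table_rows(table: Dict[str, Any]) -> List[List[Any]]:
    """Convert table.data.rows into lists of plain values (spans ignored)."""
    rows = table.get("data", {}).get("rows", [])
    return [[cell_value(c) for c in row] for row in rows]


table = {
    "id": "table_1",
    "data": {"rows": [[{"text_value": "Q1"}, {"int_value": 42}],
                      [{"text_value": "Q2"}, {"float_value": 3.5}]]},
}
print(table_rows(table))  # [['Q1', 42], ['Q2', 3.5]]
```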
403: Permissions do not allow listing documents in the corpus.
Content type: application/json
Schema
messages string[]
The messages describing why the error occurred.
request_id string
The ID of the request, which can be used to help Vectara support debug what went wrong.
Example (from schema)
{
  "messages": [
    "Internal server error."
  ],
  "request_id": "string"
}
404: Corpus not found.
Content type: application/json
Schema
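id string
The ID that could not be found.
messages string[]
The messages describing why the error occurred.
request_id string
The ID of the request, which can be used to help Vectara support debug what went wrong.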
Example (from schema)
{
  "id": "string",
  "messages": [
    "string"
  ],
  "request_id": "string"
}
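Both error responses carry `messages` and `request_id`, and the request ID is what Vectara support needs for debugging. A minimal sketch of surfacing these fields to the caller. `VectaraApiError` and `raise_for_error` are hypothetical names, not part of any Vectara SDK, and the sample body in the usage lines is made up.

```python
import json
from typing import List, Optional


class VectaraApiError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(self, status: int, messages: List[str], request_id: Optional[str]):
        self.status = status
        self.messages = messages
        self.request_id = request_id
        super().__init__(f"HTTP {status}: {'; '.join(messages)} (request_id={request_id})")


def raise_for_error(status: int, body: str) -> None:
    """Do nothing on 200; otherwise raise VectaraApiError built from the body."""
    if status == 200:
        return
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise VectaraApiError(status, payload.get("messages", []), payload.get("request_id"))


try:
    raise_for_error(404, '{"id": "my-corpus", "messages": ["Corpus not found."], "request_id": "r-1"}')
except VectaraApiError as err:
    print(err.status, err.messages, err.request_id)  # 404 ['Corpus not found.'] r-1
```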