List the documents in the corpus
GET /v2/corpora/:corpus_key/documents
Retrieve a list of documents stored in a specific corpus. This endpoint provides an overview of document metadata without returning the full content of each document.
Request
Path Parameters
Possible values: <= 50 characters. Value must match regular expression [a-zA-Z0-9_\=\-]+$
The unique key identifying the queried corpus.
Query Parameters
Possible values: >= 1 and <= 100
Default value: 10
The maximum number of documents to return at one time.
Filter documents by metadata. Uses the same expression as a query metadata filter, but only allows filtering on document metadata.
Used to retrieve the next page of documents after the limit has been reached.
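The path and query parameters above can be combined into a request URL roughly as follows. This is a minimal sketch, not an official client: the host `https://api.vectara.io` and the query parameter names `limit`, `metadata_filter`, and `page_key` are assumptions (this section does not spell them out), and `build_list_documents_url` is a hypothetical helper. Only the length, pattern, and range checks come directly from the constraints listed above.

```python
import re
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.vectara.io"  # assumed host, not stated in this section
# Pattern copied from the corpus key constraint above.
CORPUS_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_\=\-]+$")


def build_list_documents_url(
    corpus_key: str,
    limit: int = 10,
    metadata_filter: Optional[str] = None,
    page_key: Optional[str] = None,
) -> str:
    """Build the URL for listing documents (parameter names are assumed)."""
    if len(corpus_key) > 50 or not CORPUS_KEY_PATTERN.fullmatch(corpus_key):
        raise ValueError("corpus_key must be <= 50 characters and match the pattern")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    params = {"limit": limit}
    if metadata_filter:
        params["metadata_filter"] = metadata_filter  # assumed parameter name
    if page_key:
        params["page_key"] = page_key  # assumed parameter name

    path = f"/v2/corpora/{quote(corpus_key, safe='')}/documents"
    return f"{BASE_URL}{path}?{urlencode(params)}"


print(build_list_documents_url("my-corpus", limit=5))
```

Validating the key and limit on the client side avoids a round trip for requests the documented constraints would reject anyway.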
Header Parameters
Possible values: >= 1
The API will make a best effort to complete the request within the specified number of seconds, or time out.
Possible values: >= 1
The API will make a best effort to complete the request within the specified number of milliseconds, or time out.
Responses
- 200
- 403
- 404
List of documents.
- application/json
- Schema
- Example (from schema)
Schema
documents object[]
List of documents.
The document ID.
metadata object
The document metadata.
parts object[]
The parts that make up the document. Parts are not returned when listing documents or when creating a document; this property is only available when retrieving a document by ID.
The text of the document part.
metadata object
The metadata for a document part. These may be used in metadata filters at query time if filter attributes are configured on the corpus.
The context text for the document part.
custom_dimensions object
The custom dimensions as additional weights.
storage_usage object
How much storage the document uses. This information is currently returned only when indexing a document, not when retrieving it.
Number of bytes used by the document that count towards the maximum corpus size and towards any billing plans.
Number of metadata bytes used by a document.
metadata object
The standard metadata in the response of a list operation.
Pass this value as a query parameter when requesting the next page of this list.
{
"documents": [
{
"id": "my-doc-id",
"metadata": {},
"parts": [
{
"text": "I'm a nice document part.",
"metadata": {
"nice_rank": 9000
},
"context": "string",
"custom_dimensions": {}
}
],
"storage_usage": {
"bytes_used": 0,
"metadata_bytes_used": 0
}
}
],
"metadata": {
"page_key": "string"
}
}
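Paging through the full list follows from the response shape above: read `documents`, then pass `metadata.page_key` back on the next request. The sketch below keeps the HTTP call out of the loop by accepting a caller-supplied `fetch_page` function (a hypothetical name) that takes the current page key and returns the parsed JSON body. It assumes that a missing or empty `page_key` marks the last page, which this section does not state explicitly.

```python
from typing import Callable, Iterator, Optional


def iter_documents(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every document across pages.

    fetch_page(page_key) must return the parsed response body; it is called
    with None for the first page.
    """
    page_key: Optional[str] = None
    while True:
        body = fetch_page(page_key)
        yield from body.get("documents", [])
        # Assumption: no page_key in the list metadata means no more pages.
        page_key = (body.get("metadata") or {}).get("page_key")
        if not page_key:
            break


# Usage with canned pages standing in for real HTTP responses.
fake_pages = {
    None: {"documents": [{"id": "doc-1"}], "metadata": {"page_key": "next"}},
    "next": {"documents": [{"id": "doc-2"}], "metadata": {}},
}
for doc in iter_documents(lambda key: fake_pages[key]):
    print(doc["id"])
```

Because `parts` is not populated in list responses, a caller that needs the document text would still have to retrieve each document by ID.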
Permissions do not allow listing documents in the corpus.
- application/json
- Schema
- Example (from schema)
Schema
The messages describing why the error occurred.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"messages": [
"Internal server error."
],
"request_id": "string"
}
Corpus not found.
- application/json
- Schema
- Example (from schema)
Schema
The ID that could not be found.
The ID of the request that can be used to help Vectara support debug what went wrong.
{
"id": "string",
"messages": [
"string"
],
"request_id": "string"
}
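The 403 and 404 bodies above share `messages` and `request_id`, and the 404 body adds `id`. One way to turn them into a single log-friendly string is sketched below; `summarize_error` is a hypothetical helper, and the wording of the prefixes is an illustration rather than anything the API returns.

```python
import json


def summarize_error(status_code: int, body_text: str) -> str:
    """Format a 403/404 error body from this endpoint as one line."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    if not isinstance(body, dict):
        body = {}

    messages = "; ".join(body.get("messages", [])) or "no message provided"
    request_id = body.get("request_id", "unknown")

    if status_code == 403:
        prefix = "Permission denied"
    elif status_code == 404:
        # Only the 404 schema carries the ID that was not found.
        prefix = f"Not found (id={body.get('id', 'unknown')})"
    else:
        prefix = f"HTTP {status_code}"

    return f"{prefix}: {messages} [request_id={request_id}]"


print(summarize_error(404, '{"id": "my-corpus", "messages": ["Corpus not found."], "request_id": "r1"}'))
```

Keeping `request_id` in the output makes it easy to quote when contacting Vectara support, which is the purpose the schema gives for that field.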