List Documents API Definition
The List Documents API enables you to retrieve a list of documents stored in a specific corpus. This endpoint provides an overview of document metadata, including document IDs, tables (if table extraction is enabled), and pagination details. This is useful for reviewing the documents indexed so far and deciding which ones to remove because they are no longer needed, which helps you manage the document lifecycle in your environment.
This information enables you to catalog and inventory large amounts of data while also extracting lists of documents for further analysis. For example, developers can use the metadata to build custom search and filtering capabilities into their applications. If you enabled table extraction, this endpoint also returns the tables that each document contains.
Currently, the Document Admin APIs do not allow you to access the text of your documents.
List Documents Request and Response
To list documents, send a GET request to /v2/corpora/:corpus_key/documents, where corpus_key is the unique identifier for the corpus. You can specify the following optional query parameters to filter the results and control their pagination:
limit - Indicates the maximum number of documents to return in a single request, with a default value of 10 and a maximum value of 100.
metadata_filter - Filters documents on document metadata by accepting a metadata filter expression similar to those used in query operations, but limited to document-level metadata.
page_key - Retrieves the next page of results when the previous request has reached the limit.
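As a minimal sketch of how these parameters fit together, the Python helper below assembles the request URL from the endpoint path and the three query parameters described above. The function name build_list_documents_url, the sample filter expression, and the lower bound of 1 on limit are assumptions made for illustration; only the default of 10 and the maximum of 100 come from this page.

```python
from urllib.parse import quote, urlencode

# Base address taken from the REST endpoint documented on this page.
BASE_URL = "https://api.vectara.io"


def build_list_documents_url(corpus_key, limit=10, metadata_filter=None, page_key=None):
    """Build a List Documents URL (hypothetical helper, not part of any SDK)."""
    # The documented maximum is 100; the lower bound of 1 is an assumption.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    params = {"limit": limit}
    if metadata_filter:
        # Filter expression limited to document-level metadata.
        params["metadata_filter"] = metadata_filter
    if page_key:
        # Key for fetching the next page of results.
        params["page_key"] = page_key

    # Percent-encode the corpus key so it is safe to place in the path.
    path = f"/v2/corpora/{quote(corpus_key, safe='')}/documents"
    return f"{BASE_URL}{path}?{urlencode(params)}"


if __name__ == "__main__":
    # The filter expression here is only an illustrative example.
    print(build_list_documents_url("my-corpus", limit=50, metadata_filter="doc.lang = 'en'"))
```

The resulting URL can be passed to any HTTP client in a GET request. Authentication headers are not covered in this section and are left out of the sketch.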
The response contains an array of document objects with the matching document IDs, metadata, tables, parts, storage usage, and metadata about the pagination.
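To illustrate how the pagination metadata might be consumed, the sketch below walks through every page by feeding the returned page key back into the next request. It assumes the response body is a JSON object with a documents array and a metadata object that carries page_key; those field names, along with the iter_documents and fetch_page names, are assumptions for this example and should be checked against the API Reference.

```python
from typing import Callable, Iterator, Optional


def iter_documents(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every document across all pages.

    fetch_page is a caller-supplied function (hypothetical) that takes a
    page_key, or None for the first page, and returns the parsed JSON body.
    """
    page_key = None
    while True:
        page = fetch_page(page_key)
        # Assumed response shape: {"documents": [...], "metadata": {"page_key": ...}}
        yield from page.get("documents", [])
        page_key = (page.get("metadata") or {}).get("page_key")
        if not page_key:
            # No further page key means the last page has been reached.
            break


if __name__ == "__main__":
    # Stand-in for real HTTP calls: two canned pages keyed by page_key.
    fake_pages = {
        None: {"documents": [{"id": "doc-1"}, {"id": "doc-2"}], "metadata": {"page_key": "next"}},
        "next": {"documents": [{"id": "doc-3"}], "metadata": {}},
    }
    for doc in iter_documents(lambda key: fake_pages[key]):
        print(doc["id"])
```

Passing the fetch function in as an argument keeps the paging logic independent of the HTTP client, so the same loop works with any client or with canned data in tests.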
REST 2.0 URL
List Documents REST Endpoint Address
Vectara exposes a REST endpoint at the following URL to list documents: https://api.vectara.io/v2/corpora/:corpus_key/documents
The API Reference shows the full List Documents REST definition.
gRPC Example
You can find the full List Documents gRPC definition at list_documents.proto.