Summarize a document
POST /v2/corpora/:corpus_key/documents/:document_id/summarize
Organizations often struggle with extracting relevant insights from extensive documentation, such as vendor quotes, financial statements, and technical reports. Manually reviewing these documents is both time-consuming and prone to errors.
The Document Summarization API, currently in tech preview, generates a concise summary that captures the essential insights of a single document, so you do not have to review the entire document manually. Use it to process large documents efficiently, extract key insights, and receive summaries in real time as they are generated.
- Enable streaming for large documents to receive summaries incrementally.
- Customize prompt_template to fine-tune summary output for specific domains.
- Use standard responses for small documents where streaming is unnecessary.
- Monitor streaming events to track the progress of real-time summarization.
The document length is limited by the context window of your selected LLM.
Response formats
The API supports two response modes:
- Standard: Provides a complete summary in one response.
- Streaming: Provides incremental responses using Server-Sent Events (SSE).
Non-streaming response
In standard mode, the API returns a structured response containing the complete summary of the document. The summary field contains the generated text, enabling users to extract essential information quickly.
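As an illustration of standard mode, here is a minimal Python sketch that builds the POST request and reads the summary field from the response. The base URL, the x-api-key header name, and the contents of the request body are assumptions made for this example and are not confirmed by this page; only the endpoint path and the summary field come from the text above.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; substitute the host for your own deployment.
BASE_URL = "https://api.vectara.io"


def build_summarize_request(corpus_key, document_id, api_key, body=None):
    """Build (but do not send) a POST request for the summarize endpoint."""
    path = "/v2/corpora/{}/documents/{}/summarize".format(
        urllib.parse.quote(corpus_key, safe=""),
        urllib.parse.quote(document_id, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body or {}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": api_key,  # assumed authentication header name
        },
        method="POST",
    )


def summarize(request):
    """Send the request and return the text of the summary field."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["summary"]
```

Building the request separately from sending it makes the URL and headers easy to inspect before any network call is made.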
Streaming response
For streaming responses, the API returns Server-Sent Events (SSE). Partial results are streamed as soon as they are available, and a final event marks the end of the summarization process.
The streamed response consists of multiple events:
- generation_info: Contains the rendered_prompt, which is the compiled prompt sent to the LLM for document summarization.
- generation_chunk: Returns a partial chunk of the generated summary.
- generation_end: Marks the completion of the summary generation.
- error: Returns an error message if summarization fails.
- end: Indicates the end of the streaming session.
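The event sequence above could be consumed along these lines. This is a sketch under stated assumptions: it assumes the event type arrives on the SSE event: line, that each data: line carries a JSON object, and that the chunk text sits in a field named generation_chunk. The payload shape is not specified on this page, so adjust the field access to match what the API actually returns.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from raw Server-Sent Events lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data or event != "message":
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data or event != "message":
        yield event, "\n".join(data)


def collect_summary(lines: Iterable[str]) -> str:
    """Join generation_chunk payloads until the stream signals its end."""
    parts = []
    for event, data in iter_sse_events(lines):
        if event == "generation_chunk":
            # Assumed payload shape: {"generation_chunk": "<partial text>"}
            parts.append(json.loads(data)["generation_chunk"])
        elif event == "error":
            raise RuntimeError("summarization failed: " + data)
        elif event == "end":
            break
        # generation_info and generation_end carry no summary text.
    return "".join(parts)
```

The lines argument can be any iterable of decoded text lines, for example lines read from a streaming HTTP response body.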
Prompt template example
When crafting a prompt, you can access your document with the $vectaraDocument field. This example shows a simple prompt:
{
  "role": "user",
  "content": "Summarize the document: $vectaraDocument.json()"
}
The document object also provides the following methods to support custom prompts:
- $vectaraDocument.json(): Provides a JSON representation of the whole document.
- $vectaraDocument.id(): Returns the unique identifier of the document (document_id).
- $vectaraDocument.metadata(): Returns metadata from the document. For example, $vectaraDocument.metadata().get("key") retrieves a specific metadata value by key.
- $vectaraDocument.parts(): Returns an array of document parts that you can iterate over. For example, #foreach ($part in $vectaraDocument.parts()).
  - $part.text(): Retrieves the text of the part.
  - $part.metadata(): Retrieves metadata of a part.
  - $part.hasTable(): Determines if the part contains a table.
  - $part.table(): Provides access to the table within the part. For example, use $part.table().json() to retrieve the table in JSON format.
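To show how these methods might combine, the following Python sketch assembles a prompt template that loops over the document parts and includes table JSON where a part has a table. It assumes the template language follows Apache Velocity conventions (the #foreach directive above suggests this, but #if and #end are assumptions here), and the helper name build_prompt_template is hypothetical. Escaping of rendered values inside the JSON string is not addressed.

```python
import json


def build_prompt_template(instruction: str) -> str:
    """Return a JSON message whose content iterates over document parts.

    Hypothetical helper: the directives assume Velocity-style syntax.
    """
    content = (
        instruction
        + " Document ID: $vectaraDocument.id()."
        + "#foreach ($part in $vectaraDocument.parts())"
        + " $part.text()"
        # Include the table as JSON only when the part has one.
        + "#if ($part.hasTable()) $part.table().json()#end"
        + "#end"
    )
    # Same message shape as the simple prompt example: role plus content.
    return json.dumps({"role": "user", "content": content}, indent=2)


if __name__ == "__main__":
    print(build_prompt_template("Summarize the key figures in this document."))
```

Generating the template with json.dumps keeps any quotes in the instruction text correctly escaped in the resulting JSON.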
Request
Responses
- 200: Document summarization response on success.
- 403: Permissions do not allow summarizing a document in the corpus.
- 404: Corpus or document not found.
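A client could map the documented status codes to readable messages along these lines. This is an illustrative sketch only: the function name explain_status is hypothetical, and any status code not listed on this page is treated as unexpected.

```python
# Meanings taken from the response list on this page.
STATUS_MEANINGS = {
    200: "Document summarization response on success.",
    403: "Permissions do not allow summarizing a document in the corpus.",
    404: "Corpus or document not found.",
}


def explain_status(status: int) -> str:
    """Return the documented meaning of a status code, if there is one."""
    return STATUS_MEANINGS.get(status, "Unexpected status code: {}".format(status))
```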