Version: 2.0

Vectara Data Egress Overview

This document provides a high-level overview of how to export and retrieve data from Vectara using the available API capabilities.

Vectara provides programmatic access to your data through REST APIs. While the platform is optimized for search and retrieval operations, it offers methods to access your documents, configurations, and usage data. Because bulk export operations are not currently available, exporting data requires iterating through individual resources.

What can be exported

Individual documents with metadata
Document lists for corpus inventory
Document parts (text chunks) with positioning metadata
Tables and images extracted during processing
Custom metadata associated with documents
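
Once a document has been retrieved, its parts and metadata can be flattened into records for local storage. The following is a minimal sketch, not an official client. It assumes the retrieved document is a JSON object with an `id`, a `metadata` object, and a `parts` list whose entries carry `text` and `metadata`. Those field names are assumptions to verify against the API reference, and `document_to_records` and `to_jsonl` are hypothetical helper names.

```python
import json


def document_to_records(document: dict) -> list[dict]:
    """Flatten one retrieved document into one record per document part.

    Field names ("id", "metadata", "parts", "text") are assumptions about
    the response shape, not confirmed by this overview.
    """
    doc_id = document.get("id")
    doc_metadata = document.get("metadata", {})
    records = []
    for index, part in enumerate(document.get("parts", [])):
        records.append(
            {
                "document_id": doc_id,
                "part_index": index,  # position of the chunk within the document
                "text": part.get("text", ""),
                "part_metadata": part.get("metadata", {}),
                "document_metadata": doc_metadata,
            }
        )
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)
```

Writing one record per part keeps the chunk text next to both its own metadata and the custom metadata of its parent document, which makes the export easier to reload into another system.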

What cannot be exported

Original uploaded files (PDFs, Word docs, and so on)
Raw embeddings/vectors
System-generated indexes
Deleted content (permanently removed)

Data export best practices

API-based retrieval

The primary method for data export is Vectara's REST API. The process involves three steps: list your corpora, list the documents in each corpus, and retrieve each document individually.

Implementation overview

The three API calls shown below (list corpora, list the documents in a corpus, and retrieve a single document) can be combined to download all your indexed documents systematically. Here's the general approach:

// Step 1: Get all corpora
corpora := GET /v2/corpora

// Steps 2 and 3: Iterate through each corpus and download its documents
for each corpus in corpora:
    documents := GET /v2/corpora/{corpus.key}/documents

    for each document in documents:
        document_content := GET /v2/corpora/{corpus.key}/documents/{document.id}
        // Save document_content locally
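
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions rather than an official client. It assumes the base URL `https://api.vectara.io`, an `x-api-key` authentication header, list responses that return items under `corpora` or `documents`, and a `metadata.page_key` value that is passed back as the `page_key` query parameter to fetch the next page. None of these details are stated in this overview, so check them against the API reference. The names `make_fetcher`, `paginate`, and `export_all` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://api.vectara.io"  # assumed base URL

# A fetcher takes a path and query parameters and returns the decoded JSON body.
Fetch = Callable[[str, dict], dict]


def make_fetcher(api_key: str) -> Fetch:
    """Build a fetcher that issues authenticated GET requests (assumed header name)."""

    def fetch(path: str, params: dict) -> dict:
        query = urllib.parse.urlencode(params)
        url = f"{BASE_URL}{path}" + (f"?{query}" if query else "")
        request = urllib.request.Request(
            url, headers={"x-api-key": api_key, "Accept": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)

    return fetch


def paginate(fetch: Fetch, path: str, items_field: str) -> Iterator[dict]:
    """Yield every item from a list endpoint, following the assumed page_key cursor."""
    page_key = None
    while True:
        params = {"page_key": page_key} if page_key else {}
        body = fetch(path, params)
        yield from body.get(items_field, [])
        page_key = (body.get("metadata") or {}).get("page_key")
        if not page_key:
            break


def export_all(fetch: Fetch) -> dict:
    """Return a mapping of corpus key to the list of fully retrieved documents."""
    export = {}
    # Step 1: list all corpora.
    for corpus in paginate(fetch, "/v2/corpora", "corpora"):
        key = corpus["key"]
        corpus_path = f"/v2/corpora/{urllib.parse.quote(key, safe='')}/documents"
        export[key] = []
        # Step 2: list the documents in this corpus.
        for summary in paginate(fetch, corpus_path, "documents"):
            # Step 3: retrieve each document individually.
            doc_id = urllib.parse.quote(summary["id"], safe="")
            export[key].append(fetch(f"{corpus_path}/{doc_id}", {}))
    return export
```

A run might look like `export_all(make_fetcher("YOUR_API_KEY"))`, with the result written to disk as JSON. Passing the fetcher in as a parameter keeps the traversal logic separate from the HTTP layer, so retries, rate limiting, or a test double can be added without touching the loop.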
