Version: 2.0

Corpora

This guide covers the corpus management methods of the Vectara Python SDK. A corpus is a container for documents and their metadata in the Vectara platform. These methods handle administrative tasks (creating, listing, retrieving, updating, and deleting corpora) so you can organize data for search and Retrieval-Augmented Generation (RAG). They do not run searches or generate answers. For querying corpora (including RAG), see the corpora.search method in the Query API guide.

Prerequisites

Examples in this guide use a corpus with the key my-docs. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus.

Create a corpus


Create a corpus to serve as a central container for your organization's documents and metadata.

This is the first step in managing data in Vectara. You choose a unique key for the corpus and, optionally, the filter attributes that queries can use later.

The corpora.create method corresponds to the HTTP POST /v2/corpora endpoint. For more details on request and response parameters, see the Create Corpus REST API.

  • key (string, required): Unique identifier for the corpus, such as "my-docs". May contain only alphanumeric characters, underscores, and hyphens.
  • name (string, optional): Human-readable name, such as "My Documentation". Defaults to the key value.
  • description (string, optional): Description of the corpus's purpose, such as "Contains project documentation".
  • filter_attributes (list, optional): Metadata attributes that queries can filter on. Each attribute has:
    • name: Unique attribute name (max 64 characters)
    • level: "document" (document-level) or "part" (part-level, that is, a section of a document)
    • type: "integer", "real", "text", or "boolean"
    • indexed: Whether to index the attribute to speed up filtering on it

Choose descriptive key values, because the key identifies the corpus in every later management and query call. Filter attributes let you refine search results by metadata. This method creates an empty corpus; it doesn't index any documents.
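corpora.create maps to POST /v2/corpora, so the request it sends can be sketched with the Python standard library alone. The sketch below builds that request without sending it. It assumes the base URL https://api.vectara.io/v2 and an x-api-key authentication header; confirm both against the Create Corpus REST API reference. build_create_corpus_request is a hypothetical helper, not part of the SDK. Its key check follows the naming rule stated in the parameter list above.

```python
import json
import re
import urllib.request

# Assumption: REST base URL; verify it in the Create Corpus REST API reference.
BASE_URL = "https://api.vectara.io/v2"

# Key rule as described in this guide: letters, digits, underscores, hyphens.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_create_corpus_request(api_key, key, name=None, description=None,
                                filter_attributes=None):
    """Hypothetical helper: build (but do not send) a POST /v2/corpora request."""
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"invalid corpus key: {key!r}")
    body = {"key": key}
    # Include optional fields only when supplied, so server-side defaults apply
    if name is not None:
        body["name"] = name
    if description is not None:
        body["description"] = description
    if filter_attributes:
        body["filter_attributes"] = filter_attributes
    return urllib.request.Request(
        f"{BASE_URL}/corpora",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: API key authentication via the x-api-key header
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_corpus_request(
    "YOUR_API_KEY", "my-docs",
    name="My Documentation",
    description="Contains project documentation",
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen. In application code, the SDK's corpora.create method performs this call for you.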

Important

Filter attributes must be defined at corpus creation time and cannot be modified later. Plan your metadata schema carefully.
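Filter attributes are fixed at creation, so it can help to validate the schema locally before calling corpora.create. The sketch below checks the constraints from the parameter list above (name length, level, type) and rejects duplicate names. FilterAttributeSpec and build_filter_attributes are hypothetical local helpers. Defaulting indexed to False is a choice of this sketch, not a documented API default.

```python
from dataclasses import asdict, dataclass

# Allowed values as listed in this guide's create parameters
VALID_LEVELS = {"document", "part"}
VALID_TYPES = {"integer", "real", "text", "boolean"}


@dataclass(frozen=True)
class FilterAttributeSpec:
    """Hypothetical local record mirroring one filter attribute definition."""
    name: str
    level: str
    type: str
    indexed: bool = False  # sketch default, not an API default

    def validate(self):
        if not self.name or len(self.name) > 64:
            raise ValueError(f"name must be 1-64 characters: {self.name!r}")
        if self.level not in VALID_LEVELS:
            raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
        if self.type not in VALID_TYPES:
            raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")


def build_filter_attributes(specs):
    """Validate every spec and reject duplicate names; return plain dicts."""
    seen = set()
    payload = []
    for spec in specs:
        spec.validate()
        if spec.name in seen:
            raise ValueError(f"duplicate attribute name: {spec.name!r}")
        seen.add(spec.name)
        payload.append(asdict(spec))
    return payload


schema = build_filter_attributes([
    FilterAttributeSpec("department", "document", "text", indexed=True),
    FilterAttributeSpec("page_number", "part", "integer"),
])
```

The resulting list of dicts can then be passed as the filter_attributes value when you create the corpus.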

Error Handling:

  • 400 Bad Request: Invalid key or request body.
  • 403 Forbidden: Insufficient permissions.
  • 409 Conflict: Corpus with the same key already exists.
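The status codes documented in this guide's Error Handling lists can be translated into readable exceptions in one place. This minimal sketch wraps urllib.request.urlopen for raw REST calls and maps the documented codes to messages. CorpusAPIError and send are hypothetical names. If you use the SDK rather than raw HTTP, rely on the SDK's own error handling instead.

```python
import urllib.error
import urllib.request

# Meanings taken from the Error Handling lists in this guide
CORPUS_ERRORS = {
    400: "Bad Request: invalid key or request body",
    403: "Forbidden: insufficient permissions",
    404: "Not Found: corpus doesn't exist",
    409: "Conflict: a corpus with the same key already exists",
}


class CorpusAPIError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status):
        self.status = status
        super().__init__(CORPUS_ERRORS.get(status, f"Unexpected HTTP status {status}"))


def send(request):
    """Send a prepared urllib request; translate HTTP errors to CorpusAPIError."""
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        raise CorpusAPIError(err.code) from err
```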

List corpora


Retrieve a paginated list of the corpora in your account. This helps you monitor and maintain your data infrastructure when your account holds many corpora.

The corpora.list method corresponds to the HTTP GET /v2/corpora endpoint. For more details on request and response parameters, see the List Corpora REST API.

  • limit (integer, optional): Maximum number of corpora to return per page (default: 10).
  • page_key (string, optional): Pagination token returned in the previous response; pass it to fetch the next page.

The method returns a paginated iterator that you can loop over directly, without tracking page_key yourself. This handles large numbers of corpora efficiently.

Error Handling:

  • 403 Forbidden: Insufficient permissions.

Retrieve a corpus


Retrieve the full metadata for a single corpus to check its configuration and usage. This supports administrative tasks such as auditing corpus settings for data governance.

The corpora.get method corresponds to the HTTP GET /v2/corpora/{corpus_key} endpoint. For more details on request and response parameters, see the Get Corpus REST API.

  • corpus_key (string, required): Unique identifier of the corpus ("my-docs").

Returns a corpus object with complete metadata, including id, name, description, filter_attributes, and usage statistics.
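Below is a sketch of the GET request behind corpora.get, plus a helper that extracts some of the fields named above for an audit report. It assumes the base URL https://api.vectara.io/v2 and an x-api-key header; confirm both in the Get Corpus REST API reference. build_get_corpus_request and summarize_corpus are hypothetical helpers. The sketch leaves out usage statistics because their field names are not given here.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.vectara.io/v2"  # assumption: REST base URL


def build_get_corpus_request(api_key, corpus_key):
    """Hypothetical helper: build (but do not send) GET /v2/corpora/{corpus_key}."""
    # Percent-encode the key so it always forms a single path segment
    path_key = urllib.parse.quote(corpus_key, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/corpora/{path_key}",
        headers={"x-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


def summarize_corpus(corpus):
    """Pick id, name, description, and filter attribute names from a decoded corpus dict."""
    return {
        "id": corpus.get("id"),
        "name": corpus.get("name"),
        "description": corpus.get("description"),
        "filter_attributes": [a.get("name") for a in corpus.get("filter_attributes", [])],
    }


req = build_get_corpus_request("YOUR_API_KEY", "my-docs")
```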

Error Handling:

  • 403 Forbidden: Insufficient permissions.
  • 404 Not Found: Corpus doesn't exist.

Update a corpus


Modify an existing corpus's properties, such as its name, description, or enabled status, as business requirements change. This supports maintenance tasks such as archiving a corpus or temporarily disabling it.

The corpora.update method corresponds to the HTTP PATCH /v2/corpora/{corpus_key} endpoint. For more details on request and response parameters, see the Update Corpus REST API.

  • corpus_key (string, required): Unique identifier of the corpus.
  • name (string, optional): New name for the corpus.
  • description (string, optional): New description.
  • enabled (boolean, optional): Enable or disable the corpus (see the note on disabling below).

Note: Filter attributes cannot be modified after corpus creation. Only name, description, and enabled status can be updated.

Disabling a corpus (enabled=False) prevents new indexing but allows read-only queries. This is useful for archiving or maintenance scenarios.
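The sketch below assumes PATCH semantics in which omitted fields stay unchanged, so the request includes only the properties being changed. It builds a request for PATCH /v2/corpora/{corpus_key} and rejects any field other than name, description, and enabled, matching the note above. The helper name, the base URL, and the x-api-key header are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.vectara.io/v2"  # assumption: REST base URL
UPDATABLE_FIELDS = {"name", "description", "enabled"}


def build_update_corpus_request(api_key, corpus_key, **changes):
    """Hypothetical helper: build (but do not send) PATCH /v2/corpora/{corpus_key}."""
    unsupported = set(changes) - UPDATABLE_FIELDS
    if unsupported:
        # For example filter_attributes, which are fixed at creation
        raise ValueError(f"cannot update: {sorted(unsupported)}")
    # Send only fields that carry a value; others are left out of the body
    body = {k: v for k, v in changes.items() if v is not None}
    if not body:
        raise ValueError("no changes supplied")
    return urllib.request.Request(
        f"{BASE_URL}/corpora/{urllib.parse.quote(corpus_key, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )


# Archive the corpus: disable it and record why in the description
req = build_update_corpus_request(
    "YOUR_API_KEY", "my-docs",
    enabled=False, description="Archived project documentation",
)
```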

Error Handling:

  • 403 Forbidden: Insufficient permissions.
  • 404 Not Found: Corpus doesn't exist.

Delete a corpus


Permanently remove a corpus and all of its data, for example to free storage or to comply with data retention policies.

The corpora.delete method corresponds to the HTTP DELETE /v2/corpora/{corpus_key} endpoint. For more details on request and response parameters, see the Delete Corpus REST API.

  • corpus_key (string, required): Unique identifier of the corpus.

Error Handling:

  • 403 Forbidden: Insufficient permissions.
  • 404 Not Found: Corpus doesn't exist.

Caution

Deletion is irreversible. Back up any data you still need first, or consider disabling the corpus with an update (enabled=False) instead of deleting it.
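One way to follow this advice is to make the reversible option the default. The hypothetical helper below builds a PATCH that sets enabled=False and builds a DELETE only when the caller passes permanent=True. As in the other raw REST sketches, the base URL and x-api-key header are assumptions to verify against the Delete Corpus and Update Corpus REST API references.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.vectara.io/v2"  # assumption: REST base URL


def build_retire_request(api_key, corpus_key, permanent=False):
    """Hypothetical helper: disable by default; DELETE only when permanent=True."""
    url = f"{BASE_URL}/corpora/{urllib.parse.quote(corpus_key, safe='')}"
    headers = {"x-api-key": api_key}
    if not permanent:
        # Reversible: the corpus can be re-enabled later with enabled=True
        headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            url,
            data=json.dumps({"enabled": False}).encode("utf-8"),
            headers=headers,
            method="PATCH",
        )
    # Irreversible: removes the corpus and its data
    return urllib.request.Request(url, headers=headers, method="DELETE")
```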


Differences from generation and query tasks

  • Generation tasks: Corpus management methods are administrative and do not directly support generation. For RAG or chat-based generation, use client.query() with generation parameters.
  • Query vs. prompt confusion: Vectara uses a retrieval-centric model with natural-language queries, not prompt-based interactions like many GenAI platforms. For prompt-like behavior, configure generation parameters in query methods.

Next steps

  • Explore document indexing with client.documents.create() using the Documents guide.
  • Learn about querying with client.query() for search and RAG using the Query guide.
  • Learn about metadata filtering with the Metadata guide.
  • Experiment with the Vectara Console to test endpoints before coding.