
Create Corpus API Definition

The Create Corpus API lets you create a corpus with a name, a description, and optional properties such as filter attributes and custom dimensions.

Tip: Check out our API Playground, which lets you experiment with this REST endpoint to create corpora.

Corpus Object

When you create a corpus object, only the name and description fields are mandatory.

The response message returns a unique ID, corpus_id, which you use to reference the corpus in subsequent requests. The name does not need to be unique within an account.
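The following is a minimal Python sketch of a request body that carries only the two mandatory fields. It assumes the body wraps the corpus object under a top-level `corpus` key, mirroring the CreateCorpusRequest message. The helper name `minimal_corpus_request` is hypothetical, so confirm the exact JSON shape in the API Playground.

```python
import json


def minimal_corpus_request(name: str, description: str) -> dict:
    """Build a Create Corpus request body with only the mandatory fields.

    Assumption: the corpus object sits under a top-level "corpus" key,
    mirroring the CreateCorpusRequest message.
    """
    if not name:
        raise ValueError("name is mandatory")
    if not description:
        raise ValueError("description is mandatory")
    return {"corpus": {"name": name, "description": description}}


body = minimal_corpus_request("support-docs", "Customer support articles")
print(json.dumps(body))
# prints {"corpus": {"name": "support-docs", "description": "Customer support articles"}}
```

Because names need not be unique, sending this body twice would create two corpora with the same name. Only the returned corpus_id tells them apart.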

To reference metadata in filter expressions, you must declare the attributes you want to reference as filter attributes when you create the corpus. The list of filter attributes cannot be changed after the corpus is created.

For information on custom dimensions, a Scale-only feature, please see Custom Dimensions. Like filter attributes, custom dimensions cannot be changed after the corpus is created.

Filter Attribute

A filter attribute must specify a name and a level, which indicates whether the attribute appears in document-level or part-level metadata. At indexing time, metadata with this name is extracted and made available to filter expressions.

If indexed is true, the system builds an index on the extracted values to speed up filter expressions that involve the attribute.
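One way to model a filter attribute on the client is a small immutable record, sketched below under stated assumptions. The class `FilterAttribute` and the lowercase labels (`document`, `part`, and the type names from the next paragraph) are hypothetical client-side conventions, not the wire-format enum values the API expects; check admin.proto or the API Playground for those. Freezing the dataclass and collecting the attributes in a tuple reflects the rule that the list is fixed once the corpus is created.

```python
from dataclasses import dataclass

# Hypothetical client-side labels taken from the prose on this page;
# the wire-format enum names used by the API are not shown here.
LEVELS = frozenset({"document", "part"})
TYPES = frozenset({"integer", "real", "text", "boolean"})


@dataclass(frozen=True)  # immutable, like the declared list after creation
class FilterAttribute:
    name: str              # metadata key extracted at indexing time
    level: str             # "document" or "part" level metadata
    type: str              # checked against metadata values at indexing time
    indexed: bool = False  # default chosen for this sketch; True requests an index

    def __post_init__(self) -> None:
        # Reject malformed declarations before they reach the API.
        if not self.name:
            raise ValueError("a filter attribute must have a name")
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        if self.type not in TYPES:
            raise ValueError(f"unknown type: {self.type!r}")


# Declared once, when the corpus is created; a tuple cannot be modified.
FILTERS = (
    FilterAttribute("year", "document", "integer", indexed=True),
    FilterAttribute("lang", "part", "text"),
)
```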

Finally, each filter attribute must specify a type, which is validated when documents are indexed. The four supported types are: integer, for signed whole numbers up to 8 bytes long; real, for floating-point values in IEEE 754 8-byte format; text, for strings in UTF-8 encoding; and boolean, for true/false values.
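The type rules above can be checked on the client before upload, which surfaces bad metadata earlier than indexing-time validation would. This is a minimal sketch: `validate_value` is a hypothetical helper that uses the lowercase type names from this page, and the service's own validation remains authoritative.

```python
INT64_MIN = -(2 ** 63)     # smallest signed 8-byte integer
INT64_MAX = 2 ** 63 - 1    # largest signed 8-byte integer


def validate_value(attr_type: str, value: object) -> bool:
    """Return True if `value` fits the filter attribute type `attr_type`.

    Hypothetical client-side check; the service validates on indexing.
    """
    if attr_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT64_MIN <= value <= INT64_MAX)
    if attr_type == "real":
        # Python's float is an IEEE 754 double (8 bytes).
        return isinstance(value, float)
    if attr_type == "text":
        if not isinstance(value, str):
            return False
        try:
            value.encode("utf-8")  # fails on unpaired surrogates
        except UnicodeEncodeError:
            return False
        return True
    if attr_type == "boolean":
        return isinstance(value, bool)
    raise ValueError(f"unknown filter attribute type: {attr_type!r}")
```

Because Python's `bool` is a subclass of `int`, the integer branch excludes it explicitly; without that check, `True` would pass as the integer 1.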

REST Example

Create Corpus REST Endpoint

Vectara exposes a REST endpoint at the following URL to create a corpus:
https://api.vectara.io/v1/create-corpus

The API Playground shows the full Create Corpus REST definition.
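As a sketch of calling the endpoint from Python with only the standard library, the function below prepares a POST request without sending it. It assumes API-key authentication through `customer-id` and `x-api-key` headers and a body with a top-level `corpus` key; both are assumptions to verify in the API Playground. The function `build_create_corpus_request` and the placeholder credentials are hypothetical.

```python
import json
import urllib.request

CREATE_CORPUS_URL = "https://api.vectara.io/v1/create-corpus"


def build_create_corpus_request(customer_id: str, api_key: str,
                                corpus: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the Create Corpus endpoint.

    Assumptions: the body wraps the corpus under a "corpus" key, and
    authentication uses the "customer-id" and "x-api-key" headers.
    """
    body = json.dumps({"corpus": corpus}).encode("utf-8")
    return urllib.request.Request(
        CREATE_CORPUS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "customer-id": customer_id,
            "x-api-key": api_key,
        },
    )


req = build_create_corpus_request(
    "1234567890",    # placeholder customer ID
    "YOUR_API_KEY",  # placeholder API key
    {"name": "support-docs", "description": "Customer support articles"},
)
```

Pass the result to `urllib.request.urlopen` to send it. Keeping request construction separate from sending makes the payload easy to inspect or test offline.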

gRPC Example

You can find the full Create Corpus gRPC definition in admin.proto.

The CreateCorpusRequest message contains a Corpus message with the name, description, and other customization options. The CreateCorpusResponse message returns the ID of the new corpus and a status.
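Whichever transport you use, the response carries the new corpus ID and a status. For the REST endpoint, a minimal sketch of reading that response might look like the following. It assumes the standard protobuf JSON mapping of CreateCorpusResponse, so `corpus_id` appears as `corpusId` and the status as an object with `code` and `statusDetail`; these field names, and the hypothetical helper `corpus_id_from_response`, are assumptions rather than documented behavior.

```python
import json


class CreateCorpusError(RuntimeError):
    """Raised when the response reports a failure or lacks a corpus ID."""


def corpus_id_from_response(raw: str) -> int:
    """Extract the new corpus ID from a Create Corpus JSON response.

    Assumed shape (protobuf JSON mapping of CreateCorpusResponse):
        {"corpusId": <int>, "status": {"code": "...", "statusDetail": "..."}}
    """
    resp = json.loads(raw)
    status = resp.get("status", {})
    # Assumption: proto3 JSON omits default values, so a missing code
    # is treated as success.
    code = status.get("code", "OK")
    if code != "OK":
        raise CreateCorpusError(f"{code}: {status.get('statusDetail', '')}")
    if "corpusId" not in resp:
        raise CreateCorpusError("response did not include a corpus ID")
    # int() accepts both a JSON number and a numeric string.
    return int(resp["corpusId"])
```

The `int(...)` conversion covers both encodings because the protobuf JSON mapping renders 64-bit integers as strings and 32-bit integers as numbers.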