Create Corpus API Definition
The Create Corpus API lets you create a corpus that contains specific properties and attributes.
Check out our API Playground, which lets you experiment with this REST endpoint to create corpora.
Corpus Object
When you create a corpus object, only the name and description fields are mandatory. The response message returns a unique id, corpus_id, by which the corpus can be subsequently referenced. Note that the name needn't be unique within an account.
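As a rough illustration of the paragraph above, the following Python sketch builds a minimal corpus object and reads the id back from a decoded response. The helper names `build_corpus` and `extract_corpus_id` are hypothetical, and the JSON key `corpus_id` is taken from the wording above; the actual wire format may use a different name or casing.

```python
import json


def build_corpus(name, description, **optional_fields):
    """Return a dict describing a corpus (hypothetical helper).

    Only name and description are required; any other fields are
    passed through unchanged.
    """
    if not name or not description:
        raise ValueError("both name and description are mandatory")
    corpus = {"name": name, "description": description}
    corpus.update(optional_fields)
    return corpus


def extract_corpus_id(response_body):
    """Read the corpus id from a JSON response body.

    Assumption: the id is exposed under the key "corpus_id".
    """
    return json.loads(response_body)["corpus_id"]
```

Because names need not be unique within an account, the returned id, not the name, is the value to store for later requests.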
To reference metadata in filter expressions, you must declare the referenceable attributes as filter attributes when you create the corpus. This list cannot be changed once the corpus is created.
For information on custom dimensions, a Scale-only feature, please see Custom Dimensions. Like filter attributes, custom dimensions cannot be changed after the corpus is created.
Filter Attribute
A filter attribute must specify a name and a level, which indicates whether the attribute exists in the document-level or part-level metadata. At indexing time, metadata with this name will be extracted and made available for filter expressions to operate on.
If indexed is true, the system will build an index on the extracted values to further improve the performance of filter expressions involving the attribute.
Finally, filter attributes must specify a type, which is validated when documents are indexed. The four supported types are integer, which stores signed whole-number values up to eight bytes in length; real, which stores floating-point values in IEEE 754 8-byte format; text, which stores textual strings in UTF-8 encoding; and boolean, which stores true/false values.
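To make the four types concrete, here is a minimal client-side sketch of the kind of check the description implies. It is not the service's actual validation logic: the function name `matches_type` and the lowercase type strings are assumptions for illustration, and the decision to accept whole numbers for real is a choice made in this sketch.

```python
# Bounds of a signed eight-byte integer.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def matches_type(attr_type, value):
    """Return True if value fits the declared filter attribute type.

    Hypothetical helper; type names follow the prose above.
    """
    if attr_type == "boolean":
        return isinstance(value, bool)
    if attr_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (
            isinstance(value, int)
            and not isinstance(value, bool)
            and INT64_MIN <= value <= INT64_MAX
        )
    if attr_type == "real":
        # Python floats are IEEE 754 8-byte doubles; whole numbers are
        # also accepted here as a convenience (an assumption).
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if attr_type == "text":
        return isinstance(value, str)
    raise ValueError(f"unsupported filter attribute type: {attr_type!r}")
```

Running a check like this before indexing can surface type mismatches early, for example `matches_type("integer", 2 ** 63)` is false because the value no longer fits in eight signed bytes.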
REST Example
Create Corpus REST Endpoint
Vectara exposes a REST endpoint at the following URL to create a corpus: https://api.vectara.io/v1/create-corpus
The API Playground shows the full Create Corpus REST definition.
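The following Python sketch shows how a request to this endpoint might be assembled with the standard library. Only the URL comes from this page. The header names (`x-api-key`, `customer-id`), the `{"corpus": ...}` body wrapper, and the function name `build_create_corpus_request` are assumptions for illustration; consult the API Playground for the authoritative definition.

```python
import json
import urllib.request

CREATE_CORPUS_URL = "https://api.vectara.io/v1/create-corpus"


def build_create_corpus_request(corpus, api_key, customer_id):
    """Build (but do not send) a POST request for the create-corpus endpoint.

    Assumptions: the corpus is wrapped under a "corpus" key, and
    authentication uses the header names shown below.
    """
    body = json.dumps({"corpus": corpus}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,        # assumed header name
        "customer-id": customer_id,  # assumed header name
    }
    return urllib.request.Request(
        CREATE_CORPUS_URL, data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; building and sending are kept separate here so the payload can be inspected first.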
gRPC Example
You can find the full Create Corpus gRPC definition at admin.proto.
The CreateCorpusRequest message contains a Corpus message with the name, description, and other customization options. The CreateCorpusResponse message returns the new corpus ID and a status.