Create a corpus
POST /v2/corpora
Create a corpus, which is a container that stores documents and their associated metadata. Here, you define the unique corpus_key that identifies the corpus. You can choose the corpus_key to follow your preferred naming convention, which makes it easy to manage the corpus's data and reference it in queries. For more information, see Corpus Key Definition.
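The following Python sketch shows one way to prepare this request using only the standard library. It assumes the v2 API is served from https://api.vectara.io and that the API key is sent in an x-api-key header, and it takes the body field names (key, name, description) from the example response on this page. The helper names are hypothetical. The key check mirrors the corpus key constraints listed under Body, assuming the pattern must match the whole key. The request is built but not sent.

```python
import json
import re
import urllib.request
from typing import Optional

# Assumption: base URL and authentication header for the v2 API.
BASE_URL = "https://api.vectara.io"
API_KEY_HEADER = "x-api-key"

# Documented corpus_key constraints: at most 50 characters, matching
# [a-zA-Z0-9_\=\-]+$. Assumption: the whole key must match (hence fullmatch).
CORPUS_KEY_RE = re.compile(r"[a-zA-Z0-9_=\-]+")
MAX_KEY_LENGTH = 50


def is_valid_corpus_key(key: str) -> bool:
    """Client-side pre-check; the server remains the source of truth."""
    return len(key) <= MAX_KEY_LENGTH and CORPUS_KEY_RE.fullmatch(key) is not None


def build_create_corpus_request(api_key: str, corpus_key: str,
                                name: Optional[str] = None,
                                description: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/corpora request."""
    if not is_valid_corpus_key(corpus_key):
        raise ValueError(f"invalid corpus key: {corpus_key!r}")
    body = {"key": corpus_key}
    if name is not None:
        body["name"] = name  # when omitted, the name defaults to the key
    if description is not None:
        body["description"] = description
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/corpora",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen; the possible status codes are listed under Responses.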
Request
Header Parameters
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of seconds before timing out.
Possible values: >= 1
The API makes a best effort to complete the request within the specified number of milliseconds before timing out.
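The header parameter names are not shown above. This hypothetical helper assumes they are Request-Timeout (seconds) and Request-Timeout-Millis (milliseconds), so confirm the names against the live reference before relying on it. It only enforces the documented minimum of 1; the server still treats the value as a best-effort limit.

```python
from typing import Dict, Optional

# Assumption: these header names are not given in the parameter list above.
SECONDS_HEADER = "Request-Timeout"
MILLIS_HEADER = "Request-Timeout-Millis"


def timeout_headers(seconds: Optional[int] = None,
                    millis: Optional[int] = None) -> Dict[str, str]:
    """Build optional timeout headers, enforcing the documented minimum of 1."""
    headers: Dict[str, str] = {}
    for header, value in ((SECONDS_HEADER, seconds), (MILLIS_HEADER, millis)):
        if value is None:
            continue  # header is optional; leave it out
        if value < 1:
            raise ValueError(f"{header} must be >= 1, got {value}")
        headers[header] = str(value)
    return headers
```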
- application/json
Body
Possible values: <= 50 characters. Value must match regular expression [a-zA-Z0-9_\=\-]+$
A user-provided key for a corpus.
The name for the corpus. This value defaults to the key.
Description of the corpus.
Default value: false
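Queries made to this corpus are considered answers, and not questions. This swaps the semantics of the encoder used at query time.
Default value: false
Documents inside this corpus are considered questions, and not answers. This swaps the semantics of the encoder used at indexing.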
Possible values: Value must match regular expression enc_[0-9]+$
Deprecated: Use encoder_name instead.
The encoder used by the corpus, such as boomerang-2023-q3.
filter_attributes object[]
The filter attributes of the new corpus. If unset, the corpus has no filter attributes.
The JSON path of the filter attribute within the document metadata or document part metadata.
Possible values: [document, part]
Indicates whether this is a document or document part metadata filter.
Description of the filter. May be omitted.
Default value: true
Indicates whether an index should be created for the filter. Creating an index will improve query latency when using the filter.
Possible values: [integer, real_number, text, boolean, list[integer], list[real_number], list[text]]
The value type of the filter.
custom_dimensions object[]
A custom dimension is an additional numerical field attached to a document part. This field can be multiplied by a query-time custom dimension of the same name, which lets you boost (or bury) document parts for any reason you choose. This feature is only available to Pro and Enterprise customers.
The name of the custom dimension.
Description of the custom dimension.
Default value of a custom dimension on a document part if the custom dimension value is not specified when the document part is indexed.
A value of 0 means that custom dimension is not considered.
Default value of a custom dimension for a query if the value of the custom dimension is not specified when querying the corpus.
A value of 0 means that custom dimension is not considered.
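The two object[] fields above can be modelled client-side. This sketch assumes the request uses the same field names as the example response on this page (name, level, description, indexed, type for filter attributes; name, description, indexing_default, querying_default for custom dimensions); the class and method names are hypothetical. The product method is one reading of the custom dimension description: apply the defaults, then multiply the indexed value by the query-time value of the same name. The text does not say how that product feeds into the final relevance score, so the sketch stops there.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional

LEVELS = {"document", "part"}
VALUE_TYPES = {"integer", "real_number", "text", "boolean",
               "list[integer]", "list[real_number]", "list[text]"}


@dataclass
class FilterAttribute:
    name: str                          # JSON path within the metadata
    level: str                         # "document" or "part"
    type: str                          # one of VALUE_TYPES
    description: Optional[str] = None  # may be omitted
    indexed: bool = True               # documented default

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        if self.type not in VALUE_TYPES:
            raise ValueError(f"unknown type: {self.type!r}")

    def to_json(self) -> dict:
        data = asdict(self)
        if data["description"] is None:
            del data["description"]  # omit rather than send null
        return data


@dataclass
class CustomDimension:
    name: str
    indexing_default: float  # used when a part has no value at indexing time
    querying_default: float  # used when a query has no value
    description: Optional[str] = None

    def product(self, part_values: Dict[str, float],
                query_values: Dict[str, float]) -> float:
        # Fall back to the corpus defaults, then multiply the indexed value by
        # the query-time value of the same name. A 0 on either side yields 0,
        # so the dimension has no effect.
        part_value = part_values.get(self.name, self.indexing_default)
        query_value = query_values.get(self.name, self.querying_default)
        return part_value * query_value
```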
Responses
- 201
- 400
- 403
201: The corpus has been created.
- application/json
Schema
Possible values: Value must match regular expression crp_[0-9]+$
Vectara ID of the corpus.
Possible values: <= 50 characters. Value must match regular expression [a-zA-Z0-9_\=\-]+$
A user-provided key for a corpus.
Name for the corpus. This value defaults to the key.
Corpus description.
Indicates whether the corpus is enabled.
Indicates whether this corpus stores chat history instead of documents.
Default value: false
Queries made to this corpus are considered answers, and not questions. This swaps the semantics of the encoder used at query time.
Default value: false
Documents inside this corpus are considered questions, and not answers. This swaps the semantics of the encoder used at indexing.
Possible values: Value must match regular expression enc_[0-9]+$
The encoder used by the corpus.
Deprecated: Use encoder_name instead.
The encoder used by the corpus, such as boomerang-2023-q3.
filter_attributes object[]
The filter attributes of the corpus.
The JSON path of the filter attribute within the document metadata or document part metadata.
Possible values: [document, part]
Indicates whether this is a document or document part metadata filter.
Description of the filter. May be omitted.
Default value: true
Indicates whether an index should be created for the filter. Creating an index will improve query latency when using the filter.
Possible values: [integer, real_number, text, boolean, list[integer], list[real_number], list[text]]
The value type of the filter.
custom_dimensions object[]
The custom dimensions of all document parts inside the corpus.
The name of the custom dimension.
Description of the custom dimension.
Default value of a custom dimension on a document part if the custom dimension value is not specified when the document part is indexed.
A value of 0 means that custom dimension is not considered.
Default value of a custom dimension for a query if the value of the custom dimension is not specified when querying the corpus.
A value of 0 means that custom dimension is not considered.
limits object
The number of documents contained in the corpus.
The number of document parts contained in the corpus.
NOTE: This field is currently not populated by the system. The number of bytes contained in the corpus. This includes the document metadata, document part metadata, and document contents.
The number of characters contained in the corpus. This includes the document metadata, document part metadata, and document contents.
NOTE: This field is currently not populated by the system. The maximum size of the corpus, in bytes.
The maximum size of a document's metadata.
NOTE: This field is currently not populated by the system. The maximum rate at which new documents can be added to the corpus, in documents per second.
Indicates when the corpus was created.
Example (from schema):
{
"id": "string",
"key": "my-corpus",
"name": "string",
"description": "string",
"enabled": true,
"chat_history_corpus": true,
"queries_are_answers": false,
"documents_are_questions": false,
"encoder_name": "boomerang-2023-q3",
"filter_attributes": [
{
"name": "Title",
"level": "document",
"description": "The title of the document.",
"indexed": true,
"type": "text"
}
],
"custom_dimensions": [
{
"name": "importance",
"description": "Product importance.",
"indexing_default": 0,
"querying_default": 0
}
],
"limits": {
"used_docs": 0,
"used_parts": 0,
"used_bytes": 0,
"used_characters": 0,
"max_bytes": 0,
"max_metadata_bytes": 0,
"index_rate": 0
},
"created_at": "2025-01-14T03:49:21.135Z"
}
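A minimal sketch for reading a 201 response body. The function name is hypothetical; it checks the id against the documented crp_[0-9]+$ pattern (treated as a whole-string match, which is an assumption) and falls back to the key for the name, as the schema describes. The example above is generated from the schema, so its "id" is the placeholder "string", which this check would reject.

```python
import json
import re

# Documented pattern for corpus IDs; assumed to apply to the whole ID.
CORPUS_ID_RE = re.compile(r"crp_[0-9]+")


def summarize_created_corpus(response_body: str) -> dict:
    """Extract a few fields from a 201 create-corpus response body."""
    corpus = json.loads(response_body)
    corpus_id = corpus["id"]
    if CORPUS_ID_RE.fullmatch(corpus_id) is None:
        raise ValueError(f"unexpected corpus ID: {corpus_id!r}")
    return {
        "id": corpus_id,
        "key": corpus["key"],
        # The name defaults to the key, so fall back to it if absent.
        "name": corpus.get("name", corpus["key"]),
        "filters": [attr["name"] for attr in corpus.get("filter_attributes", [])],
        "created_at": corpus.get("created_at"),
    }
```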
400: Invalid request body in the create corpus request.
- application/json
Schema
field_errors object
The errors that relate to specific fields in the request.
The ID of the request that can be used to help Vectara support debug what went wrong.
Example (from schema):
{
"field_errors": {},
"messages": [
"string"
],
"request_id": "string"
}
403: Permissions do not allow creating a corpus.
- application/json
Schema
The messages describing why the error occurred.
The ID of the request that can be used to help Vectara support debug what went wrong.
Example (from schema):
{
"messages": [
"Internal server error."
],
"request_id": "string"
}
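Both error responses include messages and request_id, and the 400 response adds field_errors. This hypothetical helper folds them into one log line so the request ID is easy to pass to Vectara support. The format of each field_errors entry is not documented here, so the sketch simply converts it to a string.

```python
import json
from typing import List

# Short labels for the two documented error statuses.
REASONS = {400: "invalid request body", 403: "permission denied"}


def describe_create_corpus_error(status: int, response_body: str) -> str:
    """Summarize a create-corpus error body as a single log line."""
    error = json.loads(response_body)
    parts: List[str] = list(error.get("messages", []))
    # field_errors is only documented for 400; its value format is assumed
    # here and simply stringified.
    for field, detail in error.get("field_errors", {}).items():
        parts.append(f"{field}: {detail}")
    details = "; ".join(parts) or "no details"
    reason = REASONS.get(status, "unexpected error")
    request_id = error.get("request_id", "unknown")
    return f"HTTP {status} ({reason}): {details} [request_id={request_id}]"
```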