
Low-level Indexing API Definition

The Low-level Indexing API provides low-level access to the semantic indexing capabilities of the Vectara platform. It operates on document parts, which let you define the specific text and its context within a document. This approach differs from the Standard Indexing API, which organizes documents into sections that have IDs, titles, and descriptions, mirroring traditional hierarchical document structures.

This more granular control over documents lets you tailor your indexing strategy. The Low-level Indexing API is intended for advanced use cases; most users should use the Standard Indexing API.

Tip: Check out our interactive API Playground, which lets you experiment with this endpoint and index documents from your browser.

Low-level Index Document Request and Response

The low-level indexing service accepts individual documents or messages to be indexed. The new content becomes available in the search index shortly afterward, generally within a few minutes. The index request requires the following parameters:

  • Customer ID
  • Corpus ID
  • Document object

The response includes a status message and a StorageQuota message indicating how much quota was consumed.
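As a rough illustration of the request shape described above, the following sketch assembles the three required parameters into a single body. The helper name `build_index_request`, the snake_case key names `customer_id`, `corpus_id`, and `document`, and the placeholder values are assumptions made for this example; the exact wire format is defined by the full API definition, not by this snippet.

```python
import json

def build_index_request(customer_id: int, corpus_id: int, document: dict) -> dict:
    """Bundle the three required parameters into one request body.

    ASSUMPTION: the key names below are illustrative, not confirmed wire names.
    """
    if not document.get("document_id"):
        raise ValueError("document must carry a non-empty document_id")
    return {
        "customer_id": customer_id,
        "corpus_id": corpus_id,
        "document": document,
    }

request_body = build_index_request(
    customer_id=123456789,  # placeholder value
    corpus_id=1,            # placeholder value
    document={"document_id": "doc-001", "parts": [{"text": "Hello, world."}]},
)
print(json.dumps(request_body, indent=2))
```

Validating the document ID on the client side is a design choice of this sketch, not a documented requirement of the service.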

Document Container Definition

The document object contains the related textual items that are indexed. This object has a document_id, which must be unique among all the documents in the same corpus. It may optionally define metadata_json.

The two fields default_part_context and custom_dims (Scale only) provide document-level defaults. If a part within the document does not explicitly define its own context or custom dimensions, the corresponding default value applies to that part.
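The fallback from a part's own context to the document-level default can be pictured with a small sketch. The platform applies this rule itself; the function `resolve_part_context` is a hypothetical helper that only mirrors the described behavior, and the sample document values are invented for illustration.

```python
from typing import Optional

def resolve_part_context(document: dict, part: dict) -> Optional[str]:
    """Return the part's own context if it defines one, else the document default."""
    if "context" in part:
        return part["context"]
    return document.get("default_part_context")

# Illustrative document; all values are made up for this example.
document = {
    "document_id": "handbook-2024",
    "metadata_json": '{"source": "hr-portal"}',
    "default_part_context": "Employee Handbook, 2024 edition",
    "parts": [
        {"text": "Employees accrue 1.5 vacation days per month."},
        {"text": "Unused days roll over once.", "context": "Chapter 4: Time Off"},
    ],
}

contexts = [resolve_part_context(document, part) for part in document["parts"]]
print(contexts)
```

The first part has no context of its own and picks up the document default, while the second keeps its explicit value.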

Parts within a Document

Most importantly, parts defines the actual text items that you want to index. The document part is the atomic unit of Vectara. Every part is added to the index, and when search results are returned, each result is a document part.

The text field defines the text to index and should generally be a single sentence. It should not be shorter than a sentence. It may be longer, up to the length of an entire paragraph, although performance may suffer.

The context defines the context of the text. It may include any additional textual information that helps in disambiguating the meaning. For instance, it may include the preceding or following paragraphs, the chapter title, or the document title.

The part metadata, held in metadata_json, is returned with the document part in search query results. For example, it can contain information that links the item to records in other systems.

For Scale users, custom_dims allows you to specify additional factors that can be used at query time to control the ranking of results. The dimensions must be defined ahead of time for the corpus, or else they'll be ignored.
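One way to prepare parts along these lines is to split a paragraph into sentences and attach the surrounding paragraph and title as context. This is a minimal sketch: `paragraph_to_parts` is a hypothetical helper, the regex sentence splitter is deliberately naive, and treating metadata_json as a JSON-encoded string is an assumption based on the field name. The custom_dims field is left out because its format is not described here.

```python
import json
import re

def paragraph_to_parts(paragraph: str, title: str, record_id: str) -> list:
    """Turn one paragraph into sentence-level parts with shared context."""
    # Naive split on whitespace that follows ., ! or ? (illustration only).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    parts = []
    for position, sentence in enumerate(sentences):
        parts.append({
            "text": sentence,
            # Context: the document title plus the full surrounding paragraph.
            "context": f"{title}. {paragraph}",
            # ASSUMPTION: metadata is carried as a JSON string.
            "metadata_json": json.dumps({"record_id": record_id, "sentence": position}),
        })
    return parts

parts = paragraph_to_parts(
    "Vectara indexes text. Each part is searchable.",
    title="Overview",
    record_id="kb-17",  # hypothetical ID linking back to another system
)
for part in parts:
    print(part["text"], "|", part["metadata_json"])
```

Keeping each part at sentence length follows the guidance on the text field, and the record ID in the metadata shows how a search result could be linked back to a record elsewhere.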

REST Example

Low-level Indexing REST Endpoint

Vectara exposes a REST endpoint at the following URL to index content into a corpus:
https://api.vectara.io/v1/core-index

The API Playground shows the full Low-level Indexing REST definition.
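A request to this endpoint might be prepared as in the sketch below, using only the Python standard library. Several details are assumptions not stated on this page: the `x-api-key` authentication header, the JSON key names in the body, and all placeholder values. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

CORE_INDEX_URL = "https://api.vectara.io/v1/core-index"

def make_core_index_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the core-index endpoint."""
    return urllib.request.Request(
        CORE_INDEX_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # ASSUMPTION: auth header name is not given here
        },
    )

req = make_core_index_request(
    {
        "customer_id": 123456789,  # placeholder value
        "corpus_id": 1,            # placeholder value
        "document": {"document_id": "doc-001", "parts": [{"text": "Hello, world."}]},
    },
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Check the full REST definition in the API Playground for the authoritative field names and authentication scheme before relying on this shape.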

gRPC Example

You can find the full Low-level Indexing gRPC definition at indexing_core.proto.

A request to add data into a corpus consists of three key pieces of information: the customer ID, the corpus ID, and the data itself, represented as a CoreDocument message.

The reply from the server is currently empty. Note that the RPC does not block until indexing completes. In other words, the information in the request is not yet available in the index when the RPC returns.
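Because the call returns before the content is searchable, a client that needs to confirm availability has to check separately. The following sketch shows a generic polling loop; `wait_until_indexed` and the `is_searchable` callback are hypothetical, and the callback would wrap whatever query you use to detect the new document. The default timeout and interval are arbitrary choices for this example.

```python
import time
from typing import Callable

def wait_until_indexed(
    is_searchable: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_searchable() returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_searchable():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)

# Usage with a stand-in check that succeeds on the third attempt.
attempts = iter([False, False, True])
found = wait_until_indexed(lambda: next(attempts), timeout_s=5.0, interval_s=0.0)
print(found)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.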

The full definition also shows the CoreDocument container format, which has metadata about the document, and parts within the document as CoreDocumentPart.