Client-configurable data retention
When you create a corpus via the API or the UI, you have the option to create it with the "don't store the text" setting, also known as "textless" mode.
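As a rough illustration of setting this option through the API, the sketch below only builds a request body and prints it; it does not call any endpoint. The helper name `build_create_corpus_payload` is hypothetical, and the field names (`corpus`, `name`, `description`, `textless`) are assumptions that should be checked against the current API reference before use.

```python
import json


def build_create_corpus_payload(name: str, description: str, textless: bool) -> dict:
    """Build a corpus-creation request body as a plain dict.

    ASSUMPTION: the field names below are illustrative, not confirmed
    by this page. Verify them against the API reference.
    """
    return {
        "corpus": {
            "name": name,
            "description": description,
            # True means "don't store the text" for this corpus.
            "textless": textless,
        }
    }


payload = build_create_corpus_payload(
    name="contracts",
    description="Sensitive legal documents",
    textless=True,
)
print(json.dumps(payload, indent=2))
```

Keeping the flag as an explicit argument makes it harder to create a corpus in the wrong mode by accident.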
Textless Mode
When you enable textless mode, several things happen on the platform.
Let's look at when it's appropriate to enable textless, what happens on the
platform, and what benefits and limitations it brings.
What happens when textless is enabled
When you enable textless on a corpus, Vectara discards the text content of each document immediately after it converts the text to a vector. At that point, the text is no longer recoverable and won't be returned by any Vectara API.
Note that Vectara does retain any metadata, including document IDs, that was supplied alongside the text. This lets you use the ID to retrieve the document from a separate system of record when you need to display it, and it lets Vectara apply metadata-based filtering to the document.
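The lookup pattern described above can be sketched as follows. This is a minimal illustration, not the platform's response format: the `SearchHit` class, the `hydrate` helper, and the in-memory `SYSTEM_OF_RECORD` dictionary are all hypothetical stand-ins for your own result parsing and document store.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    """One result from a textless corpus: ID, score, and metadata, but no text.

    ASSUMPTION: this shape is hypothetical, chosen only for the example.
    """
    document_id: str
    score: float
    metadata: dict


# Stand-in for your own system of record (database, object store, CMS, ...).
SYSTEM_OF_RECORD = {
    "doc-001": "Full text of the first contract.",
    "doc-002": "Full text of the second contract.",
}


def hydrate(hits, store):
    """Pair each hit with its text, looked up by document ID (None if missing)."""
    return [(hit, store.get(hit.document_id)) for hit in hits]


hits = [
    SearchHit("doc-002", 0.91, {"department": "legal"}),
    SearchHit("doc-404", 0.55, {"department": "legal"}),
]

hydrated = hydrate(hits, SYSTEM_OF_RECORD)
for hit, text in hydrated:
    shown = text if text is not None else "<not found in system of record>"
    print(hit.document_id, hit.score, shown)
```

Because the text lives only in your own store, access control on that store decides who can read a document, even if the search result reveals that a matching document exists.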
Textless mode use cases
One use case for textless is when you have very sensitive text content. With this feature enabled, the text content cannot be recovered by Vectara or by any user who manages to query for and find the document.
In general, this feature is optimal for use cases where the cost of any information leakage is very high. Note that Vectara does encrypt documents
Limitations
Currently, the reranking capability relies on the text being stored. As a result, reranking search results does not work on any corpus where text storage has been turned off.
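One way to handle this limitation on the client side is to check the corpus mode before asking for reranking. The sketch below assumes your application already knows whether a corpus is textless; the function name `resolve_rerank` is hypothetical and not part of any Vectara SDK.

```python
import warnings


def resolve_rerank(corpus_is_textless: bool, rerank_requested: bool) -> bool:
    """Decide whether a query should ask for reranking.

    Reranking depends on stored text, so it is switched off (with a
    warning) when the target corpus is textless.
    """
    if corpus_is_textless and rerank_requested:
        warnings.warn(
            "Reranking was requested on a textless corpus; "
            "sending the query without reranking."
        )
        return False
    return rerank_requested


print(resolve_rerank(corpus_is_textless=True, rerank_requested=True))
print(resolve_rerank(corpus_is_textless=False, rerank_requested=True))
```

Downgrading with a warning keeps queries working; raising an exception instead would be a reasonable choice if silent loss of reranking is unacceptable in your application.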