Interpreting Scores
Like all information retrieval systems, Vectara scores documents on how relevant they are to the query. It's important to understand the scoring system and how it changes based on the controls and query parameters you provide.
Out of the box, scores in Vectara:
- Can be either positive or negative
- Are larger (more positive) as relevance increases
- Are between -1 and 1 when not reranked
- Can be any real number -- positive or negative -- when reranked. However, scores when reranked are typically between about -10 and 10
See the sections below on "standard" and "reranked" results for details on how they differ and how to use them best.
Vectara provides an important control that can affect scores: custom dimensions, which allow you to boost or bury results based on metadata.
Comparison With Keyword Systems
If you've used a keyword-based system (BM25, TF-IDF, etc.) and are used to its scoring mechanics, it's worth discussing the differences so you can understand what to expect with Vectara.
Limiting number of results
Unlike keyword systems, which only match documents that contain the exact term(s) searched for, Vectara attempts to produce scores for the majority of documents in a corpus -- even those that have low relevance to the given query. For some use cases, it's desirable to have as many pages of results as possible, but for others, you may wish to apply logic to limit the number of results shown in the website or application.
In general, you can safely stop showing results with scores below 0.1 in all cases, and, for most use cases, below around 2 when using the reranker. A more robust strategy for limiting results is to look for a sudden drop in scores. For example, if the scores are [0.7, 0.6, 0.55, 0.5, 0.2, 0.14], the large drop from 0.5 to 0.2 suggests the results have become significantly worse, and can be used as a signal to stop returning results.
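The two strategies above (a minimum score and a sudden-drop check) can be combined in a few lines of client-side code. The following is a minimal sketch, not part of any Vectara API: the function name `cutoff_index` and the `max_drop` value of 0.25 are assumptions chosen for illustration, and the defaults are tuned for non-reranked scores. For reranked results you would likely pass a higher `min_score` (around 2, per the guidance above) and a larger `max_drop`.

```python
from typing import List


def cutoff_index(
    scores: List[float],
    min_score: float = 0.1,   # floor suggested above for non-reranked scores
    max_drop: float = 0.25,   # assumed threshold for a "sudden drop"; tune per corpus
) -> int:
    """Return how many leading results to keep.

    `scores` must already be sorted from most to least relevant.
    Stops at the first score below `min_score`, or at the first score
    that falls more than `max_drop` below the previous one.
    """
    keep = 0
    for i, score in enumerate(scores):
        if score < min_score:
            break
        if i > 0 and scores[i - 1] - score > max_drop:
            break
        keep = i + 1
    return keep


scores = [0.7, 0.6, 0.55, 0.5, 0.2, 0.14]
kept = scores[:cutoff_index(scores)]
print(kept)  # [0.7, 0.6, 0.55, 0.5]
```

With the example scores from the text, the drop from 0.5 to 0.2 exceeds the assumed 0.25 threshold, so only the first four results are kept. An absolute drop threshold is the simplest choice; a relative one (for example, a score falling to less than half of the previous score) may behave better when score ranges vary between queries.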