Queries
This guide covers querying corpora with the Vectara Python SDK, for both search and Retrieval Augmented Generation (RAG). The query methods let you search corpora for relevant documents and generate summarized responses using Vectara's RAG-focused LLMs, supporting enterprise needs like legal research or customer insights.
This guide assumes you have a corpus called my-docs with indexed documents and filter attributes defined. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus and add some documents.
Prerequisites
Setup Requirements:
- Install the SDK with pip install vectara.
- Get an API key from the Vectara Console.
- Create a corpus with client.corpora.create().
Initialize the Vectara Client
Set up authentication to securely access querying methods using an API key. Ensure your API key has querying permissions for the target corpora.
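A minimal sketch of client setup. It assumes the API key is stored in an environment variable named VECTARA_API_KEY (a name chosen here for illustration) and that the SDK exposes a Vectara class accepting an api_key argument. The SDK import is deferred so the key-loading helper can be used and tested without the package installed.

```python
import os


def load_api_key(env=None, var_name="VECTARA_API_KEY"):
    """Return the API key from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the client")
    return key


def make_client(api_key):
    """Create an SDK client (assumes `pip install vectara` has been run)."""
    from vectara import Vectara  # deferred import; assumed SDK entry point

    return Vectara(api_key=api_key)
```

Reading the key from the environment keeps it out of source control; calling make_client(load_api_key()) would then produce the client object used in the rest of this guide.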
Simple query with generation
Perform a query with Retrieval Augmented Generation (RAG) to get both search results and an AI-generated summary. This is the most common pattern for getting comprehensive answers from your corpus.
The client.query method corresponds to the HTTP POST /v2/query endpoint. For more details on request and response parameters, see the Query REST API.
Key Parameters:
- generation_preset_name: vectara-summary-ext-24-05-med-omni provides high-quality, comprehensive responses using GPT-4o. See Generation Presets for a list of currently supported prompts.
- max_used_search_results: Setting this to 50 gives the LLM substantial context for generation.
- enable_factual_consistency_score: Provides a confidence score for the generated summary.
Returns:
- summary: AI-generated summary based on search results
- factual_consistency_score: Reliability score (0.0-1.0) for the summary
- search_results: List of relevant documents with scores
Use this pattern when you need both specific document excerpts and a synthesized answer.
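The parameters above map onto the JSON body of POST /v2/query. The sketch below builds that body with the standard library only. The field layout (search.corpora[].corpus_key and the generation object) is an assumption based on the v2 REST API and should be checked against the Query REST API reference; the helper name build_rag_query and the sample question are hypothetical.

```python
import json


def build_rag_query(question, corpus_key="my-docs", max_results=50):
    """Assemble a request body for a query with generation (assumed v2 layout)."""
    return {
        "query": question,
        "search": {"corpora": [{"corpus_key": corpus_key}]},
        "generation": {
            "generation_preset_name": "vectara-summary-ext-24-05-med-omni",
            "max_used_search_results": max_results,
            "enable_factual_consistency_score": True,
        },
    }


body = build_rag_query("What is our refund policy?")
print(json.dumps(body, indent=2))
```

With the SDK, the same values would be passed to client.query, and the response would carry the summary, factual_consistency_score, and search_results fields listed above.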
Advanced query with filtering and reranking
Execute sophisticated queries with metadata filtering, reranking, and custom generation prompts for specialized use cases.
Advanced Features:
- Metadata Filtering: Use doc.field = 'value' syntax to filter by document properties (requires corpus filter attributes)
- Lexical Interpolation: 0.3 balances keyword matching (30%) with semantic search (70%)
- Context Configuration: Adds surrounding sentences for better understanding
- Reranking: Improves result relevance using specialized models
- Custom Prompts: Tailor AI responses for specific domains or formats
Important: Metadata filtering requires that your corpus has filter attributes defined for the fields you want to filter on. See the Corpus guide for creating filter attributes.
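The features above can be sketched as one request body. This is an illustration under stated assumptions, not the SDK's own code: the field names (metadata_filter, lexical_interpolation, context_configuration, reranker, prompt_template) follow my reading of the v2 REST API, and the helper name, the doc.department attribute, and the reranker dictionary are hypothetical placeholders to replace with values valid for your account.

```python
def build_advanced_query(question, corpus_key="my-docs", metadata_filter="",
                         lexical_interpolation=0.3, context_sentences=2,
                         reranker=None, prompt_template=None):
    """Assemble a request body with filtering, reranking, and a custom prompt."""
    if not 0.0 <= lexical_interpolation <= 1.0:
        raise ValueError("lexical_interpolation must be between 0.0 and 1.0")

    corpus = {
        "corpus_key": corpus_key,
        "lexical_interpolation": lexical_interpolation,
    }
    if metadata_filter:
        # Only valid if the corpus defines a matching filter attribute.
        corpus["metadata_filter"] = metadata_filter

    search = {
        "corpora": [corpus],
        # Surrounding sentences returned with each matching passage.
        "context_configuration": {
            "sentences_before": context_sentences,
            "sentences_after": context_sentences,
        },
    }
    if reranker is not None:
        search["reranker"] = reranker  # passed through unchanged

    generation = {"max_used_search_results": 20}
    if prompt_template:
        generation["prompt_template"] = prompt_template  # assumed field name

    return {"query": question, "search": search, "generation": generation}


body = build_advanced_query(
    "Which contracts renew this quarter?",
    metadata_filter="doc.department = 'legal'",          # hypothetical attribute
    reranker={"type": "customer_reranker",               # assumed reranker shape
              "reranker_name": "Rerank_Multilingual_v1"},
)
```

Keeping each optional feature behind its own argument matches the advice to start with a simple query and add features one at a time.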
Streaming query
Stream query responses in real-time for better user experience in interactive applications like chatbots or live search interfaces.
The client.query_stream method corresponds to the HTTP POST /v2/query_stream endpoint.
Streaming Benefits:
- Immediate feedback to users as content generates
- Better perceived performance for long responses
- Ability to stop generation early if needed
Use Cases:
- Interactive chat interfaces
- Live search suggestions
- Long-form content generation where users want to see progress
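A sketch of consuming a stream, including stopping early. It assumes each streamed chunk has a type attribute and that text chunks use the type "generation_chunk" with the text in a generation_chunk attribute; these names are assumptions to verify against the SDK. The fake_stream list is a stand-in for what client.query_stream would yield, so the example runs without network access.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=print, max_chars=None):
    """Assemble generated text from streamed chunks, optionally stopping early."""
    parts = []
    total = 0
    for chunk in chunks:
        if getattr(chunk, "type", None) != "generation_chunk":
            continue  # skip search results, scores, and end markers
        text = getattr(chunk, "generation_chunk", "") or ""
        parts.append(text)
        total += len(text)
        on_text(text)  # e.g. push the fragment to the UI immediately
        if max_chars is not None and total >= max_chars:
            break  # stop consuming once enough text has arrived
    return "".join(parts)


# Stand-in chunks for illustration only; real chunks come from the SDK.
fake_stream = [
    SimpleNamespace(type="search_results"),
    SimpleNamespace(type="generation_chunk", generation_chunk="Refunds are "),
    SimpleNamespace(type="generation_chunk", generation_chunk="issued within 30 days."),
    SimpleNamespace(type="end"),
]
summary = collect_stream(fake_stream, on_text=lambda text: None)
```

Passing a callback for on_text lets a chat interface render each fragment as it arrives, while the return value keeps the full text for logging.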
Error handling and best practices
Common Error Scenarios:
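A minimal sketch of handling failed queries. It assumes SDK exceptions expose an HTTP status_code attribute, which should be confirmed against the SDK's error classes. Under that assumption, rate limits (429) and server errors (5xx) are retried with exponential backoff, while client errors such as an invalid key or a filter on an undefined attribute are raised immediately. The helper name run_with_retries is hypothetical.

```python
import time


def run_with_retries(run_query, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call run_query(), retrying on rate limits and server errors."""
    for attempt in range(retries + 1):
        try:
            return run_query()
        except Exception as exc:
            # Assumed attribute; exceptions without it are never retried.
            status = getattr(exc, "status_code", None)
            retryable = status == 429 or (status is not None and status >= 500)
            if not retryable or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

A call would look like run_with_retries(lambda: client.query(...)), keeping the retry policy separate from the query itself.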
Best Practices:
- Always wrap production queries in try/except blocks
- Monitor factual consistency scores for quality control
- Start with simple queries before adding advanced features
- Use appropriate max_used_search_results (50 for comprehensive, 10-20 for fast responses)
- Ensure corpus has filter attributes before using metadata filters
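One way to monitor factual consistency scores is to gate each summary on a threshold before showing it. The sketch below assumes the 0.0-1.0 score range described earlier; the 0.5 default threshold and the status labels are arbitrary placeholders to tune for your own content.

```python
def check_summary(summary, score, threshold=0.5):
    """Label a generated summary based on its factual consistency score."""
    if score is None:
        # Scoring was not requested or not returned for this query.
        return {"summary": summary, "score": None, "status": "unscored"}
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    status = "ok" if score >= threshold else "needs_review"
    return {"summary": summary, "score": score, "status": status}
```

Summaries labeled needs_review could be logged, hidden, or shown alongside the underlying search results instead of on their own.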
Performance Tips:
- Cache frequently used search configurations
- Use streaming for long responses
- Consider pagination for very large result sets
- Monitor query latency and adjust parameters accordingly
Metadata Filtering Requirements:
- Filter attributes must be defined when creating the corpus
- Metadata field names must exactly match filter attribute names
- Use the doc. prefix for document-level and the part. prefix for part-level filters
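The prefix rule can be sketched as a small helper that builds equality filters. The helper names and the department and page attributes are hypothetical. Values containing single quotes are rejected rather than escaped, since the escaping rules are not covered here, and joining clauses with "and" is an assumption about the filter syntax to check in the Metadata guide.

```python
def eq_filter(field, value, level="doc"):
    """Build a filter such as doc.department = 'legal' or part.page = 3."""
    if level not in ("doc", "part"):
        raise ValueError("level must be 'doc' or 'part'")
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        raise TypeError("only string and numeric values are supported here")
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("values containing single quotes are not supported")
        literal = f"'{value}'"
    else:
        literal = str(value)
    return f"{level}.{field} = {literal}"


def all_of(*clauses):
    """Combine clauses so that every one must match (assumed 'and' operator)."""
    return " and ".join(f"({clause})" for clause in clauses)


expr = all_of(eq_filter("department", "legal"), eq_filter("page", 3, level="part"))
```

The field argument must exactly match a filter attribute name defined on the corpus, otherwise the query is expected to fail.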
Next steps
After understanding queries, explore:
- Chat sessions: Use client.chats.create() for conversational interfaces with the Chats guide
- Metadata filtering: Learn advanced filtering techniques with the Metadata guide
- Batch processing: Process multiple queries efficiently
- Custom rerankers: Train domain-specific reranking models
- Advanced analytics: Track query performance and user patterns