Vector Stores

This guide explains how to create and configure vector store connections in Raikoo. Vector stores enable your AI agents to perform Retrieval-Augmented Generation (RAG) by searching through document collections to find relevant information.

Understanding Vector Stores in Raikoo

Vector stores in Raikoo are organization-level connections to external vector databases. They provide:

Centralized Configuration - Define connections once and use them across multiple agents and workflows
Secure Credential Storage - Connection credentials are stored in Azure Key Vault, never in the application database
Knowledge Base Integration - Enable agents to access document collections for RAG operations
Multiple Provider Support - Connect to a wide variety of vector database platforms

Vector store connections are shared across all projects within an organization, making them ideal for knowledge bases that serve multiple teams and use cases.

Supported Vector Store Providers

Raikoo supports the following vector database providers:

Provider	Configuration Fields	Description
Astra	`endpoint`, `token`	DataStax Astra vector database service
Chroma	`host`, `port`, `ssl`, `tenantId`, `database`, `authType`, `username`, `password`, `token`	Open-source embedding database with flexible authentication
Milvus	`address`, `ssl`, `username`, `password`	High-performance vector database with SSL support
MongoDB Atlas	`username`, `password`, `host`, `port`, `directConnection`	MongoDB Atlas with vector search capabilities
pgvector	`authType`, `host`, `port`, `database`, `username`, `password`, `connectionString`, `ssl`, `dimensions`, `idColumnName`, `collectionColumnName`, `vectorColumnName`, `contentColumnName`, `metadataColumnName`, `distanceStrategy`	PostgreSQL extension for vector similarity search
Pinecone	`apiKey`, `indexName`, `indexHostUrl`	Managed vector database service
Qdrant	`url`, `port`, `apiKey`	Open-source vector similarity search engine
Supabase	`url`, `key`	PostgreSQL-based platform with vector capabilities
Weaviate	`clusterUrl`, `apiKey`, `indexName`	Open-source vector search engine with semantic capabilities
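
If you script or review connection settings outside the UI, the field lists in the table can be captured as a simple lookup. The sketch below is illustrative only: the field names are copied from the table, while `PROVIDER_FIELDS` and `unknown_fields` are hypothetical names, not part of any Raikoo API.

```python
# Field names copied from the provider table; the registry and helper are
# hypothetical and exist only for this illustration.
PROVIDER_FIELDS: dict[str, set[str]] = {
    "Astra": {"endpoint", "token"},
    "Chroma": {"host", "port", "ssl", "tenantId", "database",
               "authType", "username", "password", "token"},
    "Milvus": {"address", "ssl", "username", "password"},
    "MongoDB Atlas": {"username", "password", "host", "port", "directConnection"},
    "pgvector": {"authType", "host", "port", "database", "username", "password",
                 "connectionString", "ssl", "dimensions", "idColumnName",
                 "collectionColumnName", "vectorColumnName", "contentColumnName",
                 "metadataColumnName", "distanceStrategy"},
    "Pinecone": {"apiKey", "indexName", "indexHostUrl"},
    "Qdrant": {"url", "port", "apiKey"},
    "Supabase": {"url", "key"},
    "Weaviate": {"clusterUrl", "apiKey", "indexName"},
}


def unknown_fields(provider: str, config: dict) -> set[str]:
    """Return the config keys that the table does not list for this provider."""
    if provider not in PROVIDER_FIELDS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    return set(config) - PROVIDER_FIELDS[provider]
```

A check like this catches misspelled keys (for example `index` instead of `indexName`) before a connection test is attempted.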

Provider-Specific Details

Astra (DataStax)

endpoint: The endpoint URL of your Astra instance
token: Authentication token for your Astra instance

Chroma

host: Host address of the Chroma server
port: Port number (default: 8000)
ssl: Whether to use SSL/HTTPS for connections
tenantId: Tenant name in the Chroma server
database: Database name to connect to
authType: Authentication type (basic or token)
username: Username for basic authentication (optional)
password: Password for basic authentication (optional)
token: Authentication token for token-based auth (optional)

Milvus

address: Endpoint address of your Milvus instance
ssl: Whether to use SSL/TLS for connections
username: Username to authenticate to Milvus
password: Password to authenticate to Milvus

MongoDB Atlas

username: Username to connect to MongoDB Atlas
password: Password associated with the user
host: Host address of your MongoDB Atlas instance
port: Port number where MongoDB Atlas is running (default: 27017)
directConnection: Whether to send all operations to the specified host instead of discovering other cluster members

pgvector (PostgreSQL)

authType: Authentication type (values or connection-string)
username: Username to connect to Postgres (required for values authType)
password: Password associated with the user (required for values authType)
host: Host address of your Postgres instance (required for values authType)
port: Port number where Postgres is running (default: 5432, required for values authType)
database: Database name to connect to (required for values authType)
connectionString: PostgreSQL connection string (required for connection-string authType)
ssl: Whether to use SSL/TLS for the connection
dimensions: Number of dimensions in the vector embeddings
idColumnName: ID column name (default: "id")
collectionColumnName: Collection column name (default: "collection")
vectorColumnName: Vector column name (default: "vector")
contentColumnName: Content column name (default: "content")
metadataColumnName: Metadata column name (default: "metadata")
distanceStrategy: Distance calculation method (cosine, innerProduct, or euclidean)
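
The two pgvector authType modes can be illustrated with a small check. This is a minimal sketch, assuming the fields behave as described above; `pgvector_connection_string` is a hypothetical helper, and the `postgresql://` URI it builds is the standard PostgreSQL connection URI form, not necessarily what Raikoo constructs internally.

```python
from urllib.parse import quote

# Fields the list above marks as required for the "values" authType.
VALUES_REQUIRED = ("username", "password", "host", "port", "database")


def pgvector_connection_string(config: dict) -> str:
    """Hypothetical helper: validate a pgvector config and return a connection URI."""
    auth_type = config.get("authType")
    if auth_type == "connection-string":
        if not config.get("connectionString"):
            raise ValueError("connectionString is required for the connection-string authType")
        return config["connectionString"]
    if auth_type == "values":
        merged = {"port": 5432, **config}  # apply the documented default port
        missing = [name for name in VALUES_REQUIRED if not merged.get(name)]
        if missing:
            raise ValueError(f"Missing fields for the values authType: {missing}")
        # Percent-encode credentials so characters such as '@' do not break the URI.
        user = quote(str(merged["username"]), safe="")
        password = quote(str(merged["password"]), safe="")
        database = quote(str(merged["database"]), safe="")
        return f"postgresql://{user}:{password}@{merged['host']}:{merged['port']}/{database}"
    raise ValueError(f"Unknown authType: {auth_type!r}")
```

Either mode ends up describing the same connection; the values mode simply lets you enter the pieces separately.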

Pinecone

apiKey: API key to connect to your Pinecone instance
indexName: Name of the index to use
indexHostUrl: URL of the index host (optional)

Qdrant

url: URL of the Qdrant instance
port: Port of the Qdrant instance (default: 6333, optional)
apiKey: API key to connect to Qdrant (optional)

Supabase

url: URL of the Supabase instance
key: Key to connect to your Supabase instance

Weaviate

clusterUrl: URL of your Weaviate cluster
apiKey: API key to connect to your Weaviate cluster
indexName: Class name (collection/index) to use in Weaviate

Accessing the Vector Stores Page

To access vector stores in your organization:

Navigate to your organization's dashboard
Click on "Vector Stores" in the left navigation menu under Organization Settings
You'll see a list of all vector store connections configured for your organization

The Vector Stores page displays:

Connection name - A descriptive identifier for each vector store
Description - Optional details about the vector store's contents and purpose
Action buttons - Edit, test connection, or delete

Permissions Required

You need the appropriate organization permissions to view, create, edit, or delete vector store connections.

Creating a New Vector Store Connection

To create a new vector store connection:

From the Vector Stores page, click the "Create" or "Add New Vector Store" button
You'll be taken to the vector store editor

Configuring Basic Details

Name (required)
- Enter a descriptive name for your vector store connection
- Example: Product Documentation or Support Knowledge Base
Description (optional)
- Provide additional context about this vector store
- Describe its contents, purpose, or usage guidelines
- Example: "Product documentation and API references for customer support agents"
Known Collections (optional)
- Add collection names that exist in this vector store
- Collections are logical groupings of documents within the vector store
- You can add multiple collections by typing each name and pressing Enter
- This helps users select the appropriate collection when configuring agent tools
Type (required)
- Select the type of vector database
- Choose from: Astra, Chroma, Milvus, Mongo Atlas, PGVector, Pinecone, Qdrant, Supabase, or Weaviate

Naming Best Practice

Use clear, descriptive names that indicate the content and purpose, such as "Product Documentation" or "Support Knowledge Base". This helps team members quickly identify the right knowledge base for their agents.

Configuring Provider-Specific Settings

After selecting the vector store type, you'll see provider-specific configuration fields. Enter the connection details for your vector database:

Fill in all required fields, which are marked with an asterisk
Sensitive fields (passwords, tokens, API keys) will be hidden in the UI
Refer to the Supported Vector Store Providers section above for field descriptions

Secure Credentials

All sensitive configuration values (passwords, tokens, API keys) are stored securely in Azure Key Vault and are never displayed in plain text after initial entry.

Managing Known Collections

Collections are logical groupings of documents within a vector store. In most vector databases, collections serve as:

Document Namespaces - Separate different knowledge domains (e.g., "api-docs", "support-articles", "product-guides")
Query Targets - Agents query specific collections to focus on relevant information
Data Organization - Group related documents for better retrieval performance

Adding Known Collections

When creating or editing a vector store, you can add known collection names:

In the "Known Collections" field, type the collection name
Press Enter to add it to the list
Repeat for additional collections
These collection names will appear in dropdowns when configuring agent tools

Collection Discovery

Adding known collections is optional but recommended. It helps users select the correct collection when configuring tools, reducing configuration errors.

Using Vector Stores with Agents

Once you've configured a vector store connection, you can enable agents to query it using RAG tools. Raikoo provides two specialized tools for vector store operations:

vector_store_query Tool

The vector_store_query tool enables agents to perform direct queries against a vector store collection.

How it works:

Agent receives a user question or task requiring external knowledge
Agent formulates a search query and calls the vector_store_query tool
The query is converted to an embedding vector using the configured embedding model
Vector database performs similarity search to find relevant document chunks
Retrieved content is summarized and returned to the agent
Agent uses the information to respond to the user
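
The embedding and similarity-search steps above can be sketched with a toy example. Everything here is an assumption for illustration: `toy_embed` replaces a real embedding model with simple word counts, and `top_k` replaces the vector database with an in-memory sort. It shows the idea of ranking chunks by cosine similarity, not how Raikoo implements the tool.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 10) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vector = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

The `k` argument plays the role of the Retrieval Result Count setting: it caps how many chunks are passed on for summarization.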

Configuration parameters:

Vector Store & Collection Name (required) - Select the vector store and specify the collection to query
Embedding Model (required) - The embedding model used to create query vectors. Must match the model used to embed documents in the collection
Language Model (optional) - The model used to summarize retrieved content. Defaults to the agent's primary model
Retrieval Result Count (optional) - Number of document chunks to retrieve per query. Default: 10
Show Citations (optional) - Whether to include source citations in the response. Default: True

Tool parameters:

query (required) - The search query to find relevant information
goal (optional) - The broader objective behind the query. Helps preserve relevant context during summarization
depth (optional) - Response depth level:
- concise (default) - Brief, focused summary for simple lookups
- comprehensive - Detailed response with full context and nuances
- verbatim - Raw retrieved content with minimal synthesis
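
To make the tool parameters concrete, the sketch below models them as a small data class. The class name and validation rules are hypothetical; only the parameter names, the allowed depth values, and the concise default come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Depth levels listed above for vector_store_query.
ALLOWED_DEPTHS = ("concise", "comprehensive", "verbatim")


@dataclass
class VectorStoreQueryArgs:
    """Hypothetical container for the arguments an agent passes to the tool."""

    query: str                   # required: the search query
    goal: Optional[str] = None   # optional: broader objective behind the query
    depth: str = "concise"       # optional: documented default

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query is required")
        if self.depth not in ALLOWED_DEPTHS:
            raise ValueError(f"depth must be one of {ALLOWED_DEPTHS}")
```

For example, `VectorStoreQueryArgs(query="API authentication requirements", goal="Answer a customer support question")` would use the concise depth by default.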

Example usage:

User: "What are the authentication requirements for our API?"

Agent: Uses vector_store_query tool with query="API authentication requirements"

Agent: "Based on the documentation, the API requires JWT bearer token authentication. Tokens must be included in the Authorization header..."

vector_store_researcher Tool

The vector_store_researcher tool provides an autonomous sub-agent that performs multi-query research and synthesizes comprehensive findings.

How it works:

Agent receives a complex research goal requiring thorough investigation
Agent calls the vector_store_researcher tool with the research goal
The researcher sub-agent autonomously:
- Formulates multiple related queries
- Executes each query against the vector store
- Analyzes results and identifies information gaps
- Performs follow-up queries as needed
- Synthesizes all findings into a comprehensive report
Final research report is returned to the calling agent with citations
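
The research loop described above can be approximated as follows. This is a sketch under stated assumptions, not the actual sub-agent: `search` and `plan_follow_ups` are hypothetical callables standing in for the vector store query and the researcher model, and the final synthesis step is left to the caller.

```python
from collections import deque
from typing import Callable


def run_research(
    goal: str,
    initial_queries: list[str],
    search: Callable[[str], str],
    plan_follow_ups: Callable[[str, dict[str, str]], list[str]],
    max_queries: int = 10,
) -> dict[str, str]:
    """Run queries until none are pending or the query budget is used up."""
    pending = deque(initial_queries)
    findings: dict[str, str] = {}
    while pending and len(findings) < max_queries:
        query = pending.popleft()
        if query in findings:
            continue  # skip queries that were already answered
        findings[query] = search(query)
        # Ask the (hypothetical) planner which gaps remain, given the findings so far.
        pending.extend(plan_follow_ups(goal, findings))
    return findings
```

The `max_queries` cap corresponds to the Maximum Queries setting: once it is reached, the loop stops and the collected findings are ready for synthesis.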

Configuration parameters:

Vector Store & Collection Name (required) - Select the vector store and collection to research
Embedding Model (required) - The embedding model for generating query vectors
Researcher Model (required) - The model that orchestrates research and synthesizes findings. Use a capable model for best results
Query Summarization Model (optional) - The model for summarizing individual query results. Can be a faster/cheaper model
Retrieval Result Count (optional) - Number of chunks to retrieve per query. Default: 10
Maximum Queries (optional) - Maximum number of queries before final synthesis. Default: 10
Show Citations (optional) - Whether to include citations in query results. Default: True

Tool parameters:

goal (required) - The research goal or question to investigate. Be specific about what you want to learn
depth (optional) - Depth of individual query results:
- comprehensive (default, recommended) - Detailed responses with full context
- concise - Brief summaries for quick overviews
- verbatim - Raw retrieved content for exact source material

Example usage:

User: "Provide a comprehensive analysis of all security features in our platform"

Agent: Uses vector_store_researcher tool with goal="comprehensive analysis of platform security features"

Agent: "Based on research across the documentation, here's a comprehensive security analysis..."

The report includes findings from multiple queries covering authentication, authorization, encryption, audit logging, and related topics, with citations.

Adding Vector Store Tools to Agents

To add vector store tools to your agent:

Navigate to your agent's configuration
Go to the Tools section
Click "Add Tool"
Select either "Vector Store Query" or "Vector Store Researcher"
Configure the tool:
- Select your vector store connection
- Specify the collection name (or leave empty to prompt the agent)
- Configure the embedding model (must match the embeddings in your collection)
- Set optional parameters like retrieval count and language model
Save the agent configuration

Embedding Model Consistency

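The embedding model configured for the tool must match the model used to create embeddings in the target collection. Vectors produced by different models are not comparable, so a mismatch yields poor retrieval results, and queries may fail outright when the vector dimensions differ.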

Document Ingestion Overview

Documents are ingested into vector stores through a multi-step process:

Content Processing - Parse source documents (PDFs, web pages, text files) and extract text content
Chunking - Segment content into smaller pieces (typically 200-800 tokens) for optimal retrieval
Embedding Generation - Convert each chunk to a vector embedding using an embedding model
Metadata Enrichment (optional) - Add contextual information to chunks using content enrichment
Storage - Persist vectors with associated text and metadata in the vector database
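
As a rough illustration of how these stages fit together, the sketch below wires them into one function. The stage callables (`chunker`, `embed`, `enrich`) are hypothetical placeholders that you would supply; the function returns the records a storage step would persist rather than writing to a real vector database.

```python
from typing import Callable, Optional


def ingest(
    source_text: str,
    chunker: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
    enrich: Optional[Callable[[str], dict]] = None,
) -> list[dict]:
    """Chunk the text, embed each chunk, optionally enrich it, and build records."""
    records = []
    for position, chunk in enumerate(chunker(source_text)):
        vector = embed(chunk)                 # embedding generation
        metadata = {"position": position}
        if enrich is not None:
            metadata.update(enrich(chunk))    # optional metadata enrichment
        records.append({"text": chunk, "vector": vector, "metadata": metadata})
    return records
```

Each returned record carries the three things the storage step needs: the chunk text, its vector, and its metadata.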

Chunking Strategy

Document chunking is critical for RAG performance:

Chunk Size - Balance between context and precision. Smaller chunks provide precise retrieval but may lack context; larger chunks preserve context but consume more tokens
Overlap - Chunks can overlap (typically 20-40%) to preserve context across boundaries
Deduplication - Multiple strategies available to prevent duplicate content:
- Content-based: Same content = same ID (deduplication across all files)
- Position-based: Same file + position = same ID (document-level deduplication)
- Random: No deduplication (always insert new chunks)
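
The overlap and deduplication options can be sketched in a few lines. This is a minimal illustration, assuming whitespace-separated words as a stand-in for model tokens and SHA-256 hashes as chunk IDs; the function names and the strategy labels "content", "position", and "random" are hypothetical shorthand for the three strategies above.

```python
import hashlib
import uuid


def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def chunk_id(strategy: str, content: str, file_path: str = "", position: int = 0) -> str:
    """Derive a chunk ID according to the chosen deduplication strategy."""
    if strategy == "content":
        # Same content gives the same ID, regardless of which file it came from.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()
    if strategy == "position":
        # Same file and position give the same ID, so re-ingesting a file overwrites its chunks.
        return hashlib.sha256(f"{file_path}:{position}".encode("utf-8")).hexdigest()
    if strategy == "random":
        # A fresh ID every time, so nothing is ever deduplicated.
        return uuid.uuid4().hex
    raise ValueError(f"Unknown strategy: {strategy!r}")
```

With a size of 4 and an overlap of 1 (25%), a 10-token document yields three chunks, each sharing one token with its neighbor.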

Content Enrichment

Content enrichment enhances chunks with additional context:

Situating Context - Add high-level context about the document or section
Metadata Tags - Attach structured attributes for filtering and categorization
Source Attribution - Include document source, timestamps, and version information

This enrichment improves retrieval accuracy by providing additional semantic context for similarity matching.
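
One simple form of enrichment is to prepend situating context to the text that gets embedded while keeping the original chunk for display. The sketch below assumes that approach; the function name, the record keys, and the wording of the prefix are all hypothetical.

```python
def enrich_chunk(chunk: str, document_title: str, section: str, source: str) -> dict:
    """Build a record whose embedded text carries document and section context."""
    situated = f"From '{document_title}', section '{section}': {chunk}"
    return {
        "text_for_embedding": situated,  # situating context plus the chunk
        "content": chunk,                # original chunk, unchanged
        "metadata": {"source": source, "section": section},  # tags and attribution
    }
```

A chunk such as "Tokens expire after one hour" is ambiguous on its own; with the prefix, its embedding also reflects which document and section it belongs to.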

Security and Best Practices

Credential Management

Secure Storage - All sensitive credentials are stored in Azure Key Vault
Hidden Values - Passwords, tokens, and API keys are never displayed in the UI after initial entry
Minimal Permissions - Configure database users with only the permissions needed for vector store operations

Embedding Consistency

Single Model per Collection - All documents in a collection must use the same embedding model
Version Tracking - Track embedding model versions and migrate collections when upgrading
Model Selection - Choose embedding models appropriate for your content type and language

Collection Organization

Domain Separation - Use separate collections for different knowledge domains (e.g., product docs, support articles, API references)
Metadata Strategy - Design comprehensive metadata schemas for filtering and categorization
Content Versioning - Implement versioning to track document updates

Performance Optimization

Index Size - Balance collection size with query performance requirements
Retrieval Count - Optimize the number of results retrieved based on context window constraints
Caching - Implement caching for frequently accessed queries
Rate Limiting - Configure appropriate rate limits for embedding generation during ingestion
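
The trade-off between retrieval count and context window is simple arithmetic. As a rough sketch, assuming every chunk has about the same token length, the hypothetical helper below estimates how many chunks fit once tokens are reserved for the prompt and the response.

```python
def max_retrieval_count(context_window: int, reserved_tokens: int, chunk_tokens: int) -> int:
    """Estimate how many chunks of a given size fit in the remaining context."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    available = context_window - reserved_tokens
    return max(0, available // chunk_tokens)
```

For example, with a 16,000-token window, 4,000 tokens reserved, and 800-token chunks, `max_retrieval_count(16000, 4000, 800)` returns 15, so the default retrieval count of 10 fits comfortably.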

Troubleshooting Common Issues

Connection Test Failures

Symptoms: Connection test fails with an error message

Common causes and solutions:

Error	Likely Cause	Solution
Connection refused	Wrong host/port or firewall blocking	Verify host/port, check firewall rules
Authentication failed	Wrong credentials	Double-check username/password/API key
SSL/TLS error	SSL configuration mismatch	Verify SSL settings match database requirements
Timeout	Network latency or database not responding	Check network connectivity, verify database is running
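
To tell a refused connection from a timeout before adjusting settings, you can probe the host and port directly from a machine on the same network. The sketch below uses only Python's standard `socket` module; `check_tcp` is a hypothetical helper, and it checks TCP reachability only, not credentials or SSL configuration.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try to open a TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except ConnectionRefusedError:
        return "connection refused"  # nothing is listening, or a firewall rejects the port
    except socket.timeout:
        return "timeout"             # packets are dropped, or the host is not responding
    except OSError as exc:
        return f"error: {exc}"       # for example, the host name could not be resolved
```

If the probe reports "reachable" but the connection test still fails, the cause is more likely credentials or SSL settings than the network.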

Poor Retrieval Results

Symptoms: Agents retrieve irrelevant information

Solutions:

Verify the embedding model configured for the tool matches the model used to embed documents
Check that the correct collection is being queried
Increase the retrieval count to retrieve more candidates
Review chunk size and overlap settings in the ingestion process
Ensure documents contain relevant information for the queries

Missing or Incomplete Data

Symptoms: Expected documents are not found in queries

Solutions:

Verify documents were successfully ingested into the collection
Check that the collection name is spelled correctly
Review ingestion logs for errors
Confirm the vector store connection is working

Next Steps

Now that you've configured vector store connections, you can:

Add RAG tools to agents - Enable agents to query your knowledge bases
Ingest documents - Populate collections with your documentation and content
Monitor performance - Track query performance and retrieval quality
Expand knowledge bases - Add more collections and vector stores as your needs grow

Vector stores provide a powerful foundation for knowledge-augmented AI capabilities. By properly configuring connections and following best practices, you can enable your agents to access and leverage your organization's documentation and knowledge bases effectively.