Support for Alternative Vector Databases in Codebase Indexing #6223

### What specific problem does this solve?

**Problem:** Qdrant is currently the only vector database supported for codebase indexing, and this creates the issues outlined below.

**Who is affected:** All users who want to use codebase indexing, especially:
- Individual developers who want a zero-configuration solution
- Teams with existing vector database infrastructure (ChromaDB, Pinecone, etc.)
- Users on resource-constrained systems who can't run additional services
- Enterprise users with specific compliance/infrastructure requirements

**When this happens:**
- During initial setup of codebase indexing
- When trying to integrate with existing ML/AI pipelines
- When deploying on resource-limited environments
- When corporate policies restrict certain database choices

**Current behavior:** 
- Users MUST set up and maintain a Qdrant instance (Docker or cloud)
- No way to reuse existing vector DB infrastructure
- Requires additional resources and configuration
- Some users have reported problems setting up Qdrant (Issue #4441)

**Expected behavior:**
- Users can choose from multiple vector database options
- Support for embedded databases that require no separate service
- Ability to use existing infrastructure

**Impact:**
- Setup time: roughly 30-60 minutes for Qdrant vs. about 5 minutes for embedded solutions
- Resource usage: An additional 500MB+ of RAM for the Qdrant service
- Barrier to adoption: Many users skip codebase indexing because of the setup complexity
- Infrastructure costs: Unnecessary duplication for teams that already run a vector database


### Additional context (optional)

- Discussion #411 shows strong community interest in alternatives, specifically LanceDB
- Continue.dev successfully uses LanceDB for the same use case
- A community member built a workaround (https://github.com/OJamals/Modal), which further shows demand
- Blog post demonstrating LanceDB for code RAG: https://blog.lancedb.com/rag-codebase-1/


### Roo Code Task Links (Optional)

N/A

### Request checklist

- [x] I've searched existing Issues and Discussions for duplicates
- [x] This describes a specific problem with clear impact and context

### Interested in implementing this?

- [x] Yes, I'd like to help implement this feature

### Implementation requirements

- [x] I understand this needs approval before implementation begins

### How should this be solved? (REQUIRED if contributing, optional otherwise)

**Solution: Implement a vector database adapter pattern**

1. **Create abstract interface:**
   - Define `VectorDBAdapter` base class with standard methods
   - Methods: `create_index`, `add_embeddings`, `search`, `update`, `delete`
   - Consistent error handling and response formats

2. **Implement adapters for priority databases:**
   - **LanceDB**: Embedded, no server needed, proven in Continue.dev
   - **ChromaDB**: Popular choice, good Python integration
   - **SQLite+Vector**: Minimal dependencies using sqlite-vss
   - **Qdrant**: Keep existing implementation as one option

3. **Configuration approach:**
   - Add `vector_db_provider` setting in config
   - Provider-specific settings in nested config object
   - Auto-detection of available providers on startup

4. **User interaction:**
   - Dropdown in settings to select vector DB
   - Provider-specific configuration fields appear dynamically
   - Clear setup instructions for each provider
   - Migration tool for switching between providers
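A minimal Python sketch of the adapter interface from step 1, assuming Python because the dependency notes below use `pip`. Only the five method names come from this proposal; `SearchResult`, `VectorDBError`, `InMemoryAdapter`, and all signatures are hypothetical. The in-memory backend does brute-force cosine similarity and exists only to show the contract (async methods, one error type, one result shape) that the LanceDB, ChromaDB, sqlite-vss, and Qdrant adapters would each need to satisfy.

```python
import asyncio
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Vector = List[float]
Payload = Dict[str, Any]


@dataclass
class SearchResult:
    """Provider-agnostic hit returned by every adapter's search()."""
    id: str
    score: float  # cosine similarity; higher means more similar
    payload: Payload = field(default_factory=dict)


class VectorDBError(Exception):
    """Single error type adapters raise instead of provider-specific exceptions."""


class VectorDBAdapter(ABC):
    """Interface every vector database backend implements (all methods async)."""

    @abstractmethod
    async def create_index(self, name: str, dimension: int) -> None: ...

    @abstractmethod
    async def add_embeddings(self, name: str, ids: List[str],
                             vectors: List[Vector], payloads: List[Payload]) -> None: ...

    @abstractmethod
    async def search(self, name: str, query: Vector,
                     limit: int = 10) -> List[SearchResult]: ...

    @abstractmethod
    async def update(self, name: str, id: str, vector: Vector, payload: Payload) -> None: ...

    @abstractmethod
    async def delete(self, name: str, ids: List[str]) -> None: ...


class InMemoryAdapter(VectorDBAdapter):
    """Brute-force reference backend; also useful as a test double."""

    def __init__(self) -> None:
        # index name -> (dimension, {id: (vector, payload)})
        self._indexes: Dict[str, Tuple[int, Dict[str, Tuple[Vector, Payload]]]] = {}

    def _rows(self, name: str, vector: Optional[Vector] = None):
        if name not in self._indexes:
            raise VectorDBError(f"index {name!r} does not exist")
        dim, rows = self._indexes[name]
        if vector is not None and len(vector) != dim:
            raise VectorDBError(f"expected dimension {dim}, got {len(vector)}")
        return rows

    @staticmethod
    def _cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    async def create_index(self, name, dimension):
        self._indexes.setdefault(name, (dimension, {}))

    async def add_embeddings(self, name, ids, vectors, payloads):
        if not len(ids) == len(vectors) == len(payloads):
            raise VectorDBError("ids, vectors and payloads must have equal length")
        rows = self._rows(name)
        for vec in vectors:  # validate every vector before writing any
            self._rows(name, vec)
        rows.update((i, (v, p)) for i, v, p in zip(ids, vectors, payloads))

    async def search(self, name, query, limit=10):
        rows = self._rows(name, query)
        hits = [SearchResult(i, self._cosine(query, v), p) for i, (v, p) in rows.items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:limit]

    async def update(self, name, id, vector, payload):
        rows = self._rows(name, vector)
        if id not in rows:
            raise VectorDBError(f"id {id!r} not found in index {name!r}")
        rows[id] = (vector, payload)

    async def delete(self, name, ids):
        rows = self._rows(name)
        for i in ids:
            rows.pop(i, None)


async def demo() -> List[str]:
    db = InMemoryAdapter()
    await db.create_index("code", dimension=2)
    await db.add_embeddings("code", ["a.py", "b.py"], [[1.0, 0.0], [0.0, 1.0]], [{}, {}])
    return [hit.id for hit in await db.search("code", [0.9, 0.1])]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['a.py', 'b.py']
```

Real adapters would translate their client's results and exceptions into `SearchResult` and `VectorDBError`; that translation layer is what keeps error handling and response formats consistent across providers.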


### How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Given I have codebase indexing enabled
When I select "LanceDB" as my vector database
Then indexing works without requiring external services
And search results are comparable to Qdrant implementation
And switching between providers preserves my indexed data
But performance doesn't degrade significantly

Given I have an existing ChromaDB instance
When I configure codebase indexing to use it
Then it connects to my existing database
And creates collections without affecting other data
And respects my existing authentication settings

Given I'm using SQLite vector extension
When I index a large codebase (10k+ files)
Then indexing completes successfully
And search queries return in under 2 seconds
But I get a warning if codebase size might impact performance
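The size warning in the SQLite scenario could be a simple threshold check using Python's `warnings` module. The 10,000-file threshold is taken from the scenario above and is only a placeholder until the planned performance benchmarks provide real numbers; `SOFT_LIMITS` and `warn_if_large` are hypothetical names.

```python
import warnings
from typing import Dict

# Hypothetical per-provider soft limits, in number of files.
SOFT_LIMITS: Dict[str, int] = {"sqlite-vss": 10_000}


def warn_if_large(provider: str, file_count: int) -> bool:
    """Warn (and return True) when a codebase may be too large for the provider."""
    limit = SOFT_LIMITS.get(provider)
    if limit is None or file_count < limit:
        return False
    warnings.warn(
        f"{file_count} files meets or exceeds the suggested limit of {limit} for "
        f"{provider}; indexing and search may be slow.",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```

Providers without an entry in `SOFT_LIMITS` never warn, so the check stays opt-in per backend.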

### Technical considerations (REQUIRED if contributing, optional otherwise)

**Implementation approach:**
- Factory pattern for creating appropriate adapter instances
- Async/await support for all database operations
- Consistent embedding dimension handling across providers
- Batch processing for efficient indexing

**Architecture changes:**
- New `vector_db/` module with adapter implementations
- Modify `CodebaseIndex` class to use adapters
- Update configuration schema and validation

**Dependencies:**
- LanceDB: `pip install lancedb` (lightweight)
- ChromaDB: `pip install chromadb` (heavier; pulls in several additional dependencies)
- SQLite: `pip install sqlite-vss` (minimal)
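Startup auto-detection of available providers (from the configuration approach above) can check which optional client libraries are installed without importing them, using `importlib.util.find_spec`. The mapping from provider names to import names (`lancedb`, `chromadb`, `sqlite_vss`, `qdrant_client`) is an assumption about how each package is imported and should be verified against each package.

```python
import importlib.util
from typing import Dict, List

# Assumed import names; the pip package name can differ (e.g. sqlite-vss -> sqlite_vss).
PROVIDER_MODULES: Dict[str, str] = {
    "lancedb": "lancedb",
    "chromadb": "chromadb",
    "sqlite-vss": "sqlite_vss",
    "qdrant": "qdrant_client",
}


def detect_providers(modules: Dict[str, str] = PROVIDER_MODULES) -> List[str]:
    """Return providers whose client library is importable, without importing it."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is not None]


if __name__ == "__main__":
    print(detect_providers())  # result depends on which packages are installed
```

The settings dropdown could then list only detected providers, or flag the others with their install command.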

**Testing strategy:**
- Unit tests for each adapter with mocked databases
- Integration tests with real databases in CI
- Performance benchmarks comparing providers
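For a shared suite with provider-specific fixtures, one standard-library option is a test mixin run by `unittest.IsolatedAsyncioTestCase`: the tests are written once and each provider subclass supplies `make_adapter()`. `InMemoryAdapter`, `AdapterContract`, and the commented-out `LanceDBAdapter` fixture are hypothetical, and results are `(id, score)` tuples only to keep the sketch short.

```python
import math
import unittest
from typing import Dict, List, Tuple


class InMemoryAdapter:
    """Stand-in backend so the shared suite runs without any database installed."""

    def __init__(self) -> None:
        self.indexes: Dict[str, Dict[str, List[float]]] = {}

    async def create_index(self, name: str, dimension: int) -> None:
        self.indexes.setdefault(name, {})

    async def add_embeddings(self, name, ids, vectors, payloads) -> None:
        self.indexes[name].update(zip(ids, vectors))

    async def search(self, name, query, limit=10) -> List[Tuple[str, float]]:
        def cosine(v: List[float]) -> float:
            dot = sum(a * b for a, b in zip(query, v))
            norm = math.hypot(*query) * math.hypot(*v)
            return dot / norm if norm else 0.0
        ranked = sorted(((i, cosine(v)) for i, v in self.indexes[name].items()),
                        key=lambda hit: hit[1], reverse=True)
        return ranked[:limit]

    async def delete(self, name, ids) -> None:
        for i in ids:
            self.indexes[name].pop(i, None)


class AdapterContract:
    """Provider-independent tests; each subclass supplies make_adapter()."""

    def make_adapter(self):
        raise NotImplementedError

    async def asyncSetUp(self) -> None:
        self.db = self.make_adapter()
        await self.db.create_index("contract", 3)

    async def test_nearest_vector_ranks_first(self) -> None:
        await self.db.add_embeddings("contract", ["a", "b"],
                                     [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [{}, {}])
        hits = await self.db.search("contract", [0.9, 0.1, 0.0], limit=1)
        self.assertEqual([hit[0] for hit in hits], ["a"])

    async def test_delete_removes_vectors(self) -> None:
        await self.db.add_embeddings("contract", ["a"], [[1.0, 0.0, 0.0]], [{}])
        await self.db.delete("contract", ["a"])
        self.assertEqual(await self.db.search("contract", [1.0, 0.0, 0.0]), [])


class InMemoryContractTests(AdapterContract, unittest.IsolatedAsyncioTestCase):
    def make_adapter(self):
        return InMemoryAdapter()


# A real provider would reuse the same tests with its own fixture, e.g.:
# class LanceDBContractTests(AdapterContract, unittest.IsolatedAsyncioTestCase):
#     def make_adapter(self): return LanceDBAdapter(uri=self.tmp_dir)  # hypothetical
```

The suite runs with `python -m unittest`; the CI integration job would add one subclass per real database, while unit tests keep using the in-memory fixture.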


### Trade-offs and risks (REQUIRED if contributing, optional otherwise)

**Alternatives considered:**
1. MCP server approach - Too complex for users, requires additional setup
2. External indexing service - Loses tight integration benefits
3. Supporting only one alternative - Doesn't solve the flexibility problem

**Risks:**
- **Maintenance burden**: Each adapter needs updates when APIs change
  - Mitigation: Start with the 2-3 most requested options
- **Performance variations**: Different DBs have different performance characteristics
  - Mitigation: Document the recommended use cases for each provider
- **Migration complexity**: Moving between providers could be challenging
  - Mitigation: Build the migration tool from the start
- **Testing complexity**: Need to test multiple database backends
  - Mitigation: Shared test suite with provider-specific fixtures
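The migration tool could copy already-computed vectors between providers in batches instead of re-embedding the whole codebase. This sketch assumes each adapter exposes two methods that are not among the five in the proposed interface, `dimension` and a paging `iter_records`; both are hypothetical, and `MemoryStore` is a toy stand-in for real adapters.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple

Record = Tuple[str, List[float], Dict[str, Any]]  # (id, vector, payload)


class MemoryStore:
    """Toy backend standing in for a real adapter; not a real client API."""

    def __init__(self) -> None:
        # index name -> (dimension, {id: (vector, payload)})
        self.indexes: Dict[str, Tuple[int, Dict[str, Tuple[List[float], Dict[str, Any]]]]] = {}

    async def create_index(self, name: str, dimension: int) -> None:
        self.indexes.setdefault(name, (dimension, {}))

    async def dimension(self, name: str) -> int:  # hypothetical helper
        return self.indexes[name][0]

    async def add_embeddings(self, name, ids, vectors, payloads) -> None:
        self.indexes[name][1].update((i, (v, p)) for i, v, p in zip(ids, vectors, payloads))

    async def iter_records(self, name: str, batch_size: int) -> AsyncIterator[List[Record]]:
        """Hypothetical paging method: yields stored records batch by batch."""
        items = [(i, v, p) for i, (v, p) in self.indexes[name][1].items()]
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]


async def migrate(source, target, name: str, batch_size: int = 256) -> int:
    """Copy stored vectors from source to target in batches, without re-embedding."""
    await target.create_index(name, await source.dimension(name))
    copied = 0
    async for batch in source.iter_records(name, batch_size):
        ids, vectors, payloads = (list(col) for col in zip(*batch))
        await target.add_embeddings(name, ids, vectors, payloads)
        copied += len(batch)
    return copied


async def demo() -> int:
    src, dst = MemoryStore(), MemoryStore()
    await src.create_index("code", 2)
    await src.add_embeddings("code", ["a", "b", "c"],
                             [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [{}, {}, {}])
    return await migrate(src, dst, "code", batch_size=2)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # 3
```

Copying stored vectors only works when both providers use the same embedding model; otherwise a migration has to fall back to re-indexing.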

**Breaking changes:**
- Configuration format will change (but can auto-migrate)
- Existing Qdrant indexes remain compatible
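Auto-migrating the configuration could look like the function below, which rewrites a flat, Qdrant-only config into the `vector_db_provider` layout and is safe to run more than once. The legacy keys `qdrant_url` and `qdrant_api_key`, and the `enabled` key in the example, are hypothetical placeholders; the real existing setting names would need to be mapped instead.

```python
from typing import Any, Dict

# Hypothetical legacy keys -> keys inside the new nested "qdrant" object.
LEGACY_QDRANT_KEYS: Dict[str, str] = {"qdrant_url": "url", "qdrant_api_key": "api_key"}


def migrate_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a flat, Qdrant-only config into the provider-based layout."""
    if "vector_db_provider" in old:
        return dict(old)  # already migrated; leave it untouched
    new = {k: v for k, v in old.items() if k not in LEGACY_QDRANT_KEYS}
    new["vector_db_provider"] = "qdrant"  # existing users keep their current backend
    new["qdrant"] = {LEGACY_QDRANT_KEYS[k]: v for k, v in old.items() if k in LEGACY_QDRANT_KEYS}
    return new


legacy = {"enabled": True, "qdrant_url": "http://localhost:6333", "qdrant_api_key": "secret"}
migrated = migrate_config(legacy)
```

Because already-migrated configs pass through unchanged, the function can run on every startup without tracking whether migration has happened.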

**Edge cases:**
- Very large codebases might not work well with SQLite
- Embedding dimension mismatches between providers
- Network issues with cloud providers (Pinecone)
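Embedding dimension mismatches are cheapest to catch before any write or query, by comparing the dimension a collection was created with against the embedding model's output size. `IndexInfo`, `ensure_compatible`, and `DimensionMismatchError` are hypothetical names, and how the stored dimension is read back differs per provider.

```python
from dataclasses import dataclass


class DimensionMismatchError(ValueError):
    """Raised when stored vectors and the embedding model disagree on size."""


@dataclass(frozen=True)
class IndexInfo:
    provider: str
    collection: str
    dimension: int  # vector size the collection was created with


def ensure_compatible(index: IndexInfo, model_name: str, model_dimension: int) -> None:
    """Fail fast before writing or querying with vectors of the wrong size."""
    if index.dimension != model_dimension:
        raise DimensionMismatchError(
            f"{index.provider} collection {index.collection!r} stores "
            f"{index.dimension}-d vectors, but {model_name} produces "
            f"{model_dimension}-d vectors; re-index or choose a matching model."
        )


# Passes: OpenAI's text-embedding-3-small returns 1536-d vectors by default.
ensure_compatible(IndexInfo("lancedb", "code", 1536), "text-embedding-3-small", 1536)
```

Raising a dedicated error type lets the UI offer a re-index action instead of surfacing a provider-specific failure.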
