@amihos (Contributor) commented Nov 19, 2025

Problem

When configuring OpenMemory with an external embedding provider (Gemini, OpenAI, etc.), there's a critical configuration mismatch that causes semantic search to completely fail:

  • Storage uses embedMultiSector which respects OM_EMBED_MODE → uses external provider (768d vectors)
  • Query uses embedForSector which respects OM_TIER → defaults to "hybrid" (256d synthetic vectors)

This vector space mismatch results in all similarity scores being ~0, making semantic search non-functional.

Root Cause Analysis

The issue occurs because:

  1. embedMultiSector (storage) checks env.embed_mode to decide embedding method
  2. embedForSector (query) checks tier variable to decide embedding method
  3. Default OM_TIER=hybrid causes queries to use gen_syn_emb() (synthetic)
  4. But with OM_EMBEDDINGS=gemini, storage uses actual Gemini embeddings

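The two paths therefore produce vectors in unrelated spaces, and with different dimensions (768 vs. 256). Cosine similarity between a stored vector and a query vector carries no semantic signal, so scores come out near zero.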
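
To make the mismatch concrete, here is a minimal sketch of a cosine similarity function that refuses vectors of different dimensions. The name cosineSimilarity and the choice to throw are assumptions for illustration. How OpenMemory itself handles a 768d/256d pair is not shown in this PR.

```typescript
// Illustrative only: not taken from the OpenMemory source.
function cosineSimilarity(a: number[], b: number[]): number {
  // Vectors from different embedding methods may not even share a dimension.
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Even when two embedding methods happen to share a dimension, their axes mean different things, so a high score would be coincidental rather than semantic.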

Solution

This PR adds:

  1. Startup warning when configuration mismatch is detected - warns users to set OM_TIER=deep
  2. Model upgrade - Updates Gemini from deprecated embedding-001 to text-embedding-004

Example Warning Output

[CONFIG] ⚠️  WARNING: Embedding configuration mismatch detected!
         OM_EMBEDDINGS=gemini but OM_TIER=hybrid
         Storage will use gemini embeddings, but queries will use synthetic embeddings.
         This causes semantic search to fail. Set OM_TIER=deep to fix.
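
A startup check along the following lines could produce a warning like the one above. This is a sketch under stated assumptions, not the code merged in this PR: detectEmbeddingMismatch, the Env type, and the "synthetic" default for OM_EMBEDDINGS are hypothetical. The OM_TIER default of "hybrid" and the affected tiers (hybrid, fast) come from the description.

```typescript
// Hypothetical helper; names and the OM_EMBEDDINGS default are assumptions.
type Env = Record<string, string | undefined>;

// Tiers that, per the PR description, make queries use synthetic embeddings.
const SYNTHETIC_QUERY_TIERS = new Set(["hybrid", "fast"]);

function detectEmbeddingMismatch(env: Env): string | null {
  // Assumed default: no external provider configured.
  const provider = (env["OM_EMBEDDINGS"] ?? "synthetic").toLowerCase();
  // Default tier is "hybrid" according to the PR description.
  const tier = (env["OM_TIER"] ?? "hybrid").toLowerCase();

  const usesExternalProvider = provider !== "synthetic";
  if (!usesExternalProvider || !SYNTHETIC_QUERY_TIERS.has(tier)) {
    return null; // storage and query paths agree, nothing to warn about
  }

  return [
    "[CONFIG] WARNING: Embedding configuration mismatch detected!",
    `         OM_EMBEDDINGS=${provider} but OM_TIER=${tier}`,
    `         Storage will use ${provider} embeddings, but queries will use synthetic embeddings.`,
    "         This causes semantic search to fail. Set OM_TIER=deep to fix.",
  ].join("\n");
}
```

Passing the environment as a plain object, rather than reading process.env inside the function, keeps the check easy to call once at startup and easy to unit test.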

Testing

Verified fix by:

  1. Setting OM_TIER=deep with OM_EMBEDDINGS=gemini
  2. Storing test memory with unique keywords
  3. Querying for those keywords - memory now correctly appears as top result with high similarity score

Alternative Approaches Considered

  1. Make embedForSector respect OM_EMBED_MODE - More invasive change, might break intentional hybrid setups
  2. Change default OM_TIER - Breaking change for existing users

The warning approach is safest and most backwards-compatible while clearly informing users of the issue.

🤖 Generated with Claude Code

When OM_EMBEDDINGS is set to an external provider (gemini, openai, etc.)
but OM_TIER is left at default (hybrid or fast), storage uses external
embeddings while queries use synthetic embeddings. This causes vector
space mismatch resulting in all similarity scores being ~0.

Changes:
- Add startup warning when configuration mismatch is detected
- Update Gemini model from deprecated embedding-001 to text-embedding-004
- Update models.yml with latest Gemini model references

The warning instructs users to set OM_TIER=deep when using external
embedding providers for consistent vector spaces.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@nullure (Member) commented Nov 20, 2025

Hey amihos, thanks for your interest in OpenMemory. I appreciate your pull request, but it conflicts with a file. Please resolve the conflict.

amihos and others added 2 commits November 20, 2025 21:06
Resolved conflicts in models.yml by keeping both changes:
- Updated Gemini model to text-embedding-004 (latest)
- Added AWS Bedrock embedding support (amazon.titan-embed-text-v2:0)

All sectors now support all providers: ollama, openai, gemini, aws, local

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fixed TypeScript build errors by properly adding AWS_REGION,
AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY properties to
the env configuration object. These properties were being
referenced in embed.ts but were missing from cfg.ts.

Changes:
- Removed incorrect aws_model property with 3-argument str() call
- Added AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  as individual properties with proper default values

Fixes TypeScript errors:
- cfg.ts(53,9): Expected 2 arguments, but got 3
- embed.ts: Property 'AWS_*' does not exist errors

This is the same fix applied to PR CaviraOSS#54.
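
The shape of this fix can be sketched as follows. Everything here is illustrative: the real str() helper in cfg.ts is only known to take two arguments (per the compiler error), so the signature below, the buildAwsConfig wrapper, and the empty-string defaults are assumptions.

```typescript
// Hypothetical reconstruction; only the property names come from the commit message.
type RawEnv = Record<string, string | undefined>;

// Assumed two-argument helper: the raw value first, the fallback second.
function str(value: string | undefined, fallback: string): string {
  return value !== undefined && value !== "" ? value : fallback;
}

// Each AWS setting is its own property, so embed.ts can reference it by name.
function buildAwsConfig(raw: RawEnv) {
  return {
    AWS_REGION: str(raw["AWS_REGION"], ""), // default value is an assumption
    AWS_ACCESS_KEY_ID: str(raw["AWS_ACCESS_KEY_ID"], ""),
    AWS_SECRET_ACCESS_KEY: str(raw["AWS_SECRET_ACCESS_KEY"], ""),
  };
}
```

Declaring the three properties individually is what resolves the "Property 'AWS_*' does not exist" errors, since TypeScript infers the config type from the object literal.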

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
amihos added a commit to amihos/OpenMemory that referenced this pull request Nov 20, 2025
Fixed TypeScript build errors by properly adding AWS_REGION,
AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY properties to
the env configuration object. These properties were being
referenced in embed.ts but were missing from cfg.ts.

Changes:
- Removed incorrect aws_model property with 3-argument str() call
- Added AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  as individual properties with proper default values

Fixes TypeScript errors:
- cfg.ts(53,9): Expected 2 arguments, but got 3
- embed.ts: Property 'AWS_*' does not exist errors

This is the same fix applied to PRs CaviraOSS#54 and CaviraOSS#55.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@nullure (Member) left a comment

LGTM

@nullure nullure merged commit 9513eaf into CaviraOSS:main Nov 22, 2025
1 check passed