Conversation

@fzowl (Contributor) commented Oct 24, 2025

VoyageAI refactoring:

  • contextual model(s)
  • remove the default model value
  • token counting and more effective batching
  • test with the new rerank-2.5 reranker as well
  • more tests

fzowl added 2 commits October 24, 2025 12:44
 - contextual model
 - removing the model default value
 - token counting, i.e. more effective use of batches
@jit-ci bot commented Oct 24, 2025

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal Viable Security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@bsbodden bsbodden requested review from bsbodden and Copilot October 24, 2025 15:24
Copilot AI left a comment


Pull Request Overview

This PR refactors the VoyageAI integration to support contextual embedding models, improve token management, and enhance testing coverage. The changes include removing the default model value (requiring explicit model specification), implementing token-aware batching, adding support for the voyage-context-3 model, and updating tests to cover the new rerank-2.5 model.

Key Changes:

  • Added token counting functionality and token-aware batching based on model-specific token limits
  • Introduced support for contextualized embedding models (voyage-context-3) with automatic API endpoint detection
  • Updated VoyageAI package dependency from 0.2.2 to 0.3.5
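The automatic endpoint detection mentioned above can be reduced to a simple predicate. A hedged sketch follows; the helper name and the "voyage-context-*" naming assumption are illustrative, not necessarily the PR's exact logic:

```python
def is_context_model(model: str) -> bool:
    """Heuristic: contextualized embedding models appear to follow the
    "voyage-context-*" naming convention (e.g. voyage-context-3), which
    would route them to the contextualized_embed endpoint instead of embed.
    """
    return model.startswith("voyage-context")
```

A vectorizer could call this once in `__init__` and cache the result, so the dispatch cost is paid only at construction time.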

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

  • tests/integration/test_vectorizers.py: added VoyageAI-specific tests for token counting, context models, and batching; updated model references to voyage-3.5
  • tests/integration/test_rerankers.py: enhanced the reranker test fixture to cover both rerank-lite-1 and rerank-2.5
  • redisvl/utils/vectorize/text/voyageai.py: implemented token-aware batching and context model support, removed the default model value, added a token-limits dictionary
  • pyproject.toml: raised the voyageai dependency requirement from 0.2.2 to 0.3.5


    def __init__(
        self,
   -    model: str = "voyage-large-2",
   +    model: str,

Copilot AI Oct 24, 2025


Removing the default model value is a breaking API change. Existing code that instantiates VoyageAITextVectorizer without specifying a model will fail. Consider adding a deprecation warning in a previous release or documenting this as a breaking change in the release notes.
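One way to follow this suggestion would be a transitional release that keeps the old default but warns callers who rely on it. A minimal sketch, assuming the previous default was "voyage-large-2" as shown in the diff; `resolve_model` and the message text are illustrative, not part of the PR:

```python
import warnings
from typing import Optional

_OLD_DEFAULT = "voyage-large-2"  # the default removed by this PR

def resolve_model(model: Optional[str] = None) -> str:
    """Transitional shim: keep the old default for one release while
    emitting a DeprecationWarning for callers who omit the model."""
    if model is None:
        warnings.warn(
            "Instantiating without an explicit `model` is deprecated; "
            "pass a model name such as 'voyage-3.5'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return _OLD_DEFAULT
    return model
```

After one release with the warning, the shim can be dropped and `model` made a required positional argument, as this PR does.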

Comment on lines +476 to +480
    # Tokenize all texts in one API call for efficiency
    try:
        token_counts = self.count_tokens(texts)
    except Exception as e:
        raise ValueError(f"Failed to count tokens for batching: {e}")

Copilot AI Oct 24, 2025


The token-aware batching makes an additional API call to count_tokens() for every embed operation. This adds latency and API usage overhead. Consider adding a parameter to allow users to opt out of token-aware batching or implement local tokenization if the VoyageAI client supports it.

@fzowl (Author) replied:

count_tokens() is a low-latency local function.

        texts: List of texts to embed
   -    batch_size: Number of texts to process in each API call
        **kwargs: Additional parameters to pass to the VoyageAI API
   +    batch_size: Deprecated. Token-aware batching is now always used.

Copilot AI Oct 24, 2025


The batch_size parameter is marked as deprecated in the docstring, but the method signature still accepts it without any deprecation warning being raised. Consider adding a runtime deprecation warning when batch_size is provided, or remove the parameter entirely if it's no longer used.

@bsbodden (Collaborator) commented:

@fzowl There are some failures in CI. The job failed due to mypy type errors in redisvl/utils/vectorize/text/voyageai.py:

  • Line 302: "Client" has no attribute "contextualized_embed"
  • Line 385: "AsyncClient" has no attribute "contextualized_embed"

Solution:
The mypy errors are caused by missing type annotations for contextualized_embed on the Client and AsyncClient classes from the voyageai package. To resolve this:

  1. Ensure the voyageai package is up-to-date, as contextualized_embed may be a recent addition. Run:
    pip install --upgrade voyageai
    
  2. If the method exists in runtime but mypy still complains, add type ignore comments for the relevant lines:
    response = self._client.contextualized_embed(  # type: ignore[attr-defined]
        inputs=[batch],
        model=self.model,
        input_type=input_type,
        **kwargs,
    )
    and
    response = await self._aclient.contextualized_embed(  # type: ignore[attr-defined]
        inputs=[batch],
        model=self.model,
        input_type=input_type,
        **kwargs,
    )
  3. Alternatively, update your project's type stubs for voyageai or create a custom stub that includes contextualized_embed in both Client and AsyncClient.

This will resolve the mypy errors and allow the type checks to pass.
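For option 3, a minimal custom stub might look like the fragment below. The file location and exact signatures are assumptions; verify them against the voyageai 0.3.5 source before relying on them, and note that a stub replaces the package's inferred types entirely, so in practice the other methods (embed, rerank, count_tokens, ...) would need entries too:

```python
# stubs/voyageai/__init__.pyi  (hypothetical stub path; add "stubs" to
# mypy_path in your mypy config so it is picked up)
from typing import Any, List, Optional

class Client:
    def contextualized_embed(
        self,
        inputs: List[List[str]],
        model: str,
        input_type: Optional[str] = ...,
        **kwargs: Any,
    ) -> Any: ...

class AsyncClient:
    async def contextualized_embed(
        self,
        inputs: List[List[str]],
        model: str,
        input_type: Optional[str] = ...,
        **kwargs: Any,
    ) -> Any: ...
```

Given the maintenance cost of a full stub, the targeted `# type: ignore[attr-defined]` comments from option 2 are the lighter-weight fix here.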

Relevant file: redisvl/utils/vectorize/text/voyageai.py (ref: 90340c52bfc47b4944b2095e4e1b492b80bf2507)
