feat: add VectorStoreWriter tool #451

tyaroshko · 2025-10-28T16:37:39Z

Important

Add VectorStoreWriter and PreprocessTool for document processing and storage, with example workflows.

Tools:
- Add VectorStoreWriter in writer.py for writing documents to vector stores with metadata support.
- Add PreprocessTool in preprocess_tool.py for splitting documents into smaller parts.
Examples:
- Add agent_vector_store_write_pipeline.py to demonstrate a workflow using VectorStoreWriter and PreprocessTool.
- Add agent_vector_store_writer_tool.py to demonstrate a workflow using VectorStoreWriter for travel data.

^{This description was created by}^{for a5b38a2. You can customize this summary. It will automatically update as commits are pushed.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed dac6373 in 2 minutes and 30 seconds. Click for details.

Reviewed 429 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 5 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. dynamiq/nodes/agents/agent.py:760

Draft comment:
Nice integration of automatic ContextManagerTool injection via a model_validator. Consider adding more detailed logging (or raising a warning) when the tool cannot be added to aid in debugging edge cases where summarization is enabled but the tool isn’t present.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

2. dynamiq/nodes/tools/context_manager.py:140

Draft comment:
The ContextManagerTool implementation is clean and its prompt template is well structured. Consider adding robustness against empty or failed LLM summarization (e.g. when the returned summary is empty) to improve error handling.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

3. examples/components/core/dag/agent_file_storage.yaml:36

Draft comment:
Max_loops has been increased to 15 and summarization_config.enabled is set to true. Verify that the extended loop count and summarization do not adversely affect performance or lead to unintended context truncation.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

4. examples/use_cases/agents_use_cases/agent_deep_scraping.py:24

Draft comment:
The deep scraping use-case now integrates multiple tools (ScaleSerpTool and E2BInterpreterTool) and properly configures file storage via FileStoreConfig. Ensure that the combined use of multiple tools and summarization produces the expected CSV output and that the file storage integration works as intended.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

5. examples/use_cases/agents_use_cases/agent_deep_scraping.py:21

Draft comment:
Typographical error found: "usefull" should be "useful".
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While this is technically correct and in changed code, spelling errors in string literals that will only be seen by the LLM are extremely low impact. The LLM will understand the meaning regardless of the spelling. This seems like an overly pedantic comment that doesn't meaningfully improve the code quality. The spelling error could propagate to user-facing logs or documentation. Also, maintaining consistent spelling throughout the codebase is good practice. This is an internal prompt string that will only be processed by the LLM. The misspelling won't affect functionality or user experience in any way. Delete this comment as it's too minor and doesn't affect functionality. We should focus review comments on more impactful issues.

Workflow ID: wflow_Oxw2bo07NCUJStk2

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to e1086b4 in 3 minutes and 28 seconds. Click for details.

Reviewed 281 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. dynamiq/nodes/writers/writer.py:90

Draft comment:
Ensure that 'is_postponed_component_init' exists on document_embedder and document_writer to avoid attribute errors.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% Looking at the code, document_embedder and document_writer are typed as DocumentEmbedder and Writer respectively. Without seeing those base classes, I can't be 100% sure if is_postponed_component_init is guaranteed to exist. However, this seems like an internal API where these classes are expected to have this attribute. The suggestion is overly defensive programming. I don't have visibility into the DocumentEmbedder and Writer base classes - what if some implementations don't have this attribute? The suggestion could prevent runtime errors. The types are clearly defined and this appears to be an internal API. If the attribute was optional, it would likely be reflected in the type system. Over-defensive programming can make code less readable. The comment suggests an overly defensive programming pattern that isn't necessary given the typed nature of the code. The current implementation is cleaner and relies on proper interface implementation.

2. dynamiq/nodes/writers/writer.py:138

Draft comment:
Merging kwargs with 'parent_run_id' from 'run_id' may yield None if 'run_id' is absent. Confirm this behavior is intended.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The code is handling the None case gracefully using dict.get(). Having parent_run_id be None when run_id is absent seems like a reasonable default behavior. The comment is asking for confirmation of intended behavior rather than pointing out an actual issue. This falls under the rule about not asking authors to confirm their intentions. Maybe there could be downstream code that requires parent_run_id to be non-None? Maybe this is part of a larger tracing system where None values cause problems? If parent_run_id was required to be non-None, the code would validate that or use a different default. The code is clearly written to handle the None case, and the comment is just speculating about intentions. Delete the comment. It's asking for confirmation of intentions rather than pointing out an actual issue, and the code handles the None case appropriately.

3. dynamiq/nodes/writers/writer.py:144

Draft comment:
Overwriting _run_depends with a new dependency list may discard earlier steps. Verify if accumulating dependencies is needed.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The code shows a clear pattern of resetting dependencies at start and updating after each step. The dependencies appear to be used for tracing execution flow. Since each step depends only on the immediately previous step, overwriting makes sense - we don't need to accumulate all previous steps. The reset_run_state() method also confirms this is the intended behavior. I could be wrong about the tracing requirements - maybe there's a need to track the full chain of dependencies that I'm not seeing. The code structure strongly suggests sequential dependency tracking is intended - reset at start, update after each step. If full dependency chains were needed, the reset_run_state() method would likely not exist. The comment should be deleted. The current dependency tracking approach appears intentional and appropriate for sequential execution tracing.

4. examples/components/agents/agents/agent_vector_store_writer_tool.py:27

Draft comment:
Passing 'is_optimized_for_agents' to VectorStoreWriter – ensure this flag is supported or handled in the node's initialization.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that a flag is supported or handled, which violates the rule against asking the author to ensure behavior is intended or tested. It doesn't provide a specific suggestion or ask for a specific test to be written.

Workflow ID: wflow_8ubyy9qdTjyMNOmg

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

examples/components/agents/agents/agent_vector_store_writer_tool.py

github-actions · 2025-10-28T16:45:32Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
dynamiq/nodes/tools
__init__.py	17	0	100%
preprocess_tool.py	35	12	65%	73–74, 76, 78–80, 82, 87, 93, 95–96, 98
dynamiq/nodes/writers
__init__.py	8	0	100%
writer.py	81	50	38%	85–98, 129–130, 136, 145–150, 160, 168–171, 187–190, 192, 194–196, 198, 201–203, 205, 211–213, 215–216, 218–221
TOTAL	18352	5922	67%

Tests	Skipped	Failures	Errors	Time
956	35 💤	0 ❌	0 🔥	3m 28s ⏱️

examples/components/agents/agents/agent_vector_store_writer_tool.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 7469f47 in 1 minute and 1 seconds. Click for details.

Reviewed 45 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. dynamiq/nodes/node.py:549

Draft comment:
New **kwargs parameter added to transform_output; update its docstring to document this change.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. dynamiq/nodes/node.py:877

Draft comment:
Config is now explicitly passed to transform_input and transform_output in run_sync; ensure all subclasses can handle the additional 'config' parameter.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is suggesting to ensure that all subclasses can handle the additional config parameter. This is a request for confirmation or verification, which violates the rule against asking the PR author to ensure behavior is intended or tested. Therefore, this comment should be removed.

3. pyproject.toml:7

Draft comment:
Version bump to 0.32.0 looks correct for this new feature.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions. It simply states that the version bump looks correct, which is not necessary for the PR author to know.

Workflow ID: wflow_sn5qH2TULM6Gk0o3

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

dynamiq/nodes/writers/writer.py

examples/components/agents/agents/agent_vector_store_writer_tool.py

olbychos

Comments about tool description and embedding params

ellipsis-dev

Caution

Changes requested ❌

Reviewed 079c539 in 2 minutes and 42 seconds. Click for details.

Reviewed 118 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. dynamiq/nodes/tools/preprocess_tool.py:95

Draft comment:
Validate that the output from splitter.run() contains the expected 'documents' key to prevent a potential KeyError.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is asking for defensive programming - validating that a key exists before accessing it. However, this violates the rule about speculative comments. The code is calling a component from the same codebase (DocumentSplitterComponent), and there's no evidence that this component's contract is unreliable or that it might not return the expected structure. If DocumentSplitterComponent.run() is documented to return a dict with a "documents" key, then this validation would be unnecessary defensive programming. The comment is essentially saying "what if the component doesn't work as expected?" which is speculative. Without evidence that this is actually a problem (like seeing the component's implementation and finding it unreliable), this is just defensive programming suggestion without strong justification. However, I might be missing context about whether DocumentSplitterComponent is from an external library or has a history of unreliable behavior. Also, defensive programming for external API calls or unreliable components is sometimes warranted. If this component could legitimately fail to return the expected structure, the comment would be valid. Looking at the imports (line 6), DocumentSplitterComponent is from dynamiq.components.splitters.document, which appears to be an internal component of the same codebase. For internal components with defined contracts, we should trust the interface. If the component doesn't return the expected structure, that would be a bug in the component itself, not something to defensively code around. This is speculative without evidence of actual issues. This comment is speculative and suggests defensive programming without evidence that the component's contract is unreliable. It's asking to validate against a scenario that "might" happen rather than pointing to a definite issue. This violates the rule against speculative comments.

2. dynamiq/nodes/tools/preprocess_tool.py:78

Draft comment:
The 'or self.split_by' fallback is likely redundant since the Pydantic schema provides a default; consider removing it for clarity.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_wJ9OlLwMx4uwD7Js

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

dynamiq/nodes/tools/preprocess_tool.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 29ca7ae in 1 minute and 58 seconds. Click for details.

Reviewed 58 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. dynamiq/connections/managers.py:97

Draft comment:
The added check for the 'closed' attribute is useful. Consider also removing a stale closed client from the cache to avoid reusing it later.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is suggesting a code improvement, but let me trace through the logic. When a closed client is found (conn_client_closed is True), the code doesn't return early, so it falls through to lines 103-116 where a new client is created and stored in self.connection_clients[conn_id]. This overwrites the old closed client in the cache. So the stale client is already being removed/replaced. The comment might be suggesting an explicit deletion for clarity, but the current code already handles this correctly by overwriting. The comment is not pointing out a bug, just a potential code quality improvement that's debatable. The comment could be valid if there's a concern about memory or if the old client needs explicit cleanup before being replaced. Maybe the closed client has resources that need to be freed explicitly before overwriting the reference. I might be missing whether overwriting is sufficient or if explicit cleanup is needed. While explicit cleanup could be beneficial, the code already has a close() method (line 138-146) that handles closing all clients. The current implementation overwrites the stale client, which should be sufficient for Python's garbage collection. If explicit cleanup was needed, it would likely be a broader issue affecting the entire caching strategy, not just this specific case. The comment is speculative about a potential improvement rather than identifying a clear bug. The comment suggests a code quality improvement but doesn't identify a clear bug. The current code already handles stale clients by overwriting them. This is a speculative suggestion that falls under "nice to have" rather than "must fix." According to the rules, I should not keep speculative or purely informative comments.

2. dynamiq/storages/vector/pgvector/pgvector.py:122

Draft comment:
Updating the type annotation and client check (using 'client is None or client.closed') improves reliability. Ensure that the 'closed' property reliably indicates the connection state.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that a property reliably indicates a state, which falls under the category of asking the author to ensure behavior is intended. This violates the rules.

Workflow ID: wflow_bEZ8EeDJ4B2FhedJ

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed a5b38a2 in 1 minute and 0 seconds. Click for details.

Reviewed 126 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 5 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. dynamiq/callbacks/streaming.py:311

Draft comment:
Good use of a dict (_state_has_emitted) to track if a state has emitted. Consider documenting its purpose and if it should be reset between different streams.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. dynamiq/callbacks/streaming.py:441

Draft comment:
In _emit, the new block trims leading whitespace only for the first emission per state. Confirm this behavior is desired since subsequent segments skip trimming, preserving original whitespace.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. dynamiq/callbacks/streaming.py:469

Draft comment:
After streaming, _state_has_emitted[step] is set to true. Adding a clarifying comment here could help future maintainers understand that subsequent calls won’t trim whitespace.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

4. dynamiq/nodes/agents/agent.py:41

Draft comment:
Prompt instruction updates now emphasize first‐person and inline explanation. Ensure that any external integrations relying on prior formatting are updated accordingly.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

5. examples/components/agents/streaming/react/use_streaming.py:25

Draft comment:
The LLM model has been updated to "openai/gpt-5-chat-latest" and streaming mode set to ALL. Verify that this model identifier is supported and that the change in streaming mode produces the expected behavior.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_v0OxfWEh3NHERbHK

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

tyaroshko added 2 commits October 28, 2025 18:30

feat: add VectorStoreWriter tool

039b327

fix: formatting

e1086b4

tyaroshko requested a review from a team as a code owner October 28, 2025 16:37

chore: merge changes from main branch

dac6373

ellipsis-dev bot reviewed Oct 28, 2025

View reviewed changes

examples/components/agents/agents/agent_vector_store_writer_tool.py Show resolved Hide resolved

vitalii-dynamiq reviewed Oct 28, 2025

View reviewed changes

examples/components/agents/agents/agent_vector_store_writer_tool.py Show resolved Hide resolved

chore: merge changes from main branch

7469f47

ellipsis-dev bot reviewed Oct 29, 2025

View reviewed changes

tyaroshko requested review from TrachukT, acoola, delamainer, maksymbuleshnyi and olbychos October 30, 2025 08:47

olbychos reviewed Oct 30, 2025

View reviewed changes

dynamiq/nodes/writers/writer.py Show resolved Hide resolved

olbychos reviewed Oct 30, 2025

View reviewed changes

examples/components/agents/agents/agent_vector_store_writer_tool.py Show resolved Hide resolved

olbychos reviewed Oct 30, 2025

View reviewed changes

tyaroshko added 4 commits October 31, 2025 15:29

refactor: update tool description for VectorStoreWriter

fd96935

feat: add PreprocessTool

079c539

feat: add vector store write pipeline example

6ccc4de

chore: merge changes from main branch

29ca7ae

ellipsis-dev bot reviewed Oct 31, 2025

View reviewed changes

dynamiq/nodes/tools/preprocess_tool.py Show resolved Hide resolved

ellipsis-dev bot reviewed Oct 31, 2025

View reviewed changes

olbychos approved these changes Oct 31, 2025

View reviewed changes

tyaroshko added 2 commits October 31, 2025 16:08

chore: merge changes from main branch

d414a71

chore: merge changes from main branch

a5b38a2

ellipsis-dev bot reviewed Oct 31, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add VectorStoreWriter tool #451

feat: add VectorStoreWriter tool #451

Uh oh!

tyaroshko commented Oct 28, 2025 •

edited by ellipsis-dev bot

Loading

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

Uh oh!

github-actions bot commented Oct 28, 2025 •

edited

Loading

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

Uh oh!

Uh oh!

olbychos left a comment

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: add VectorStoreWriter tool #451

Are you sure you want to change the base?

feat: add VectorStoreWriter tool #451

Uh oh!

Conversation

tyaroshko commented Oct 28, 2025 • edited by ellipsis-dev bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

olbychos left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tyaroshko commented Oct 28, 2025 •

edited by ellipsis-dev bot

Loading

github-actions bot commented Oct 28, 2025 •

edited

Loading