-
Notifications
You must be signed in to change notification settings - Fork 107
feat: add VectorStoreWriter tool #451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed dac6373 in 2 minutes and 30 seconds. Click for details.
- Reviewed
429lines of code in6files - Skipped
0files when reviewing. - Skipped posting
5draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. dynamiq/nodes/agents/agent.py:760
- Draft comment:
Nice integration of automatic ContextManagerTool injection via a model_validator. Consider adding more detailed logging (or raising a warning) when the tool cannot be added to aid in debugging edge cases where summarization is enabled but the tool isn’t present. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. dynamiq/nodes/tools/context_manager.py:140
- Draft comment:
The ContextManagerTool implementation is clean and its prompt template is well structured. Consider adding robustness against empty or failed LLM summarization (e.g. when the returned summary is empty) to improve error handling. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. examples/components/core/dag/agent_file_storage.yaml:36
- Draft comment:
Max_loops has been increased to 15 and summarization_config.enabled is set to true. Verify that the extended loop count and summarization do not adversely affect performance or lead to unintended context truncation. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
4. examples/use_cases/agents_use_cases/agent_deep_scraping.py:24
- Draft comment:
The deep scraping use-case now integrates multiple tools (ScaleSerpTool and E2BInterpreterTool) and properly configures file storage via FileStoreConfig. Ensure that the combined use of multiple tools and summarization produces the expected CSV output and that the file storage integration works as intended. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
5. examples/use_cases/agents_use_cases/agent_deep_scraping.py:21
- Draft comment:
Typographical error found: "usefull" should be "useful". - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While this is technically correct and in changed code, spelling errors in string literals that will only be seen by the LLM are extremely low impact. The LLM will understand the meaning regardless of the spelling. This seems like an overly pedantic comment that doesn't meaningfully improve the code quality. The spelling error could propagate to user-facing logs or documentation. Also, maintaining consistent spelling throughout the codebase is good practice. This is an internal prompt string that will only be processed by the LLM. The misspelling won't affect functionality or user experience in any way. Delete this comment as it's too minor and doesn't affect functionality. We should focus review comments on more impactful issues.
Workflow ID: wflow_Oxw2bo07NCUJStk2
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to e1086b4 in 3 minutes and 28 seconds. Click for details.
- Reviewed
281lines of code in3files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. dynamiq/nodes/writers/writer.py:90
- Draft comment:
Ensure that 'is_postponed_component_init' exists on document_embedder and document_writer to avoid attribute errors. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% Looking at the code, document_embedder and document_writer are typed as DocumentEmbedder and Writer respectively. Without seeing those base classes, I can't be 100% sure if is_postponed_component_init is guaranteed to exist. However, this seems like an internal API where these classes are expected to have this attribute. The suggestion is overly defensive programming. I don't have visibility into the DocumentEmbedder and Writer base classes - what if some implementations don't have this attribute? The suggestion could prevent runtime errors. The types are clearly defined and this appears to be an internal API. If the attribute was optional, it would likely be reflected in the type system. Over-defensive programming can make code less readable. The comment suggests an overly defensive programming pattern that isn't necessary given the typed nature of the code. The current implementation is cleaner and relies on proper interface implementation.
2. dynamiq/nodes/writers/writer.py:138
- Draft comment:
Merging kwargs with 'parent_run_id' from 'run_id' may yield None if 'run_id' is absent. Confirm this behavior is intended. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The code is handling the None case gracefully using dict.get(). Having parent_run_id be None when run_id is absent seems like a reasonable default behavior. The comment is asking for confirmation of intended behavior rather than pointing out an actual issue. This falls under the rule about not asking authors to confirm their intentions. Maybe there could be downstream code that requires parent_run_id to be non-None? Maybe this is part of a larger tracing system where None values cause problems? If parent_run_id was required to be non-None, the code would validate that or use a different default. The code is clearly written to handle the None case, and the comment is just speculating about intentions. Delete the comment. It's asking for confirmation of intentions rather than pointing out an actual issue, and the code handles the None case appropriately.
3. dynamiq/nodes/writers/writer.py:144
- Draft comment:
Overwriting _run_depends with a new dependency list may discard earlier steps. Verify if accumulating dependencies is needed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The code shows a clear pattern of resetting dependencies at start and updating after each step. The dependencies appear to be used for tracing execution flow. Since each step depends only on the immediately previous step, overwriting makes sense - we don't need to accumulate all previous steps. The reset_run_state() method also confirms this is the intended behavior. I could be wrong about the tracing requirements - maybe there's a need to track the full chain of dependencies that I'm not seeing. The code structure strongly suggests sequential dependency tracking is intended - reset at start, update after each step. If full dependency chains were needed, the reset_run_state() method would likely not exist. The comment should be deleted. The current dependency tracking approach appears intentional and appropriate for sequential execution tracing.
4. examples/components/agents/agents/agent_vector_store_writer_tool.py:27
- Draft comment:
Passing 'is_optimized_for_agents' to VectorStoreWriter – ensure this flag is supported or handled in the node's initialization. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the author to ensure that a flag is supported or handled, which violates the rule against asking the author to ensure behavior is intended or tested. It doesn't provide a specific suggestion or ask for a specific test to be written.
Workflow ID: wflow_8ubyy9qdTjyMNOmg
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Coverage Report •
|
||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 7469f47 in 1 minute and 1 seconds. Click for details.
- Reviewed
45lines of code in2files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. dynamiq/nodes/node.py:549
- Draft comment:
New **kwargs parameter added to transform_output; update its docstring to document this change. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. dynamiq/nodes/node.py:877
- Draft comment:
Config is now explicitly passed to transform_input and transform_output in run_sync; ensure all subclasses can handle the additional 'config' parameter. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is suggesting to ensure that all subclasses can handle the additionalconfigparameter. This is a request for confirmation or verification, which violates the rule against asking the PR author to ensure behavior is intended or tested. Therefore, this comment should be removed.
3. pyproject.toml:7
- Draft comment:
Version bump to 0.32.0 looks correct for this new feature. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions. It simply states that the version bump looks correct, which is not necessary for the PR author to know.
Workflow ID: wflow_sn5qH2TULM6Gk0o3
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments about tool description and embedding params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed 079c539 in 2 minutes and 42 seconds. Click for details.
- Reviewed
118lines of code in2files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. dynamiq/nodes/tools/preprocess_tool.py:95
- Draft comment:
Validate that the output from splitter.run() contains the expected 'documents' key to prevent a potential KeyError. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is asking for defensive programming - validating that a key exists before accessing it. However, this violates the rule about speculative comments. The code is calling a component from the same codebase (DocumentSplitterComponent), and there's no evidence that this component's contract is unreliable or that it might not return the expected structure. IfDocumentSplitterComponent.run()is documented to return a dict with a "documents" key, then this validation would be unnecessary defensive programming. The comment is essentially saying "what if the component doesn't work as expected?" which is speculative. Without evidence that this is actually a problem (like seeing the component's implementation and finding it unreliable), this is just defensive programming suggestion without strong justification. However, I might be missing context about whetherDocumentSplitterComponentis from an external library or has a history of unreliable behavior. Also, defensive programming for external API calls or unreliable components is sometimes warranted. If this component could legitimately fail to return the expected structure, the comment would be valid. Looking at the imports (line 6),DocumentSplitterComponentis fromdynamiq.components.splitters.document, which appears to be an internal component of the same codebase. For internal components with defined contracts, we should trust the interface. If the component doesn't return the expected structure, that would be a bug in the component itself, not something to defensively code around. This is speculative without evidence of actual issues. This comment is speculative and suggests defensive programming without evidence that the component's contract is unreliable. It's asking to validate against a scenario that "might" happen rather than pointing to a definite issue. This violates the rule against speculative comments.
2. dynamiq/nodes/tools/preprocess_tool.py:78
- Draft comment:
The 'or self.split_by' fallback is likely redundant since the Pydantic schema provides a default; consider removing it for clarity. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_wJ9OlLwMx4uwD7Js
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 29ca7ae in 1 minute and 58 seconds. Click for details.
- Reviewed
58lines of code in2files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. dynamiq/connections/managers.py:97
- Draft comment:
The added check for the 'closed' attribute is useful. Consider also removing a stale closed client from the cache to avoid reusing it later. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is suggesting a code improvement, but let me trace through the logic. When a closed client is found (conn_client_closed is True), the code doesn't return early, so it falls through to lines 103-116 where a new client is created and stored in self.connection_clients[conn_id]. This overwrites the old closed client in the cache. So the stale client is already being removed/replaced. The comment might be suggesting an explicit deletion for clarity, but the current code already handles this correctly by overwriting. The comment is not pointing out a bug, just a potential code quality improvement that's debatable. The comment could be valid if there's a concern about memory or if the old client needs explicit cleanup before being replaced. Maybe the closed client has resources that need to be freed explicitly before overwriting the reference. I might be missing whether overwriting is sufficient or if explicit cleanup is needed. While explicit cleanup could be beneficial, the code already has a close() method (line 138-146) that handles closing all clients. The current implementation overwrites the stale client, which should be sufficient for Python's garbage collection. If explicit cleanup was needed, it would likely be a broader issue affecting the entire caching strategy, not just this specific case. The comment is speculative about a potential improvement rather than identifying a clear bug. The comment suggests a code quality improvement but doesn't identify a clear bug. The current code already handles stale clients by overwriting them. This is a speculative suggestion that falls under "nice to have" rather than "must fix." According to the rules, I should not keep speculative or purely informative comments.
2. dynamiq/storages/vector/pgvector/pgvector.py:122
- Draft comment:
Updating the type annotation and client check (using 'client is None or client.closed') improves reliability. Ensure that the 'closed' property reliably indicates the connection state. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the author to ensure that a property reliably indicates a state, which falls under the category of asking the author to ensure behavior is intended. This violates the rules.
Workflow ID: wflow_bEZ8EeDJ4B2FhedJ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed a5b38a2 in 1 minute and 0 seconds. Click for details.
- Reviewed
126lines of code in3files - Skipped
0files when reviewing. - Skipped posting
5draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. dynamiq/callbacks/streaming.py:311
- Draft comment:
Good use of a dict (_state_has_emitted) to track if a state has emitted. Consider documenting its purpose and if it should be reset between different streams. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. dynamiq/callbacks/streaming.py:441
- Draft comment:
In _emit, the new block trims leading whitespace only for the first emission per state. Confirm this behavior is desired since subsequent segments skip trimming, preserving original whitespace. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. dynamiq/callbacks/streaming.py:469
- Draft comment:
After streaming, _state_has_emitted[step] is set to true. Adding a clarifying comment here could help future maintainers understand that subsequent calls won’t trim whitespace. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. dynamiq/nodes/agents/agent.py:41
- Draft comment:
Prompt instruction updates now emphasize first‐person and inline explanation. Ensure that any external integrations relying on prior formatting are updated accordingly. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. examples/components/agents/streaming/react/use_streaming.py:25
- Draft comment:
The LLM model has been updated to "openai/gpt-5-chat-latest" and streaming mode set to ALL. Verify that this model identifier is supported and that the change in streaming mode produces the expected behavior. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_v0OxfWEh3NHERbHK
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Add
VectorStoreWriterandPreprocessToolfor document processing and storage, with example workflows.VectorStoreWriterinwriter.pyfor writing documents to vector stores with metadata support.PreprocessToolinpreprocess_tool.pyfor splitting documents into smaller parts.agent_vector_store_write_pipeline.pyto demonstrate a workflow usingVectorStoreWriterandPreprocessTool.agent_vector_store_writer_tool.pyto demonstrate a workflow usingVectorStoreWriterfor travel data.This description was created by
for a5b38a2. You can customize this summary. It will automatically update as commits are pushed.