Streaming improvements #254
Sequence Diagram(s)
sequenceDiagram
participant Client
participant response_generator
participant stream_build_event
Client->>response_generator: Request streaming response
loop For each chunk received
response_generator->>stream_build_event: Process chunk
loop For each event yielded by stream_build_event
stream_build_event-->>response_generator: Yield SSE event string
response_generator-->>Client: Send SSE event
end
end
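The flow in the diagram can be sketched roughly as follows. This is a simplified, synchronous illustration, not the code from this PR: the real `response_generator` is presumably asynchronous and works on llama-stack chunk objects. The `format_sse` helper, the dict-shaped chunks, and the JSON payload layout are all assumptions made for the example.

```python
import json
from typing import Iterator


def format_sse(event: str, data: dict) -> str:
    """Serialize one event as a Server-Sent Events "data:" frame (assumed layout)."""
    payload = {"event": event, "data": data}
    return f"data: {json.dumps(payload)}\n\n"


def stream_build_event(chunk: dict, chunk_id: int) -> Iterator[str]:
    """Hypothetical stand-in: yield one SSE string per token carried by the chunk."""
    for token in chunk.get("tokens", []):
        yield format_sse("token", {"id": chunk_id, "token": token})


def response_generator(chunks: list) -> Iterator[str]:
    """Outer loop over chunks, inner loop over the events each chunk produces."""
    for chunk_id, chunk in enumerate(chunks):
        for event in stream_build_event(chunk, chunk_id):
            yield event  # in the real endpoint this is sent to the client


# One chunk may produce several events, another may produce none.
events = list(response_generator([{"tokens": ["Ansible", " is"]}, {"tokens": []}, {"tokens": ["!"]}]))
```

The point of the nested loop is that a single upstream chunk is no longer limited to at most one SSE event.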
Estimated code review effort: 3 (~45 minutes)
65d4cba to 627265f
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/app/endpoints/streaming_query.py (2)
189-218: Consider improving violation message formatting.
The violation message concatenates metadata and user_message with only a space separator, which might make parsing difficult.
Consider using a structured format:
- "token": f"{violation.metadata} {violation.user_message}",
+ "token": f"Violation: {violation.user_message} (Metadata: {violation.metadata})",
295-295: Consider making the hard-coded role configurable.
The role is hard-coded as "inference" in multiple places to satisfy PatternFly UI requirements. This creates tight coupling with the UI implementation.
Consider making this configurable:
# At module level or in configuration
PATTERNFLY_TOOL_EXECUTION_ROLE = "inference"

# In the function
"role": PATTERNFLY_TOOL_EXECUTION_ROLE,  # chunk.event.payload.step_type,
Also applies to: 309-309, 324-324, 348-348, 361-361
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/app/endpoints/streaming_query.py (4 hunks)
tests/unit/app/endpoints/test_streaming_query.py (8 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/unit/app/endpoints/test_streaming_query.py (1)
src/app/endpoints/streaming_query.py (1)
stream_build_event(130-392)
🔇 Additional comments (13)
src/app/endpoints/streaming_query.py (7)
6-6: LGTM! Import changes align with the new functionality.
The addition of Iterator to the type imports is appropriate for the refactored generator function.
15-18: LGTM! New imports support the enhanced event handling.
The added imports for interleaved_content_as_str, ToolCall, and TextContentItem are necessary for the detailed event processing in the refactored function.
147-161: LGTM! Error handling is properly implemented.
The error event handling correctly checks for the error attribute and yields a properly formatted SSE error event.
162-188: LGTM! Turn event handling prevents ASGI timeouts.
The implementation correctly sends empty token events for turn_start and turn_awaiting_input, which helps prevent ASGI timeouts during periods of inactivity. The turn_complete event properly includes the full output message content.
219-283: LGTM! Comprehensive inference event handling.
The inference handling correctly processes all event types and properly distinguishes between text and tool call deltas, including handling both string and ToolCall object formats.
381-393: LGTM! Heartbeat events prevent timeout for unhandled cases.
The heartbeat event serves as an effective catch-all mechanism to keep the connection alive for unrecognized event types, which aligns with the PR's goal of preventing ASGI timeouts.
432-437: LGTM! Correctly handles multiple events per chunk.
The updated logic properly iterates over all events yielded by the refactored stream_build_event function and accumulates tokens from each event.
tests/unit/app/endpoints/test_streaming_query.py (6)
15-27: LGTM! Tests now use concrete types instead of mocks.
The transition from generic mocks to concrete llama_stack_client.types instances improves test reliability and ensures the tests accurately reflect real-world usage.
164-208: LGTM! Well-structured test data with clear documentation.
The test data properly constructs the streaming response using concrete types, and the comment explaining why mocks can't be used (due to hasattr behavior) is helpful for future maintainers.
255-256: LGTM! Test assertions correctly updated for new event structure.The assertions properly reflect:
- Increased number of streaming chunks (4 → 6) due to multiple events per chunk
- Expanded response content including tool execution details
Also applies to: 269-270
571-595: LGTM! Comprehensive test for step_progress events.
The test properly verifies the token event generation for text deltas using concrete types and appropriate assertions.
597-647: LGTM! Thorough test for step_complete with multiple events.
The test correctly verifies that multiple events are yielded for tool execution, including tool calls and summaries. The use of an iterator and multiple next() calls properly tests the generator behavior.
649-669: LGTM! Important test for heartbeat fallback mechanism.
This test ensures that unrecognized event types generate heartbeat events rather than causing errors, which is crucial for maintaining connection stability.
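On the hasattr remark in the 164-208 comment above: the review does not spell out the mechanism, but a likely reading is that `unittest.mock` objects create attributes on access, so `hasattr` returns True for almost any name. A handler that branches on `hasattr(chunk, "error")` would then treat every mocked chunk as an error. A minimal sketch, where `has_error` and the `SimpleNamespace` chunk are hypothetical stand-ins for the real handler and the concrete llama_stack_client types:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def has_error(chunk: object) -> bool:
    """Hypothetical check mimicking a handler that branches on an `error` attribute."""
    return hasattr(chunk, "error")


# A MagicMock auto-creates attributes on access, so hasattr succeeds.
mock_chunk = MagicMock()

# A concrete object only has the attributes it was given.
real_chunk = SimpleNamespace(event="turn_start")

mock_is_error = has_error(mock_chunk)
real_is_error = has_error(real_chunk)
```

Here `mock_is_error` is True even though no error was ever set, while `real_is_error` is False. Passing `spec=` to the mock would be another way to restrict its attributes, but concrete types keep the tests closer to real payloads.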
87640f1 to 24a8048
umago
left a comment
Code LGTM! Really good refactor, thanks!
    elif step_type == "tool_execution":
        yield from _handle_tool_execution_event(chunk, chunk_id, metadata_map)
    else:
        yield from _handle_heartbeat_event(chunk_id)
Love this! So much easier to follow the flow
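For readers without the diff open, the dispatch-with-fallback shape praised above can be sketched like this. It is a reduced illustration under assumptions: the handler signatures are simplified (the real ones take the chunk and, for tool execution, a metadata_map), `_handle_inference_event`, `_sse`, and `build_events` are hypothetical names, and the heartbeat payload is a guess.

```python
import json
from typing import Iterator


def _sse(event: str, data: dict) -> str:
    """Assumed SSE framing: one JSON payload per "data:" line."""
    payload = {"event": event, "data": data}
    return f"data: {json.dumps(payload)}\n\n"


def _handle_inference_event(token: str, chunk_id: int) -> Iterator[str]:
    # Hypothetical handler: forwards the token with the "inference" role.
    yield _sse("token", {"id": chunk_id, "role": "inference", "token": token})


def _handle_heartbeat_event(chunk_id: int) -> Iterator[str]:
    # Catch-all: emit something so the connection never goes silent.
    yield _sse("heartbeat", {"id": chunk_id, "token": "heartbeat"})


def build_events(step_type: str, token: str, chunk_id: int) -> Iterator[str]:
    """Route each chunk to a handler; unknown step types fall back to a heartbeat."""
    if step_type == "inference":
        yield from _handle_inference_event(token, chunk_id)
    else:
        yield from _handle_heartbeat_event(chunk_id)


known = list(build_events("inference", "Ansible", 0))
unknown = list(build_events("some_unrecognized_step", "", 1))
```

Every chunk yields at least one event, so an unrecognized step type costs one small heartbeat frame instead of a gap in the stream.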
tisnik
left a comment
LGTM, please rebase
24a8048 to 476d12d
Description
I've been testing /streaming_query with a PatternFly "Chat Bot" UI client.
I found that the event stream from lightspeed-stack was causing problems.
It looks like the stream is terminated/closed and then restarted.
This manifested itself as interleaved responses, for example "Ansible is blah blah blah[asgi-timed-out]Ansible is blah blah blah". I suspect this was caused by lightspeed-stack sending only a selective subset of the llama-stack stream tokens back to the calling client. If there was a long delay, e.g. while llama-stack loads the RAG/embedding database, lightspeed-stack wouldn't send any events for a while, and ASGI killed the async process, assuming it had died.
With the fix in this PR, which sends more tokens, the issue seems to be resolved.
This leads to funny things in the UI:
"Tool calls" were not being rendered correctly.
This PR fixes it:
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests