Base implementation of streaming Responses API #754
base: main
Conversation
Walkthrough
Refactors streaming query endpoints to support Agent API (v1) and Responses API (v2) via a shared base handler, factory-based generators, centralized tool preparation, a ResponseGeneratorContext dataclass, and a post-streaming cleanup utility, and registers a v2 streaming router under the /v2 prefix.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    actor Client
    participant Base as streaming_query_endpoint_handler_base
    participant Retrieval as retrieve_response_func
    participant Generator as Response Generator Factory
    participant StreamAPI as Response Stream (Agent/Responses)
    participant Cleanup as cleanup_after_streaming
    participant Persistence as Cache/DB
    Client->>Base: HTTP request (builds ResponseGeneratorContext)
    Base->>Retrieval: call retrieve_response_func(context)
    Retrieval->>StreamAPI: start streaming (async iterator)
    StreamAPI-->>Generator: stream chunks (init / tokens / tools / complete)
    Generator->>Base: yield SSE events (start, token, tool_call, turn_complete, end)
    Base->>Client: stream SSE events
    Base->>Cleanup: invoke cleanup_after_streaming(context, summary, rag_chunks)
    Cleanup->>Persistence: persist cache, transcripts, topic summary
    Persistence-->>Cleanup: ack
```
```mermaid
sequenceDiagram
    participant V1 as streaming_query_endpoint_handler (Agent API)
    participant V2 as streaming_query_endpoint_handler_v2 (Responses API)
    participant Base as streaming_query_endpoint_handler_base
    participant Context as ResponseGeneratorContext
    Note over V1,V2: V1 and V2 construct Context and delegate to shared Base
    V1->>Context: populate Agent-specific fields
    V1->>Base: provide Agent retrieve/generator funcs
    V2->>Context: populate Responses-specific fields
    V2->>Base: provide Responses retrieve/generator funcs
    Base-->>V1: StreamingResponse
    Base-->>V2: StreamingResponse
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
⚠️ Warning: Review ran into problems. 🔥 Problems: Git: Failed to clone repository.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/app/endpoints/query.py (4 hunks)
- src/app/endpoints/query_v2.py (1 hunks)
- src/app/endpoints/streaming_query.py (5 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/app/routers.py (3 hunks)
- tests/unit/app/endpoints/test_query_v2.py (1 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
- tests/unit/app/test_routers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/test_routers.pysrc/app/routers.pytests/unit/app/endpoints/test_streaming_query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pytests/unit/app/endpoints/test_query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/routers.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/routers.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/routers.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (7)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(37-52)
tests/unit/app/endpoints/test_streaming_query_v2.py (5)
src/models/requests.py (1)
QueryRequest(73-225)src/models/config.py (3)
config(140-146)Action(329-375)ModelContextProtocolServer(169-174)src/app/endpoints/streaming_query.py (1)
retrieve_response(1024-1145)src/app/endpoints/streaming_query_v2.py (2)
retrieve_response(409-491)streaming_query_endpoint_handler_v2(379-406)src/configuration.py (1)
mcp_servers(101-105)
src/app/endpoints/streaming_query.py (7)
src/models/requests.py (1)
QueryRequest(73-225)src/app/endpoints/streaming_query_v2.py (2)
response_generator(138-372)retrieve_response(409-491)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/endpoints.py (4)
get_system_prompt(126-190)create_rag_chunks_dict(383-396)create_referenced_documents_with_metadata(563-577)store_conversation_into_cache(231-251)src/app/endpoints/query.py (4)
is_transcripts_enabled(98-104)get_topic_summary(184-214)persist_user_conversation_details(107-139)retrieve_response(639-798)src/utils/transcripts.py (1)
store_transcript(40-99)src/app/database.py (1)
get_session(34-40)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (3)
extract_token_usage_from_responses_api(339-430)get_mcp_tools(456-472)get_rag_tools(433-453)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/database/conversations.py (1)
UserConversation(11-38)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)
src/app/endpoints/query_v2.py (10)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/authentication/__init__.py (1)
get_auth_dependency(14-52)src/authorization/middleware.py (1)
authorize(111-122)src/configuration.py (2)
configuration(73-77)mcp_servers(101-105)src/models/requests.py (1)
QueryRequest(73-225)src/models/responses.py (4)
ForbiddenResponse(1120-1142)QueryResponse(194-305)ReferencedDocument(179-191)UnauthorizedResponse(1094-1117)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(433-453)get_mcp_tools(456-472)retrieve_response(144-315)query_endpoint_handler_v2(119-141)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/endpoints/query.py (2)
src/app/endpoints/streaming_query.py (1)
retrieve_response(1024-1145)src/app/endpoints/query_v2.py (2)
retrieve_response(144-315)get_topic_summary(70-114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (1)
src/app/endpoints/streaming_query.py (1)
699-857: Specify the concrete return type instead of Any. Per our Python guidelines we need full type annotations, but create_agent_response_generator(...) -> Any hides the callable shape and weakens static checks. Please annotate it precisely (e.g. Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]) so downstream call sites stay type-safe.

```diff
-def create_agent_response_generator(  # pylint: disable=too-many-arguments,too-many-locals
+def create_agent_response_generator(  # pylint: disable=too-many-arguments,too-many-locals
     conversation_id: str,
     user_id: str,
     model_id: str,
     provider_id: str,
     query_request: QueryRequest,
     metadata_map: dict[str, dict[str, Any]],
     client: AsyncLlamaStackClient,
     llama_stack_model_id: str,
     started_at: str,
     _skip_userid_check: bool,
-) -> Any:
+) -> Callable[
+    [AsyncIterator[AgentTurnResponseStreamChunk]],
+    AsyncIterator[str],
+]:
```
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/app/endpoints/streaming_query_v2.py (2)
175-189: Stop persisting the conversation inside the "response.created" handler. Persisting here means the later call to persist_user_conversation_details() at lines 365-371 increments message_count a second time, and it also makes existing_conversation non-null in the topic-summary block (lines 329-336), so get_topic_summary() never runs for brand-new conversations.

Apply this diff to drop the premature persist:

```diff
 if event_type == "response.created":
     try:
         conv_id = getattr(chunk, "response").id
     except Exception:  # pylint: disable=broad-except
         conv_id = ""
     yield stream_start_event(conv_id)
-    if conv_id:
-        persist_user_conversation_details(
-            user_id=user_id,
-            conversation_id=conv_id,
-            model=model_id,
-            provider_id=provider_id,
-            topic_summary=None,
-        )
     continue
```
454-462: Streamed requests must include attachment content. Unlike the non-streaming path in query_v2.py (lines 197-206), this streaming retrieve skips the attachment text entirely, so any uploaded logs/configs never reach the LLM. That's a functional regression for users relying on attachments.

Apply this diff to mirror the non-streaming input assembly:

```diff
+    # Prepare input for Responses API
+    # Convert attachments to text and concatenate with query
+    input_text = query_request.query
+    if query_request.attachments:
+        for attachment in query_request.attachments:
+            input_text += (
+                f"\n\n[Attachment: {attachment.attachment_type}]\n"
+                f"{attachment.content}"
+            )
+
     response = await client.responses.create(
-        input=query_request.query,
+        input=input_text,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, toolgroups)),
         stream=True,
         store=True,
     )
```
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query_v2.py (2)
33-33: Use absolute import path for internal authentication module. The coding guidelines specify that internal modules in src/**/*.py should use absolute imports (e.g., from auth import get_auth_dependency), but this uses a relative-style import. As per coding guidelines.

Apply this diff to align with the import convention:

```diff
-from authentication import get_auth_dependency
+from auth import get_auth_dependency
```
53-53: Use __name__ for the logger instead of a hardcoded string. Coding guidelines require logger = logging.getLogger(__name__) for module logging, which provides better traceability. As per coding guidelines.

Apply this diff:

```diff
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
```

src/app/endpoints/query_v2.py (2)
17-17: Use absolute import path for internal authentication module. The coding guidelines specify that internal modules in src/**/*.py should use absolute imports (e.g., from auth import get_auth_dependency). As per coding guidelines.

Apply this diff:

```diff
-from authentication import get_auth_dependency
+from auth import get_auth_dependency
```
38-38: Use __name__ for the logger instead of a hardcoded string. Coding guidelines require logger = logging.getLogger(__name__) for better traceability. As per coding guidelines.

Apply this diff:

```diff
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/app/endpoints/query_v2.py (1 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(317-408)get_topic_summary(70-114)prepare_tools_for_responses_api(453-502)retrieve_response(144-293)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/models/requests.py (1)
QueryRequest(73-225)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)
src/app/endpoints/query_v2.py (7)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/models/responses.py (2)
QueryResponse(194-305)ReferencedDocument(179-191)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(408-468)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (6)
src/app/endpoints/streaming_query_v2.py (1)
376-405: LGTM! The endpoint handler correctly delegates to the base handler with the v2-specific retrieve and generator functions. The structure and type annotations are appropriate.
src/app/endpoints/query_v2.py (5)
70-114: LGTM! The topic summary generation correctly uses the Responses API with proper error handling and text extraction logic.
117-141: LGTM! The endpoint handler correctly delegates to the base handler with appropriate v2-specific functions.
144-293: LGTM! The retrieve_response function correctly assembles input with attachments, processes the response output, and extracts tool calls. The implementation is solid.
296-314: Placeholder is intentional per PR objectives. This function currently returns an empty list with a TODO, which aligns with the PR description stating that "support for parsing and returning referenced documents in responses" will be addressed in follow-up PRs.
317-408: LGTM! The token usage extraction is comprehensive, with proper error handling for various response formats and graceful degradation when usage data is unavailable.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_streaming_query_v2.py (1)
189-189: Use the shared MOCK_AUTH constant for test authentication. The coding guidelines specify using MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") for auth tuples in tests rather than inline values. As per coding guidelines.

Apply this pattern:

```diff
+# At module level
+MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")

 # In test_streaming_query_endpoint_handler_v2_success_yields_events:
-    auth=("user123", "", True, "token-abc"),  # skip_userid_check=True
+    auth=("user123", "", True, "token-abc"),  # Can keep custom values if testing specific scenarios

 # In test_streaming_query_endpoint_handler_v2_api_connection_error:
-    auth=("user123", "", False, "tok"),
+    auth=MOCK_AUTH,
```

Also applies to: 244-244
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/app/endpoints/query_v2.py (1 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_streaming_query_v2.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (10)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(317-408)get_topic_summary(70-114)prepare_tools_for_responses_api(453-511)retrieve_response(144-293)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/database/conversations.py (1)
UserConversation(11-38)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/models/requests.py (1)
QueryRequest(73-225)src/models/config.py (3)
config(140-146)Action(329-375)ModelContextProtocolServer(169-174)src/app/endpoints/streaming_query_v2.py (2)
retrieve_response(400-470)streaming_query_endpoint_handler_v2(370-397)src/configuration.py (1)
mcp_servers(101-105)
src/app/endpoints/query_v2.py (6)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(400-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
src/app/endpoints/streaming_query_v2.py (3)
103-365: Well-structured streaming response generator implementation. The create_responses_response_generator factory pattern cleanly separates concerns and correctly handles:
- In-band conversation ID extraction from response.created events
- Token and tool call streaming with proper SSE formatting
- Single persistence call after topic summary logic (avoiding double-increment)
- Attachment content inclusion in input assembly
- Token usage extraction and metrics updates
The previous review concerns about premature persistence and missing attachment content have been properly addressed.
368-397: Clean delegation to base handler. The wrapper pattern provides good separation between v2-specific Responses API logic and shared streaming infrastructure.
400-470: Proper tool preparation and attachment handling for streaming. The retrieve_response function correctly:
- Validates attachments before processing
- Prepares RAG and MCP tools with header merging via prepare_tools_for_responses_api
- Concatenates attachment content into the input text
- Returns an empty conversation ID for in-band extraction
src/app/endpoints/query_v2.py (7)
70-114: Topic summary generation adapted correctly for Responses API. The function properly uses client.responses.create() with store=False to avoid polluting conversation history, and robustly extracts text from various content structures.
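A minimal sketch of that pattern, reusing the keyword arguments visible in the create() call quoted earlier in this review; the output_text attribute and the non-streaming stream=False flag are assumptions, not the PR's exact implementation:

```python
from typing import Any


async def get_topic_summary_sketch(
    client: Any, question: str, model_id: str, system_prompt: str
) -> str:
    """Summarize a question without persisting anything to conversation history."""
    response = await client.responses.create(
        input=question,
        model=model_id,
        instructions=system_prompt,
        stream=False,
        store=False,  # store=False keeps the summary out of the conversation history
    )
    # Extract text defensively; the attribute name below is an assumption
    text = getattr(response, "output_text", None)
    return text if isinstance(text, str) else ""
```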
117-141: Clean delegation to shared base handler. The v2 query endpoint correctly injects Responses API-specific functions into the base handler pattern.
144-293: Solid non-streaming retrieve implementation. The function correctly:
- Validates attachments upfront
- Prepares tools with MCP header merging
- Assembles input text with attachment content
- Parses response output and tool calls from OpenAI format
- Extracts token usage and referenced documents
296-314: Placeholder appropriately documents missing functionality. The TODO clearly indicates that referenced document parsing from Responses API response structures (file_search results, citations) is deferred to follow-up work, as mentioned in the PR objectives.
317-408: Robust token usage extraction with defensive handling. The implementation correctly (see the sketch after this list):
- Handles both dict and object response.usage formats
- Gracefully degrades when usage data is absent (expected when llama stack uses chat_completions internally)
- Updates Prometheus metrics with proper error handling
- Increments the call counter even when token counts are unavailable
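As a sketch of that defensive pattern, assuming a simplified TokenCounter stand-in for src/utils/token_counter.TokenCounter and assuming the usage payload exposes input_tokens/output_tokens fields:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TokenCounter:
    """Holds token counts extracted from a single LLM response."""

    input_tokens: int = 0
    output_tokens: int = 0
    llm_calls: int = 1


def extract_token_usage(response: Any) -> TokenCounter:
    """Extract token usage defensively, degrading gracefully when it is absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        # Usage can be missing, e.g. when llama stack routes through chat_completions
        return TokenCounter()
    if isinstance(usage, dict):  # dict-shaped usage payload
        input_tokens = usage.get("input_tokens", 0) or 0
        output_tokens = usage.get("output_tokens", 0) or 0
    else:  # object-shaped usage payload
        input_tokens = getattr(usage, "input_tokens", 0) or 0
        output_tokens = getattr(usage, "output_tokens", 0) or 0
    return TokenCounter(input_tokens=int(input_tokens), output_tokens=int(output_tokens))
```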
411-450: Clean tool definition helpers. The get_rag_tools and get_mcp_tools functions provide clear, focused transformations from configuration to Responses API tool format.
453-511: Tool preparation correctly merges per-request MCP headers. The previous review concern about dropping mcp_headers has been properly addressed. Lines 494-500 merge the caller's per-request headers into each MCP tool definition, ensuring short-lived tokens and dynamic auth reach the MCP servers.
dd1cee0 to 68dc335 (force-push)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)
860-990: Fix docstring mood and argument count to unblock lint. The new base handler's docstring starts with "Base handler…", violating pydocstyle D401, and the signature has six positional parameters, exceeding pylint's five-argument limit. Please rephrase the docstring to an imperative sentence (e.g., "Handle streaming query requests…") and collapse the parameters (e.g., wrap injected callbacks/mcp headers into a struct) so pylint R0917 passes.
🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)
385-434: Use the shared MOCK_AUTH constant. Per the repo's test guidelines, test auth tuples should reuse MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") instead of crafting custom values. Please switch the auth= argument to that constant (and update related expectations) so the suite follows the standard.
src/app/endpoints/streaming_query_v2.py (2)
53-53: Logger name should use __name__ for consistency. The logger is initialized with a hardcoded string "app.endpoints.handlers" instead of __name__. Using __name__ ensures the logger reflects the actual module path and improves maintainability.

Apply this diff:

```diff
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
```
103-366: Consider refactoring to reduce complexity. The function has 10 positional parameters (limit: 5), and the inner response_generator has 17 branches (limit: 12) and 75 statements (limit: 50). While the logic is correct, this complexity makes the code harder to maintain and test.

Consider (a sketch of this approach follows the list):
- Grouping related parameters into a configuration object or dataclass
- Extracting event-handling logic into separate helper functions (e.g., _handle_text_delta, _handle_tool_call, _handle_completion)
- Moving post-processing logic (transcript, cache, persistence) into a separate function
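A minimal sketch of the dataclass-plus-dispatch idea under discussion; the context fields, handler names, and event-type strings below are illustrative assumptions, not the PR's final API:

```python
from dataclasses import dataclass
from typing import Any, AsyncIterator, Awaitable, Callable


@dataclass
class StreamContext:
    """Groups the per-request values the generator needs (illustrative fields)."""

    conversation_id: str
    user_id: str
    model_id: str
    provider_id: str


async def _handle_text_delta(ctx: StreamContext, chunk: Any) -> str:
    """Format a token delta as an SSE line."""
    return f"data: {chunk.delta}\n\n"


async def _handle_completion(ctx: StreamContext, chunk: Any) -> str:
    """Emit the final end-of-turn event."""
    return "data: [DONE]\n\n"


# Dispatch table keyed by event type keeps the generator body small.
_HANDLERS: dict[str, Callable[[StreamContext, Any], Awaitable[str]]] = {
    "response.output_text.delta": _handle_text_delta,
    "response.completed": _handle_completion,
}


async def response_generator(
    ctx: StreamContext, stream: AsyncIterator[Any]
) -> AsyncIterator[str]:
    """Route each chunk to its handler instead of branching inline."""
    async for chunk in stream:
        handler = _HANDLERS.get(getattr(chunk, "type", ""))
        if handler is not None:
            yield await handler(ctx, chunk)
```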
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/app/endpoints/query.py (4 hunks)
- src/app/endpoints/query_v2.py (1 hunks)
- src/app/endpoints/streaming_query.py (5 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/app/routers.py (3 hunks)
- tests/unit/app/endpoints/test_query_v2.py (1 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
- tests/unit/app/test_routers.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/app/endpoints/test_streaming_query_v2.py
- tests/unit/app/test_routers.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (6)
src/app/endpoints/query_v2.py (6)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(400-470)
src/app/endpoints/streaming_query.py (8)
src/models/requests.py (1)
QueryRequest(73-225)src/app/endpoints/streaming_query_v2.py (2)
response_generator(137-363)retrieve_response(400-470)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/endpoints.py (4)
get_system_prompt(126-190)create_rag_chunks_dict(383-396)create_referenced_documents_with_metadata(563-577)store_conversation_into_cache(231-251)src/utils/token_counter.py (2)
extract_token_usage_from_turn(44-94)TokenCounter(18-41)src/app/endpoints/query.py (4)
is_transcripts_enabled(98-104)get_topic_summary(184-214)persist_user_conversation_details(107-139)retrieve_response(639-798)src/utils/transcripts.py (1)
store_transcript(40-99)src/app/endpoints/query_v2.py (2)
get_topic_summary(235-274)retrieve_response(304-427)
src/app/endpoints/streaming_query_v2.py (16)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/authentication/__init__.py (1)
get_auth_dependency(14-52)src/authorization/middleware.py (1)
authorize(111-122)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/config.py (2)
config(140-146)Action(329-375)src/models/database/conversations.py (1)
UserConversation(11-38)src/models/requests.py (1)
QueryRequest(73-225)src/models/responses.py (2)
ForbiddenResponse(1120-1142)UnauthorizedResponse(1094-1117)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(552-572)get_mcp_tools(575-614)retrieve_response(304-427)query_endpoint_handler_v2(279-301)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(37-52)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(304-427)get_topic_summary(235-274)
🪛 GitHub Actions: Pydocstyle
src/app/endpoints/streaming_query.py
[error] 868-868: pydocstyle D401: First line should be in imperative mood; try rephrasing (found 'Base') in public function 'streaming_query_endpoint_handler_base'.
🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query_v2.py
[error] 456-456: pyright: No overloads for 'create' match the provided arguments (reportCallIssue) in streaming_query_v2.py:456:22. Step: 'uv run pyright src'.
[error] 460-460: pyright: Argument of type 'str | None' cannot be assigned to parameter 'previous_response_id' of type 'str | NotGiven' in function 'create' in streaming_query_v2.py:460:30. Step: 'uv run pyright src'.
🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py
[error] 699-699: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)
[error] 860-860: pylint: R0917 Too many positional arguments (6/5) (too-many-positional-arguments)
src/app/endpoints/streaming_query_v2.py
[error] 103-103: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)
[error] 137-137: pylint: R0912 Too many branches (17/12) (too-many-branches)
[error] 137-137: pylint: R0915 Too many statements (75/50) (too-many-statements)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (1)
src/app/endpoints/streaming_query_v2.py (1)
368-397: LGTM! The endpoint handler correctly follows the FastAPI pattern with proper authorization, dependency injection, and clean delegation to the base handler. The wrapper pattern enables code reuse while keeping v2-specific concerns isolated.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/app/endpoints/streaming_query.py (1)
700-702: Specify the return type more precisely. The return type Any is overly broad. Consider using a more specific type annotation for better type safety:

```python
from collections.abc import Callable
from typing import AsyncIterator

def create_agent_response_generator(
    context: ResponseGeneratorContext,
) -> Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]:
```

This clearly documents that the factory returns a callable taking an async iterator of chunks and yielding SSE strings.
src/app/endpoints/streaming_query_v2.py (1)
104-118: Specify the return type more precisely. Like the v1 factory, the return type Any should be more specific for better type safety:

```python
def create_responses_response_generator(
    context: ResponseGeneratorContext,
) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:
```
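Following the project's own "type aliases at module level" guideline, the same annotation could be captured once as an alias. This is only a sketch: the alias name is an assumption, and the quoted type names stand in for the imports the real module already has.

```python
from collections.abc import AsyncIterator, Callable

# A module-level alias keeps both factory signatures short and consistent.
SSEGenerator = Callable[
    [AsyncIterator["OpenAIResponseObjectStream"]],
    AsyncIterator[str],
]


def create_responses_response_generator(
    context: "ResponseGeneratorContext",
) -> SSEGenerator:
    """Return a generator that converts Responses API chunks into SSE strings."""
    ...
```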
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/app/endpoints/streaming_query.py (6 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/models/context.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.pysrc/models/context.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.pysrc/models/context.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/context.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/context.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (12)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(126-137)stream_end_event(164-220)stream_start_event(140-161)streaming_query_endpoint_handler_base(851-983)response_generator(716-846)retrieve_response(1018-1139)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/database/conversations.py (1)
UserConversation(11-38)src/models/requests.py (1)
QueryRequest(73-225)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
src/app/endpoints/streaming_query.py (6)
src/models/context.py (1)
ResponseGeneratorContext(12-48)src/app/endpoints/streaming_query_v2.py (1)
response_generator(120-348)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/endpoints.py (4)
get_system_prompt(126-190)create_rag_chunks_dict(383-396)create_referenced_documents_with_metadata(563-577)store_conversation_into_cache(231-251)src/utils/token_counter.py (2)
extract_token_usage_from_turn(44-94)TokenCounter(18-41)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)get_topic_summary(184-214)persist_user_conversation_details(107-139)
src/models/context.py (1)
src/models/requests.py (1)
QueryRequest(73-225)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: shellcheck
- GitHub Check: build-pr
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (7)
src/models/context.py (1)
1-48: LGTM! Well-structured context dataclass. The ResponseGeneratorContext dataclass effectively reduces function parameter counts from 9+ to 1, improving maintainability. The implementation follows all coding guidelines with proper module/class docstrings, complete type annotations, and logical field grouping.
src/app/endpoints/streaming_query.py (2)
851-983: LGTM! Clean dependency injection pattern. The base handler effectively extracts common streaming logic while accepting API-specific functions via retrieve_response_func and create_response_generator_func. The ResponseGeneratorContext is properly constructed and passed to the generator factory, enabling v1/v2 code reuse.
986-1015: LGTM! Proper v1 endpoint wrapper. The wrapper cleanly delegates to the base handler while providing Agent API-specific implementations (retrieve_response and create_agent_response_generator). This maintains backward compatibility for the v1 /streaming_query endpoint.
src/app/endpoints/streaming_query_v2.py (4)
1-56: LGTM! Proper module setup. The module follows coding guidelines with a descriptive docstring, proper logger initialization, and absolute imports.
273-349: LGTM! Post-streaming logic correctly implemented. The finalization sequence properly:
- Extracts token usage from the Responses API response object
- Yields the end event with metadata
- Stores transcripts conditionally
- Generates topic summaries only for new conversations
- Persists cache and conversation details
The TODO comments at lines 296-297 about RAG chunks align with the PR description noting that referenced documents support will come in a follow-up.
455-457: LGTM! Proper handling of delayed conversation ID. Returning an empty string for conversation_id is correct for the Responses API streaming flow, where the ID arrives in the first response.created chunk (lines 158-163). The comment clearly documents this behavior.
441-452: The concern about passing None for tools is unfounded. Test assertions in the codebase explicitly verify that tools=None is passed to the API and is accepted. The current code, which unconditionally includes "tools": toolgroups, is correct and tested. No conditional logic is needed.
Likely an incorrect or invalid review comment.
5c289ed to dc74af7 (force-push)
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)
617-668: Consider adding error handling for vector store retrieval. The function is well-structured and follows coding guidelines. However, the call to client.vector_stores.list() at line 647 could potentially fail. Consider wrapping it in a try-except block to handle transient errors gracefully, falling back to no RAG tools if the vector store service is unavailable.

Example:

```diff
     toolgroups = []

     # Get vector stores for RAG tools
-    vector_store_ids = [
-        vector_store.id for vector_store in (await client.vector_stores.list()).data
-    ]
+    try:
+        vector_store_ids = [
+            vector_store.id for vector_store in (await client.vector_stores.list()).data
+        ]
+    except Exception as e:
+        logger.warning("Failed to retrieve vector stores: %s", e)
+        vector_store_ids = []
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/app/endpoints/query_v2.py (3 hunks)
- src/app/endpoints/streaming_query.py (6 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/app/routers.py (2 hunks)
- src/models/context.py (1 hunks)
- tests/unit/app/endpoints/test_query_v2.py (1 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
- tests/unit/app/test_routers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/app/routers.py
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/models/context.pysrc/app/endpoints/streaming_query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/models/context.pysrc/app/endpoints/streaming_query.pytests/unit/app/test_routers.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/test_routers.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/test_routers.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/context.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/context.py
🧬 Code graph analysis (6)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(126-137)stream_end_event(164-220)stream_start_event(140-161)streaming_query_endpoint_handler_base(851-983)response_generator(716-846)retrieve_response(1018-1139)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/context.py (1)
ResponseGeneratorContext(12-48)src/models/database/conversations.py (1)
UserConversation(11-38)src/utils/endpoints.py (2)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)src/utils/transcripts.py (1)
store_transcript(40-99)
tests/unit/app/endpoints/test_query_v2.py (2)
src/models/config.py (1)
ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (1)
get_mcp_tools(575-614)
tests/unit/app/endpoints/test_streaming_query_v2.py (3)
src/models/requests.py (1)
QueryRequest(73-225)src/models/config.py (3)
config(140-146)Action(329-375)ModelContextProtocolServer(169-174)src/configuration.py (1)
mcp_servers(101-105)
src/app/endpoints/query_v2.py (2)
src/configuration.py (3)
configuration(73-77)AppConfig(39-181)mcp_servers(101-105)src/models/requests.py (1)
QueryRequest(73-225)
src/models/context.py (1)
src/models/requests.py (1)
QueryRequest(73-225)
src/app/endpoints/streaming_query.py (2)
src/models/context.py (1)
ResponseGeneratorContext(12-48)src/app/endpoints/streaming_query_v2.py (2)
response_generator(120-348)retrieve_response(385-457)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (18)
tests/unit/app/test_routers.py (1)
23-23: LGTM! Router registration updates are correct. The test updates correctly account for the new v2 streaming router: import added, router count incremented to 17, existence assertion added, and prefix verification included.
Also applies to: 69-69, 79-79, 95-95, 105-105
src/app/endpoints/query_v2.py (3)
21-21: LGTM! Import required for new function. The AppConfig import is needed for the prepare_tools_for_responses_api function signature.
353-355: LGTM! Good refactoring for code reuse. Extracting tool preparation into prepare_tools_for_responses_api centralizes the logic for both query and streaming v2 endpoints, and correctly passes mcp_headers to enable per-server authentication.
601-614: LGTM! Header merging logic is correct and flexible. The header construction properly merges token-based authentication with per-server custom headers, allowing server-specific authorization overrides when needed. This addresses the previous review concern about dropped MCP headers.
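For illustration, a minimal sketch of the merging behaviour described above; the helper name and signature are assumptions for this example, not the actual get_mcp_tools code:

    def merge_mcp_headers(
        token: str | None,
        per_server_headers: dict[str, str] | None,
    ) -> dict[str, str]:
        """Merge token-based auth with per-server custom headers.

        Per-server headers are applied last, so a server-specific
        Authorization value overrides the default bearer token.
        """
        headers: dict[str, str] = {}
        if token:
            headers["Authorization"] = f"Bearer {token}"
        if per_server_headers:
            headers.update(per_server_headers)
        return headers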
tests/unit/app/endpoints/test_query_v2.py (1)
60-96: LGTM! Comprehensive test coverage for header merging. The test thoroughly validates the three key scenarios: per-server headers without token, merged token + per-server headers, and Authorization header override. This ensures the header merging logic in get_mcp_tools works correctly.
src/models/context.py (1)
1-48: LGTM! Well-designed context object. The ResponseGeneratorContext dataclass is well-structured with complete docstrings and type hints following coding guidelines. It effectively addresses the previous concern about too many positional arguments by bundling related parameters logically. The grouping of fields (conversation/user context, model/provider info, request/timing, dependencies/state) makes the code more maintainable.
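The shape of such a context object, as a rough sketch (field names here are illustrative; the real definition lives in src/models/context.py):

    from dataclasses import dataclass, field
    from typing import Any

    @dataclass
    class ExampleGeneratorContext:
        """Illustrative bundle of per-request state for a response generator."""

        # Conversation / user context
        conversation_id: str
        user_id: str
        # Model / provider info
        model_id: str
        provider_id: str
        # Request / timing
        query_request: Any
        started_at: str
        # Dependencies / state
        client: Any
        metadata_map: dict[str, dict[str, Any]] = field(default_factory=dict)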
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
20-26: LGTM! Well-structured test fixture. The dummy_request fixture properly sets up a mock FastAPI Request with authorized actions, which is necessary for RBAC checks in the endpoint handler.
29-56: LGTM! Tests validate core tool preparation logic. Both tests properly verify that retrieve_response builds the correct tools for the Responses API: test_retrieve_response_builds_rag_and_mcp_tools confirms RAG and MCP tools are assembled, and test_retrieve_response_no_tools_passes_none ensures the no_tools flag is respected.
Also applies to: 59-80
83-223: LGTM! Comprehensive integration test for streaming flow. This test validates the complete streaming lifecycle including SSE event generation, conversation persistence, and metric updates. The fake stream covers all major event types (response.created, output_text.delta, function_call, etc.), and assertions verify both event content and ordering.
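Such a fake stream can be as small as an async generator yielding objects with a type attribute; a hedged sketch, with SimpleNamespace standing in for the real chunk classes:

    from types import SimpleNamespace

    async def fake_responses_stream():
        """Yield a minimal sequence of Responses API streaming events."""
        yield SimpleNamespace(type="response.created",
                              response=SimpleNamespace(id="resp-123"))
        yield SimpleNamespace(type="response.output_text.delta", delta="Hello")
        yield SimpleNamespace(type="response.completed",
                              response=SimpleNamespace(id="resp-123"))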
226-250: LGTM! Error handling test is appropriate. The test correctly verifies that API connection errors are caught, logged, return HTTP 500, and increment the failure metric.
src/app/endpoints/streaming_query.py (4)
8-8: LGTM! Necessary imports for refactoring. The Callable import supports the factory function parameters in the base handler, and ResponseGeneratorContext enables the parameter reduction pattern.
Also applies to: 50-50
700-848: LGTM! Factory function properly encapsulates Agent API streaming logic. The create_agent_response_generator function effectively bundles the streaming response generation logic, using the ResponseGeneratorContext to access all necessary parameters. The inner generator handles SSE formatting, transcript storage, topic summarization, caching, and persistence correctly.
851-983: LGTM! Base handler provides good separation of concerns. The streaming_query_endpoint_handler_base extracts all common streaming logic (RBAC, conversation ownership validation, model selection, error handling) while accepting injectable functions for API-specific behavior. This enables code reuse between v1 (Agent API) and v2 (Responses API) endpoints.
986-1015: LGTM! Wrapper delegates to base with Agent API specifics. The streaming_query_endpoint_handler function is a thin wrapper that supplies Agent API-specific implementations (retrieve_response and create_agent_response_generator) to the base handler, maintaining backward compatibility for the v1 endpoint.
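The delegation looks roughly like the following sketch; the injected function names come from the PR, but the signatures are simplified for illustration:

    async def streaming_query_endpoint_handler(request, query_request, auth, mcp_headers):
        """v1 endpoint: delegate to the shared base handler with Agent API pieces."""
        return await streaming_query_endpoint_handler_base(
            request,
            query_request,
            auth,
            mcp_headers,
            retrieve_response_func=retrieve_response,  # Agent API retrieval
            create_response_generator_func=create_agent_response_generator,
        )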
1-56: LGTM! Module setup follows coding guidelines. The module has a descriptive docstring, proper imports, and appropriate configuration. The router is correctly tagged with "streaming_query_v2" and response schemas are well-defined.
104-350: LGTM! Response generator correctly handles Responses API events. The create_responses_response_generator factory properly processes all Responses API event types:
- response.created → start event with conversation ID
- response.output_text.delta → token events
- response.function_call_arguments.delta → tool_call events
- response.completed → turn_complete event
- Post-streaming: token usage extraction, transcript storage, topic summary, caching, and persistence
The event handling logic is comprehensive and correctly accumulates state across chunks.
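A condensed sketch of that dispatch; the event type strings are the Responses API names listed above, while the helper signatures are simplified for illustration:

    async def sse_events(stream, context):
        """Translate Responses API stream events into SSE events (simplified)."""
        async for chunk in stream:
            if chunk.type == "response.created":
                # the first chunk carries the conversation/response id
                yield stream_start_event(chunk.response.id)
            elif chunk.type == "response.output_text.delta":
                yield format_stream_data({"event": "token", "data": {"token": chunk.delta}})
            elif chunk.type == "response.function_call_arguments.delta":
                yield format_stream_data({"event": "tool_call", "data": {"token": chunk.delta}})
            elif chunk.type == "response.completed":
                yield format_stream_data({"event": "turn_complete", "data": {}})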
353-382: LGTM! Endpoint wrapper follows established pattern. The streaming_query_endpoint_handler_v2 function correctly delegates to the base handler with v2-specific implementations (retrieve_response and create_responses_response_generator), maintaining consistency with the v1 endpoint structure.
385-457: LGTM! Retrieve function properly prepares Responses API call. The retrieve_response function correctly:
- Validates attachments
- Prepares tools via prepare_tools_for_responses_api (with mcp_headers)
- Concatenates attachment content into input text
- Conditionally builds create parameters to avoid type errors
- Initiates streaming response with stream=True
The conversation ID is returned as empty string since it arrives in the first chunk, which is handled correctly by the generator.
Force-pushed from dc74af7 to a1b6f9c
tisnik left a comment
Seems legit, but would be nice to have @eranco74 ack too. TYVM
    emitted_turn_complete = False

    # Handle conversation id and start event in-band on response.created
    conv_id = context.conversation_id
This is always an empty string, right?
right. but I prefer to leave it there as it depends on the retrieve_response implementation (in case it changes in the future). What do you think?
    with get_session() as session:
        existing_conversation = (
            session.query(UserConversation).filter_by(id=conv_id).first()
        )
This looks like synchronous code in an async function
This is the same in streaming_query.py
    )

    yield stream_end_event(context.metadata_map, summary, token_usage, media_type)
I think you can/should split this logic into two parts: the streaming part and the rest (store_transcript, summary, cache update).
All the database and cache work can happen after the stream is finished.
Generator function: Its only job is to yield SSE events and return the final summary and conv_id.
Cleanup function: A new async function that takes the summary and context and does all the database/cache work.
Also seems that both streaming_query and streaming_query_v2 can use the same Cleanup function
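A minimal sketch of the split being proposed (handle_chunk and persist_turn are hypothetical helpers, not code in this PR):

    async def generate_events(context):
        """Yield SSE events only, accumulating the turn summary as it streams."""
        summary = TurnSummary(llm_response="", tool_calls=[])
        async for chunk in context.stream:
            event, summary = handle_chunk(chunk, summary)  # hypothetical helper
            if event:
                yield event
        context.summary = summary  # handed to the cleanup step afterwards

    async def cleanup_after_streaming(context):
        """Persist everything once the stream has finished."""
        # transcripts, topic summary, cache entry, conversation row, etc.
        await persist_turn(context)  # hypothetical helper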
I half-addressed that: I added the cleanup function to avoid code duplication between streaming_query and streaming_query_v2, but did not split it out of the generator, since that is a bigger change and the pattern was already there in streaming_query before this PR.
Force-pushed from a1b6f9c to 9cc8028
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/app/endpoints/streaming_query.py (6 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/constants.py (1 hunks)
- src/models/context.py (1 hunks)
- src/utils/endpoints.py (2 hunks)
- tests/unit/app/endpoints/test_query_v2.py (12 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (4 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/models/context.py
- tests/unit/app/endpoints/test_streaming_query_v2.py
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/endpoints/test_query_v2.pysrc/utils/endpoints.pytests/unit/app/endpoints/test_streaming_query.pysrc/app/endpoints/streaming_query.pysrc/constants.pysrc/app/endpoints/streaming_query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/utils/endpoints.pysrc/app/endpoints/streaming_query.pysrc/constants.pysrc/app/endpoints/streaming_query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
src/constants.py
📄 CodeRabbit inference engine (CLAUDE.md)
Keep shared constants in a central src/constants.py with descriptive comments
Files:
src/constants.py
🧠 Learnings (4)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/**/*.py : Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Applied to files:
tests/unit/app/endpoints/test_query_v2.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client
Applied to files:
tests/unit/app/endpoints/test_query_v2.pysrc/utils/endpoints.pysrc/constants.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/utils/endpoints.pysrc/constants.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/endpoints/**/*.py : In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Applied to files:
src/utils/endpoints.py
🧬 Code graph analysis (5)
tests/unit/app/endpoints/test_query_v2.py (3)
src/models/config.py (1)
ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (1)
get_mcp_tools(575-614)tests/unit/app/endpoints/test_streaming_query_v2.py (2)
dummy_request(23-28)_raise(207-208)
src/utils/endpoints.py (6)
src/models/requests.py (1)
QueryRequest(73-225)src/utils/types.py (1)
TurnSummary(89-163)src/configuration.py (1)
AppConfig(39-181)src/app/database.py (1)
get_session(34-40)src/models/database/conversations.py (1)
UserConversation(11-38)src/models/cache_entry.py (1)
CacheEntry(7-24)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/utils/types.py (1)
TurnSummary(89-163)tests/unit/app/endpoints/test_query.py (1)
mock_metrics(71-80)
src/app/endpoints/streaming_query.py (4)
src/models/context.py (1)
ResponseGeneratorContext(12-48)src/utils/endpoints.py (2)
cleanup_after_streaming(598-711)get_system_prompt(127-191)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/token_counter.py (2)
extract_token_usage_from_turn(44-94)TokenCounter(18-41)
src/app/endpoints/streaming_query_v2.py (12)
src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(123-134)stream_end_event(161-217)stream_start_event(137-158)streaming_query_endpoint_handler_base(801-933)response_generator(713-796)retrieve_response(968-1089)src/authentication/__init__.py (1)
get_auth_dependency(14-52)src/authorization/middleware.py (1)
authorize(111-122)src/models/config.py (2)
config(140-146)Action(329-375)src/models/context.py (1)
ResponseGeneratorContext(12-48)src/models/requests.py (1)
QueryRequest(73-225)src/utils/endpoints.py (2)
cleanup_after_streaming(598-711)get_system_prompt(127-191)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
- GitHub Check: build-pr
🔇 Additional comments (1)
src/constants.py (1)
5-5: Verify 0.3.0 breaking changes are addressed in the codebase. The web search reveals Llama Stack 0.3.0 contains breaking API changes: the Tools API now uses input_schema/output_schema instead of parameters, chat/agent API shapes were cleaned up, and routes were renamed. These changes require codebase verification beyond the version constant update.
Additionally, the hardcoded version "0.2.22" appears in two locations, tests/integration/endpoints/test_info_integration.py (lines 30, 69) and src/models/responses.py (lines 321, 338), and should be reviewed for consistency with the new maximum version. Confirm the v2 streaming feature integrates properly with 0.3.0's API changes and that test coverage validates compatibility.
    create_params: dict[str, Any] = {
        "input": input_text,
        "model": model_id,
        "instructions": system_prompt,
        "stream": True,
        "store": True,
        "tools": toolgroups,
    }
    if query_request.conversation_id:
        create_params["previous_response_id"] = query_request.conversation_id
Only send tools when we actually have any
We always add tools=None to the payload. The Responses API expects the field to be omitted unless we supply a real list; sending null can trigger validation errors. Gate the assignment so we only include the key when toolgroups is truthy.
-    create_params: dict[str, Any] = {
+    create_params: dict[str, Any] = {
         "input": input_text,
         "model": model_id,
         "instructions": system_prompt,
         "stream": True,
         "store": True,
-        "tools": toolgroups,
     }
     if query_request.conversation_id:
         create_params["previous_response_id"] = query_request.conversation_id
+    if toolgroups:
+        create_params["tools"] = toolgroups

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 391 to 401, the payload
always includes "tools": toolgroups (which becomes null when empty) causing
validation errors; change the construction so the "tools" key is only added to
create_params when toolgroups is truthy (e.g., if toolgroups:
create_params["tools"] = toolgroups) so the field is omitted when there are no
tools.
9cc8028 to
09bec86
Compare
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/endpoints.py (2)
400-416: Fix AnyUrl validation to avoid runtime errors (pydantic v2). Directly calling AnyUrl(...) is not a supported validation path in pydantic v2 and can raise TypeError; this code path is now exercised by cleanup_after_streaming. Use a TypeAdapter to validate, and broaden exception handling.
Apply:
@@
-from pydantic import AnyUrl, ValidationError
+from pydantic import AnyUrl, ValidationError, TypeAdapter
@@
 logger = get_logger(__name__)
+
+# Adapter for validating URLs in pydantic v2
+_ANYURL = TypeAdapter(AnyUrl)
@@
-        validated_url = AnyUrl(src)
-    except ValidationError:
+        validated_url = _ANYURL.validate_python(src)
+    except (ValidationError, ValueError, TypeError):
         logger.warning("Invalid URL in chunk source: %s", src)
         validated_url = None
@@
-        validated_doc_url = None
-        if doc_url and doc_url.startswith("http"):
-            validated_doc_url = AnyUrl(doc_url)
-    except ValidationError:
+        validated_doc_url = None
+        if doc_url and doc_url.startswith("http"):
+            validated_doc_url = _ANYURL.validate_python(doc_url)
+    except (ValidationError, ValueError, TypeError):
         logger.warning("Invalid URL in metadata: %s", doc_url)
         validated_doc_url = None
@@
-        validated_url = None
-        if doc_url.startswith("http"):
-            validated_url = AnyUrl(doc_url)
-    except ValidationError:
+        validated_url = None
+        if doc_url.startswith("http"):
+            validated_url = _ANYURL.validate_python(doc_url)
+    except (ValidationError, ValueError, TypeError):
         logger.warning("Invalid URL in metadata_map: %s", doc_url)
         validated_url = None

Also applies to: 443-451, 472-480
356-367: Docstring return description mismatches actual return. Function returns (agent, session_id, conversation_id). Clarify to prevent misuse.
-    Returns:
-        tuple[AsyncAgent, str]: A tuple containing the agent and session_id.
+    Returns:
+        tuple[AsyncAgent, str, str]: (agent, session_id, conversation_id).

Also applies to: 378-381
♻️ Duplicate comments (1)
src/app/endpoints/streaming_query_v2.py (1)
391-401: Only include "tools" when present to satisfy the Responses API type contract. Passing tools=None can be invalid; gate the key. (Previously flagged.)
-    create_params: dict[str, Any] = {
-        "input": input_text,
-        "model": model_id,
-        "instructions": system_prompt,
-        "stream": True,
-        "store": True,
-        "tools": toolgroups,
-    }
+    create_params: dict[str, Any] = {
+        "input": input_text,
+        "model": model_id,
+        "instructions": system_prompt,
+        "stream": True,
+        "store": True,
+    }
     if query_request.conversation_id:
         create_params["previous_response_id"] = query_request.conversation_id
+    if toolgroups:
+        create_params["tools"] = toolgroups
🧹 Nitpick comments (6)
src/utils/endpoints.py (2)
598-616: Add precise Callable type hints for injected helpers in cleanup_after_streaming. Improves readability and static checking; avoids Any.
@@
-from typing import Any
+from typing import Any, Callable, Awaitable
@@
 async def cleanup_after_streaming(
@@
-    get_topic_summary_func: Any,
-    is_transcripts_enabled_func: Any,
-    store_transcript_func: Any,
-    persist_user_conversation_details_func: Any,
+    get_topic_summary_func: Callable[[str, AsyncLlamaStackClient, str], Awaitable[str]],
+    is_transcripts_enabled_func: Callable[[], bool],
+    store_transcript_func: Callable[..., None],
+    persist_user_conversation_details_func: Callable[
+        [str, str, str, str, str | None], None
+    ],

Also applies to: 5-5
7-7: Use public client import to avoid private API coupling. Prefer from llama_stack_client import AsyncLlamaStackClient for stability. Based on learnings.
-from llama_stack_client._client import AsyncLlamaStackClient
+from llama_stack_client import AsyncLlamaStackClient

tests/unit/app/endpoints/test_query_v2.py (1)
21-26: Optional: Pre-set authorized_actions in the dummy_request fixture. Avoids per-test mutation, reduces boilerplate.
 @pytest.fixture
 def dummy_request() -> Request:
     """Create a dummy FastAPI Request object for testing."""
     req = Request(scope={"type": "http"})
-    return req
+    req.state.authorized_actions = set()  # or set(Action) if permissive is desired
+    return req

Also applies to: 391-396
src/app/endpoints/streaming_query_v2.py (2)
211-224: Optional: Drop "role" from tool_call events for parity with Agent API JSON. OLS JSON tests for v1 omit "role"; consider aligning v2 for consistency.
-    yield format_stream_data(
+    yield format_stream_data(
         {
             "event": "tool_call",
             "data": {
                 "id": chunk_id,
-                "role": "tool_execution",
                 "token": delta,
             },
         }
     )

342-364: Doc tweak: this is the Responses API path, not agents. Very minor clarity improvement.
-    Retrieve response from LLMs and agents.
+    Retrieve a streaming response using the Responses API.
@@
-    ID from the Llama Stack agent for a given user query.
+    ID from the Responses API for a given user query.

src/app/endpoints/streaming_query.py (1)
727-735: Optional: Initialize summary with empty string for consistency. Match the v2 fallback behavior and avoid a non-empty default.
-    summary = TurnSummary(llm_response="No response from the model", tool_calls=[])
+    summary = TurnSummary(llm_response="", tool_calls=[])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/app/endpoints/streaming_query.py (6 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/models/context.py (1 hunks)
- src/utils/endpoints.py (2 hunks)
- tests/unit/app/endpoints/test_query_v2.py (12 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (4 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/app/endpoints/test_streaming_query_v2.py
- src/models/context.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/utils/endpoints.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/utils/endpoints.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/app/endpoints/test_query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_streaming_query.pytests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_streaming_query.pytests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (4)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/utils/endpoints.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/endpoints/**/*.py : In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Applied to files:
src/utils/endpoints.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client
Applied to files:
src/utils/endpoints.pytests/unit/app/endpoints/test_query_v2.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/**/*.py : Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Applied to files:
tests/unit/app/endpoints/test_query_v2.py
🧬 Code graph analysis (5)
src/utils/endpoints.py (6)
src/models/requests.py (1)
QueryRequest(73-225)src/utils/types.py (1)
TurnSummary(89-163)src/configuration.py (1)
AppConfig(39-181)src/app/database.py (1)
get_session(34-40)src/models/database/conversations.py (1)
UserConversation(11-38)src/models/cache_entry.py (1)
CacheEntry(7-24)
src/app/endpoints/streaming_query.py (3)
src/models/context.py (1)
ResponseGeneratorContext(12-48)src/utils/endpoints.py (1)
cleanup_after_streaming(598-711)src/app/endpoints/streaming_query_v2.py (1)
response_generator(114-298)
src/app/endpoints/streaming_query_v2.py (8)
src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(123-134)stream_end_event(161-217)stream_start_event(137-158)streaming_query_endpoint_handler_base(801-933)response_generator(713-796)retrieve_response(968-1089)src/utils/endpoints.py (2)
cleanup_after_streaming(598-711)get_system_prompt(127-191)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/utils/types.py (1)
TurnSummary(89-163)tests/unit/app/endpoints/test_query.py (1)
mock_metrics(71-80)
tests/unit/app/endpoints/test_query_v2.py (3)
src/models/config.py (1)
ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (1)
get_mcp_tools(575-614)tests/unit/app/endpoints/test_streaming_query_v2.py (3)
test_retrieve_response_builds_rag_and_mcp_tools(32-60)dummy_request(23-28)_raise(207-208)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: build-pr
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)
28-59: Good coverage for v2 tools, attachments, usage, and error paths. Tests clearly assert tool merging (token + per-server), no-tools bypass, attachment inclusion, both usage shapes, and API connection errors. LGTM.
Also applies to: 61-97, 100-133, 135-178, 181-233, 236-274, 277-311, 314-348, 351-388, 391-441, 444-470
tests/unit/app/endpoints/test_streaming_query.py (1)
58-59: Tests updated to the centralized cleanup and generator flow look solid.
- Patching cleanup_after_streaming avoids duplicating persistence logic.
- Core streaming assertions and OLS-compat tests remain intact.
Also applies to: 76-80, 215-359, 362-368, 391-399, 601-629
src/app/endpoints/streaming_query.py (1)
801-907: Base handler refactor with context + factories is clean and extensible. The abstraction reduces duplication between v1/v2 and keeps endpoint logic lean. LGTM.
Also applies to: 936-966
Force-pushed from 09bec86 to f3c324e
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/app/endpoints/streaming_query_v2.py (1)
395-404: Do not send tools=None to the Responses API. Line 400 always injects "tools": toolgroups, so when no tools are available we emit tools: null. The Responses API expects the field to be omitted unless we have an actual list; sending null trips validation and breaks streaming (this was already called out earlier). Please add the key only when toolgroups is truthy.
-    create_params: dict[str, Any] = {
-        "input": input_text,
-        "model": model_id,
-        "instructions": system_prompt,
-        "stream": True,
-        "store": True,
-        "tools": toolgroups,
-    }
+    create_params: dict[str, Any] = {
+        "input": input_text,
+        "model": model_id,
+        "instructions": system_prompt,
+        "stream": True,
+        "store": True,
+    }
     if query_request.conversation_id:
         create_params["previous_response_id"] = query_request.conversation_id
+    if toolgroups:
+        create_params["tools"] = toolgroups
🧹 Nitpick comments (1)
src/app/endpoints/streaming_query.py (1)
801-934: Consider keyword-only parameters to reduce pylint suppressions. The base handler properly implements dependency injection for Agent/Responses API flexibility. However, with 6 positional parameters, you're triggering pylint R0913. Consider making some parameters keyword-only by adding * before retrieve_response_func:
 async def streaming_query_endpoint_handler_base(
     request: Request,
     query_request: QueryRequest,
     auth: AuthTuple,
     mcp_headers: dict[str, dict[str, str]],
+    *,
     retrieve_response_func: Callable[..., Any],
     create_response_generator_func: Callable[..., Any],
 ) -> StreamingResponse:
This would eliminate the need for the too-many-positional-arguments disable while maintaining the same functionality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/app/endpoints/streaming_query.py (6 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/models/context.py (1 hunks)
- src/utils/endpoints.py (2 hunks)
- tests/unit/app/endpoints/test_query_v2.py (12 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (4 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/models/context.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/utils/endpoints.pysrc/app/endpoints/streaming_query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pysrc/utils/endpoints.pytests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_streaming_query.pysrc/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_streaming_query.py
🧠 Learnings (4)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/utils/endpoints.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/endpoints/**/*.py : In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Applied to files:
src/utils/endpoints.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client
Applied to files:
src/utils/endpoints.pytests/unit/app/endpoints/test_query_v2.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/**/*.py : Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Applied to files:
tests/unit/app/endpoints/test_query_v2.py
🔇 Additional comments (3)
src/app/endpoints/streaming_query.py (3)
8-8: LGTM: Import additions support the refactoring. The new imports (Callable, ResponseGeneratorContext, cleanup_after_streaming) are properly utilized in the factory-based streaming pattern and successfully address the previous pylint R0917 issue.
Also applies to: 48-48, 54-54
697-799: LGTM: Factory function successfully addresses previous R0917 issue. The refactoring bundles parameters into ResponseGeneratorContext and properly implements the streaming response generation. The inner generator correctly handles the full lifecycle: start event, chunk processing, token usage extraction, and cleanup operations.
936-966: LGTM: Clean delegation to base handler. The endpoint handler properly wraps the base handler with Agent API-specific factories (retrieve_response, create_agent_response_generator), maintaining backward compatibility while enabling the v1/v2 pattern.
    kwargs = mock_client.responses.create.call_args.kwargs
    assert kwargs["tools"] is None
    assert kwargs["stream"] is True
Adjust assertion: tools key should be absent when no tools.
Line 83 currently enforces kwargs["tools"] is None, so the test locks in passing tools=null to the Responses API. Once we drop that key (required to avoid API validation failures), this assertion should instead verify that the parameter is omitted entirely.
-    assert kwargs["tools"] is None
+    assert "tools" not in kwargs

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:
    kwargs = mock_client.responses.create.call_args.kwargs
    assert kwargs["tools"] is None
    assert kwargs["stream"] is True

Suggested:
    kwargs = mock_client.responses.create.call_args.kwargs
    assert "tools" not in kwargs
    assert kwargs["stream"] is True
🤖 Prompt for AI Agents
In tests/unit/app/endpoints/test_streaming_query_v2.py around lines 82 to 84,
the test currently asserts kwargs["tools"] is None which assumes the Responses
API receives a tools key; update the assertion to verify the key is absent by
asserting "tools" not in kwargs (and keep the existing assert kwargs["stream"]
is True), so the test expects the parameter to be omitted rather than present
with a null value.
Description
Add a v2 endpoint to interact with LlamaStack using the Responses API (streaming) instead of the Agent API.
The support is added as a v2 endpoint to ensure backward compatibility.
This is a follow-up to the work done in #753.
What is missing and to be done in follow-up PRs:
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Refactor
Tests