Base implementation of streaming Responses API #754

luis5tb · 2025-11-04T07:13:16Z

Description

Add v2 endpoint to interact with LlamaStack using Responses API (streaming) instead of Agent API
It is adding the support in a v2 endpoint to ensure backward compatibility

This is a follow up of the work done in #753

What is missing and to be done in follow up PRs:

Add support for shields once llamastack supports it for responses API
Add support for parsing and returning referenced document on the response

Type of change

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Added v2 streaming query endpoint with Responses API streaming support and SSE event flow.
- Centralized tool preparation and merged header handling for external tool integrations.
- New streaming lifecycle cleanup that persists transcripts, summaries, and conversation cache.
- Introduced a context object to simplify streaming generator orchestration.
Refactor
- Consolidated streaming orchestration into a reusable base flow for v1/v2 handlers and generator factories.
- Router updated to expose v2 streaming endpoints.
Tests
- Added and expanded tests for v2 streaming, tool assembly, error handling, and typing improvements.

coderabbitai · 2025-11-04T07:13:26Z

Walkthrough

Refactors streaming query endpoints to support Agent API (v1) and Responses API (v2) via a shared base handler, factory-based generators, centralized tool preparation, a ResponseGeneratorContext dataclass, post-streaming cleanup utility, and registers a v2 streaming router under the /v2 prefix.

Changes

Cohort / File(s)	Summary
Tool preparation & MCP header merging `src/app/endpoints/query_v2.py`	Adds `prepare_tools_for_responses_api()` to assemble RAG and MCP toolgroups; updates `get_mcp_tools()` to merge token-based Authorization with per-server `mcp_headers`, allowing per-server header overrides; `prepare_tools_for_responses_api` exported.
Streaming base + Agent factory `src/app/endpoints/streaming_query.py`	Introduces `streaming_query_endpoint_handler_base()` (injectable retrieval/generator), adds `create_agent_response_generator()` factory, adapts `retrieve_response` usage to factory pattern, and wires lifecycle/cleanup through a `ResponseGeneratorContext`.
Responses API (v2) streaming `src/app/endpoints/streaming_query_v2.py`	New module implementing v2 streaming: `streaming_query_endpoint_handler_v2()`, `create_responses_response_generator()`, and a v2 `retrieve_response()` that prepares tools, validates attachments, initiates Responses API streaming, and yields SSE events (start, token, tool_call, turn_complete, end).
Router registration `src/app/routers.py`	Registers new `streaming_query_v2` router under the `/v2` prefix alongside `query_v2`.
Shared streaming context `src/models/context.py`	Adds public dataclass `ResponseGeneratorContext` aggregating conversation_id, user_id, model/provider ids, client, query_request, started_at, metadata_map, and flags to simplify generator APIs.
Post-streaming cleanup utility `src/utils/endpoints.py`	Adds `cleanup_after_streaming()` to handle transcript storage, topic summary retrieval, referenced document creation, CacheEntry construction, and persistence; uses injected helpers for topic/trancript/persistence.
Query v2 tests & typing `tests/unit/app/endpoints/test_query_v2.py`	Adds fixture typing and explicit return annotations to tests; new `test_get_mcp_tools_with_mcp_headers()` to validate header merging behavior.
V2 streaming tests `tests/unit/app/endpoints/test_streaming_query_v2.py`	New test module covering v2 retrieve_response tool building, no-tools behavior, successful SSE event emission, and API connection error handling.
Streaming tests refactor `tests/unit/app/endpoints/test_streaming_query.py`	Replaces multiple per-call mocks with a single `cleanup_after_streaming` patch; simplifies helper signatures and removes obsolete imports.
Router tests update `tests/unit/app/test_routers.py`	Updates router count and asserts inclusion of `query_v2` and `streaming_query_v2` routers with `/v2` prefix.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Base as streaming_query_endpoint_handler_base
    participant Retrieval as retrieve_response_func
    participant Generator as Response Generator Factory
    participant StreamAPI as Response Stream (Agent/Responses)
    participant Cleanup as cleanup_after_streaming
    participant Persistence as Cache/DB

    Client->>Base: HTTP request (builds ResponseGeneratorContext)
    Base->>Retrieval: call retrieve_response_func(context)
    Retrieval->>StreamAPI: start streaming (async iterator)
    StreamAPI-->>Generator: stream chunks (init / tokens / tools / complete)
    Generator->>Base: yield SSE events (start, token, tool_call, turn_complete, end)
    Base->>Client: stream SSE events
    Base->>Cleanup: invoke cleanup_after_streaming(context, summary, rag_chunks)
    Cleanup->>Persistence: persist cache, transcripts, topic summary
    Persistence-->>Cleanup: ack

sequenceDiagram
    participant V1 as streaming_query_endpoint_handler (Agent API)
    participant V2 as streaming_query_endpoint_handler_v2 (Responses API)
    participant Base as streaming_query_endpoint_handler_base
    participant Context as ResponseGeneratorContext

    Note over V1,V2: V1 and V2 construct Context and delegate to shared Base

    V1->>Context: populate Agent-specific fields
    V1->>Base: provide Agent retrieve/generator funcs

    V2->>Context: populate Responses-specific fields
    V2->>Base: provide Responses retrieve/generator funcs

    Base-->>V1: StreamingResponse
    Base-->>V2: StreamingResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas needing extra attention:
- Parameter mapping into ResponseGeneratorContext across v1/v2 call sites.
- Correct merging/precedence of Authorization and per-server mcp_headers.
- cleanup_after_streaming branching (transcript/topic summary/persist flows).
- SSE event mapping in both generator implementations and token/tool boundary handling.
- Tests: ensure refactored tests correctly stub new shared cleanup and streaming factories.

Possibly related PRs

Base implementation of non-streaming Responses API #753 — Overlaps on query_v2.py tool preparation and MCP header handling.
[RHDHPAI-1143] Implement referenced_documents caching #643 — Related to CacheEntry/referenced documents and endpoint caching/persistence changes.
Streaming improvements #254 — Related refactor of streaming event generation and SSE handling.

Suggested reviewers

tisnik
manstis

Poem

🐇 I hopped through streams both old and new,
Factories stitched v1 and v2,
Context bundled, tools aligned,
Cleanup saved what we defined,
SSEs danced — a rabbit's queue! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Base implementation of streaming Responses API' clearly and concisely summarizes the main objective of the PR: adding a v2 streaming endpoint for the Responses API. It directly reflects the primary change across multiple files.
Docstring Coverage	✅ Passed	Docstring coverage is 92.31% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b52ea28 and e95bc2b.

📒 Files selected for processing (8)

src/app/endpoints/query.py (4 hunks)
src/app/endpoints/query_v2.py (1 hunks)
src/app/endpoints/streaming_query.py (5 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/app/routers.py (3 hunks)
tests/unit/app/endpoints/test_query_v2.py (1 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
tests/unit/app/test_routers.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (7)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

tests/unit/app/test_routers.py
src/app/routers.py
tests/unit/app/endpoints/test_streaming_query_v2.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
tests/unit/app/endpoints/test_query_v2.py
src/app/endpoints/query.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/test_routers.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/endpoints/test_query_v2.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/test_routers.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/endpoints/test_query_v2.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/routers.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/query.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/routers.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/query.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/routers.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/query.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/query.py

🧠 Learnings (1)

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

src/app/endpoints/query.py

🧬 Code graph analysis (7)

src/app/routers.py (1)

tests/unit/app/test_routers.py (1)

include_router (37-52)

tests/unit/app/endpoints/test_streaming_query_v2.py (5)

src/models/requests.py (1)

QueryRequest (73-225)

src/models/config.py (3)

config (140-146)

Action (329-375)

ModelContextProtocolServer (169-174)

src/app/endpoints/streaming_query.py (1)

retrieve_response (1024-1145)

src/app/endpoints/streaming_query_v2.py (2)

retrieve_response (409-491)

streaming_query_endpoint_handler_v2 (379-406)

src/configuration.py (1)

mcp_servers (101-105)

src/app/endpoints/streaming_query.py (7)

src/models/requests.py (1)

QueryRequest (73-225)

src/app/endpoints/streaming_query_v2.py (2)

response_generator (138-372)

retrieve_response (409-491)

src/utils/types.py (2)

TurnSummary (89-163)

append_tool_calls_from_llama (96-117)

src/utils/endpoints.py (4)

get_system_prompt (126-190)

create_rag_chunks_dict (383-396)

create_referenced_documents_with_metadata (563-577)

store_conversation_into_cache (231-251)

src/app/endpoints/query.py (4)

is_transcripts_enabled (98-104)

get_topic_summary (184-214)

persist_user_conversation_details (107-139)

retrieve_response (639-798)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/streaming_query_v2.py (9)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (3)

extract_token_usage_from_responses_api (339-430)

get_mcp_tools (456-472)

get_rag_tools (433-453)

src/app/endpoints/streaming_query.py (6)

format_stream_data (125-136)

stream_end_event (163-219)

stream_start_event (139-160)

streaming_query_endpoint_handler_base (860-989)

response_generator (733-855)

retrieve_response (1024-1145)

src/models/cache_entry.py (1)

CacheEntry (7-24)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/utils/endpoints.py (3)

create_referenced_documents_with_metadata (563-577)

get_system_prompt (126-190)

store_conversation_into_cache (231-251)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/app/endpoints/query_v2.py (10)

src/app/endpoints/query.py (4)

query_endpoint_handler_base (217-423)

validate_attachments_metadata (801-830)

get_topic_summary (184-214)

retrieve_response (639-798)

src/authentication/__init__.py (1)

get_auth_dependency (14-52)

src/authorization/middleware.py (1)

authorize (111-122)

src/configuration.py (2)

configuration (73-77)

mcp_servers (101-105)

src/models/requests.py (1)

QueryRequest (73-225)

src/models/responses.py (4)

ForbiddenResponse (1120-1142)

QueryResponse (194-305)

ReferencedDocument (179-191)

UnauthorizedResponse (1094-1117)

src/utils/endpoints.py (2)

get_system_prompt (126-190)

get_topic_summary_system_prompt (193-204)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

tests/unit/app/endpoints/test_query_v2.py (4)

src/models/requests.py (2)

QueryRequest (73-225)

Attachment (16-70)

src/models/config.py (2)

config (140-146)

ModelContextProtocolServer (169-174)

src/app/endpoints/query_v2.py (4)

get_rag_tools (433-453)

get_mcp_tools (456-472)

retrieve_response (144-315)

query_endpoint_handler_v2 (119-141)

src/configuration.py (2)

mcp_servers (101-105)

llama_stack_configuration (87-91)

src/app/endpoints/query.py (2)

src/app/endpoints/streaming_query.py (1)

retrieve_response (1024-1145)

src/app/endpoints/query_v2.py (2)

retrieve_response (144-315)

get_topic_summary (70-114)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: e2e_tests (azure)
GitHub Check: e2e_tests (ci)

🔇 Additional comments (1)

src/app/endpoints/streaming_query.py (1)
699-857: Specify the concrete return type instead of Any.

Per our Python guidelines we need full type annotations, but create_agent_response_generator(...) -> Any hides the callable shape and weakens static checks. Please annotate it precisely (e.g. Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]) so downstream call sites stay type-safe.
-def create_agent_response_generator(  # pylint: disable=too-many-arguments,too-many-locals
+def create_agent_response_generator(  # pylint: disable=too-many-arguments,too-many-locals
     conversation_id: str,
     user_id: str,
     model_id: str,
     provider_id: str,
     query_request: QueryRequest,
     metadata_map: dict[str, dict[str, Any]],
     client: AsyncLlamaStackClient,
     llama_stack_model_id: str,
     started_at: str,
     _skip_userid_check: bool,
-) -> Any:
+) -> Callable[
+    [AsyncIterator[AgentTurnResponseStreamChunk]],
+    AsyncIterator[str],
+]:

src/app/endpoints/query_v2.py

src/app/endpoints/streaming_query_v2.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

src/app/endpoints/streaming_query_v2.py (2)

175-189: Stop persisting the conversation inside the "response.created" handler.

Persisting here means the later call to persist_user_conversation_details() at lines 365-371 increments message_count a second time, and it also makes existing_conversation non-null in the topic-summary block (lines 329-336), so get_topic_summary() never runs for brand-new conversations.

Apply this diff to drop the premature persist:

             if event_type == "response.created":
                 try:
                     conv_id = getattr(chunk, "response").id
                 except Exception:  # pylint: disable=broad-except
                     conv_id = ""
                 yield stream_start_event(conv_id)
-                if conv_id:
-                    persist_user_conversation_details(
-                        user_id=user_id,
-                        conversation_id=conv_id,
-                        model=model_id,
-                        provider_id=provider_id,
-                        topic_summary=None,
-                    )
                 continue

454-462: Streamed requests must include attachment content.

Unlike the non-streaming path in query_v2.py (lines 197-206), this streaming retrieve skips the attachment text entirely, so any uploaded logs/configs never reach the LLM. That's a functional regression for users relying on attachments.

Apply this diff to mirror the non-streaming input assembly:

+    # Prepare input for Responses API
+    # Convert attachments to text and concatenate with query
+    input_text = query_request.query
+    if query_request.attachments:
+        for attachment in query_request.attachments:
+            input_text += (
+                f"\n\n[Attachment: {attachment.attachment_type}]\n"
+                f"{attachment.content}"
+            )
+
     response = await client.responses.create(
-        input=query_request.query,
+        input=input_text,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, toolgroups)),
         stream=True,
         store=True,
     )

🧹 Nitpick comments (4)

src/app/endpoints/streaming_query_v2.py (2)
33-33: Use absolute import path for internal authentication module.

The coding guidelines specify that internal modules in src/**/*.py should use absolute imports (e.g., from auth import get_auth_dependency), but this uses a relative-style import.

As per coding guidelines

Apply this diff to align with the import convention:
-from authentication import get_auth_dependency
+from auth import get_auth_dependency
53-53: Use __name__ for the logger instead of a hardcoded string.

Coding guidelines require logger = logging.getLogger(__name__) for module logging, which provides better traceability.

As per coding guidelines

Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/app/endpoints/query_v2.py (2)
17-17: Use absolute import path for internal authentication module.

The coding guidelines specify that internal modules in src/**/*.py should use absolute imports (e.g., from auth import get_auth_dependency).

As per coding guidelines

Apply this diff:
-from authentication import get_auth_dependency
+from auth import get_auth_dependency
38-38: Use __name__ for the logger instead of a hardcoded string.

Coding guidelines require logger = logging.getLogger(__name__) for better traceability.

As per coding guidelines

Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e95bc2b and 9e02999.

📒 Files selected for processing (2)

src/app/endpoints/query_v2.py (1 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

🧬 Code graph analysis (2)

src/app/endpoints/streaming_query_v2.py (9)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (317-408)

get_topic_summary (70-114)

prepare_tools_for_responses_api (453-502)

retrieve_response (144-293)

src/app/endpoints/streaming_query.py (6)

format_stream_data (125-136)

stream_end_event (163-219)

stream_start_event (139-160)

streaming_query_endpoint_handler_base (860-989)

response_generator (733-855)

retrieve_response (1024-1145)

src/models/requests.py (1)

QueryRequest (73-225)

src/utils/endpoints.py (3)

create_referenced_documents_with_metadata (563-577)

get_system_prompt (126-190)

store_conversation_into_cache (231-251)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/app/endpoints/query_v2.py (7)

src/app/endpoints/query.py (4)

query_endpoint_handler_base (217-423)

validate_attachments_metadata (801-830)

get_topic_summary (184-214)

retrieve_response (639-798)

src/models/responses.py (2)

QueryResponse (194-305)

ReferencedDocument (179-191)

src/utils/endpoints.py (2)

get_system_prompt (126-190)

get_topic_summary_system_prompt (193-204)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

src/app/endpoints/streaming_query_v2.py (1)

retrieve_response (408-468)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: e2e_tests (ci)
GitHub Check: e2e_tests (azure)

🔇 Additional comments (6)

src/app/endpoints/streaming_query_v2.py (1)

376-405: LGTM!

The endpoint handler correctly delegates to the base handler with the v2-specific retrieve and generator functions. The structure and type annotations are appropriate.

src/app/endpoints/query_v2.py (5)

70-114: LGTM!

The topic summary generation correctly uses the Responses API with proper error handling and text extraction logic.

117-141: LGTM!

The endpoint handler correctly delegates to the base handler with appropriate v2-specific functions.

144-293: LGTM!

The retrieve_response function correctly assembles input with attachments, processes the response output, and extracts tool calls. The implementation is solid.

296-314: Placeholder is intentional per PR objectives.

This function currently returns an empty list with a TODO, which aligns with the PR description stating that "support for parsing and returning referenced documents in responses" will be addressed in follow-up PRs.

317-408: LGTM!

The token usage extraction is comprehensive with proper error handling for various response formats and graceful degradation when usage data is unavailable.

src/app/endpoints/query_v2.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/unit/app/endpoints/test_streaming_query_v2.py (1)
189-189: Use the shared MOCK_AUTH constant for test authentication.

The coding guidelines specify using MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") for auth tuples in tests rather than inline values.

As per coding guidelines.

Apply this pattern:
+# At module level
+MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")

 # In test_streaming_query_endpoint_handler_v2_success_yields_events:
-        auth=("user123", "", True, "token-abc"),  # skip_userid_check=True
+        auth=("user123", "", True, "token-abc"),  # Can keep custom values if testing specific scenarios

 # In test_streaming_query_endpoint_handler_v2_api_connection_error:
-        auth=("user123", "", False, "tok"),
+        auth=MOCK_AUTH,
Also applies to: 244-244

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e02999 and dd1cee0.

📒 Files selected for processing (3)

src/app/endpoints/query_v2.py (1 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (7)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/app/endpoints/streaming_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
src/app/endpoints/query_v2.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/endpoints/test_streaming_query_v2.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/endpoints/test_streaming_query_v2.py

🧬 Code graph analysis (3)

src/app/endpoints/streaming_query_v2.py (10)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (317-408)

get_topic_summary (70-114)

prepare_tools_for_responses_api (453-511)

retrieve_response (144-293)

src/app/endpoints/streaming_query.py (6)

format_stream_data (125-136)

stream_end_event (163-219)

stream_start_event (139-160)

streaming_query_endpoint_handler_base (860-989)

response_generator (733-855)

retrieve_response (1024-1145)

src/models/cache_entry.py (1)

CacheEntry (7-24)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/utils/endpoints.py (3)

create_referenced_documents_with_metadata (563-577)

get_system_prompt (126-190)

store_conversation_into_cache (231-251)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

tests/unit/app/endpoints/test_streaming_query_v2.py (4)

src/models/requests.py (1)

QueryRequest (73-225)

src/models/config.py (3)

config (140-146)

Action (329-375)

ModelContextProtocolServer (169-174)

src/app/endpoints/streaming_query_v2.py (2)

retrieve_response (400-470)

streaming_query_endpoint_handler_v2 (370-397)

src/configuration.py (1)

mcp_servers (101-105)

src/app/endpoints/query_v2.py (6)

src/app/endpoints/query.py (4)

query_endpoint_handler_base (217-423)

validate_attachments_metadata (801-830)

get_topic_summary (184-214)

retrieve_response (639-798)

src/utils/endpoints.py (2)

get_system_prompt (126-190)

get_topic_summary_system_prompt (193-204)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

src/app/endpoints/streaming_query_v2.py (1)

retrieve_response (400-470)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: e2e_tests (azure)
GitHub Check: e2e_tests (ci)

🔇 Additional comments (10)

src/app/endpoints/streaming_query_v2.py (3)

103-365: Well-structured streaming response generator implementation.

The create_responses_response_generator factory pattern cleanly separates concerns and correctly handles:

In-band conversation ID extraction from response.created events

Token and tool call streaming with proper SSE formatting

Single persistence call after topic summary logic (avoiding double-increment)

Attachment content inclusion in input assembly

Token usage extraction and metrics updates

The previous review concerns about premature persistence and missing attachment content have been properly addressed.

368-397: Clean delegation to base handler.

The wrapper pattern provides good separation between v2-specific Responses API logic and shared streaming infrastructure.

400-470: Proper tool preparation and attachment handling for streaming.

The retrieve_response function correctly:

Validates attachments before processing

Prepares RAG and MCP tools with header merging via prepare_tools_for_responses_api

Concatenates attachment content into the input text

Returns an empty conversation ID for in-band extraction

src/app/endpoints/query_v2.py (7)

70-114: Topic summary generation adapted correctly for Responses API.

The function properly uses client.responses.create() with store=False to avoid polluting conversation history, and robustly extracts text from various content structures.

117-141: Clean delegation to shared base handler.

The v2 query endpoint correctly injects Responses API-specific functions into the base handler pattern.

144-293: Solid non-streaming retrieve implementation.

The function correctly:

Validates attachments upfront

Prepares tools with MCP header merging

Assembles input text with attachment content

Parses response output and tool calls from OpenAI format

Extracts token usage and referenced documents

296-314: Placeholder appropriately documents missing functionality.

The TODO clearly indicates that referenced document parsing from Responses API response structures (file_search results, citations) is deferred to follow-up work, as mentioned in the PR objectives.

317-408: Robust token usage extraction with defensive handling.

The implementation correctly:

Handles both dict and object response.usage formats

Gracefully degrades when usage data is absent (expected when llama stack uses chat_completions internally)

Updates Prometheus metrics with proper error handling

Increments call counter even when token counts are unavailable

411-450: Clean tool definition helpers.

The get_rag_tools and get_mcp_tools functions provide clear, focused transformations from configuration to Responses API tool format.

453-511: Tool preparation correctly merges per-request MCP headers.

The previous review concern about dropping mcp_headers has been properly addressed. Lines 494-500 merge the caller's per-request headers into each MCP tool definition, ensuring short-lived tokens and dynamic auth reach the MCP servers.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/app/endpoints/streaming_query.py (1)

860-990: Fix docstring mood and argument count to unblock lint.

The new base handler’s docstring starts with “Base handler…”, violating pydocstyle D401, and the signature has six positional parameters, exceeding pylint’s five-argument limit. Please rephrase the docstring to an imperative sentence (e.g., “Handle streaming query requests…”) and collapse the parameters (e.g., wrap injected callbacks/mcp headers into a struct) so pylint R0917 passes.

🧹 Nitpick comments (3)

tests/unit/app/endpoints/test_query_v2.py (1)

385-434: Use the shared MOCK_AUTH constant.

Per the repo’s test guidelines, test auth tuples should reuse MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") instead of crafting custom values. Please switch the auth= argument to that constant (and update related expectations) so the suite follows the standard.
src/app/endpoints/streaming_query_v2.py (2)
53-53: Logger name should use __name__ for consistency.

The logger is initialized with a hardcoded string "app.endpoints.handlers" instead of __name__. Using __name__ ensures the logger reflects the actual module path and improves maintainability.

Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
103-366: Consider refactoring to reduce complexity.

The function has 10 positional parameters (limit: 5), and the inner response_generator has 17 branches (limit: 12) and 75 statements (limit: 50). While the logic is correct, this complexity makes the code harder to maintain and test.

Consider:

Grouping related parameters into a configuration object or dataclass

Extracting event-handling logic into separate helper functions (e.g., _handle_text_delta, _handle_tool_call, _handle_completion)

Moving post-processing logic (transcript, cache, persistence) into a separate function

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd1cee0 and 68dc335.

📒 Files selected for processing (8)

src/app/endpoints/query.py (4 hunks)
src/app/endpoints/query_v2.py (1 hunks)
src/app/endpoints/streaming_query.py (5 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/app/routers.py (3 hunks)
tests/unit/app/endpoints/test_query_v2.py (1 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
tests/unit/app/test_routers.py (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/test_routers.py

🧰 Additional context used

📓 Path-based instructions (7)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/routers.py
src/app/endpoints/query.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/routers.py
src/app/endpoints/query.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
tests/unit/app/endpoints/test_query_v2.py
src/app/routers.py
src/app/endpoints/query.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/routers.py
src/app/endpoints/query.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/endpoints/test_query_v2.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/endpoints/test_query_v2.py

🧠 Learnings (1)

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

src/app/endpoints/query.py

🧬 Code graph analysis (6)

src/app/endpoints/query_v2.py (6)

src/app/endpoints/query.py (4)

query_endpoint_handler_base (217-423)

validate_attachments_metadata (801-830)

get_topic_summary (184-214)

retrieve_response (639-798)

src/utils/endpoints.py (2)

get_system_prompt (126-190)

get_topic_summary_system_prompt (193-204)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

src/app/endpoints/streaming_query_v2.py (1)

retrieve_response (400-470)

src/app/endpoints/streaming_query.py (8)

src/models/requests.py (1)

QueryRequest (73-225)

src/app/endpoints/streaming_query_v2.py (2)

response_generator (137-363)

retrieve_response (400-470)

src/utils/types.py (2)

TurnSummary (89-163)

append_tool_calls_from_llama (96-117)

src/utils/endpoints.py (4)

get_system_prompt (126-190)

create_rag_chunks_dict (383-396)

create_referenced_documents_with_metadata (563-577)

store_conversation_into_cache (231-251)

src/utils/token_counter.py (2)

extract_token_usage_from_turn (44-94)

TokenCounter (18-41)

src/app/endpoints/query.py (4)

is_transcripts_enabled (98-104)

get_topic_summary (184-214)

persist_user_conversation_details (107-139)

retrieve_response (639-798)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/app/endpoints/query_v2.py (2)

get_topic_summary (235-274)

retrieve_response (304-427)

src/app/endpoints/streaming_query_v2.py (16)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (451-541)

get_topic_summary (235-274)

prepare_tools_for_responses_api (617-668)

retrieve_response (304-427)

src/app/endpoints/streaming_query.py (6)

format_stream_data (125-136)

stream_end_event (163-219)

stream_start_event (139-160)

streaming_query_endpoint_handler_base (860-989)

response_generator (733-855)

retrieve_response (1024-1145)

src/authentication/__init__.py (1)

get_auth_dependency (14-52)

src/authorization/middleware.py (1)

authorize (111-122)

src/models/cache_entry.py (1)

CacheEntry (7-24)

src/models/config.py (2)

config (140-146)

Action (329-375)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/models/requests.py (1)

QueryRequest (73-225)

src/models/responses.py (2)

ForbiddenResponse (1120-1142)

UnauthorizedResponse (1094-1117)

src/utils/endpoints.py (3)

create_referenced_documents_with_metadata (563-577)

get_system_prompt (126-190)

store_conversation_into_cache (231-251)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

tests/unit/app/endpoints/test_query_v2.py (4)

src/models/requests.py (2)

QueryRequest (73-225)

Attachment (16-70)

src/models/config.py (2)

config (140-146)

ModelContextProtocolServer (169-174)

src/app/endpoints/query_v2.py (4)

get_rag_tools (552-572)

get_mcp_tools (575-614)

retrieve_response (304-427)

query_endpoint_handler_v2 (279-301)

src/configuration.py (2)

mcp_servers (101-105)

llama_stack_configuration (87-91)

src/app/routers.py (1)

tests/unit/app/test_routers.py (1)

include_router (37-52)

src/app/endpoints/query.py (2)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/app/endpoints/query_v2.py (2)

retrieve_response (304-427)

get_topic_summary (235-274)

🪛 GitHub Actions: Pydocstyle

src/app/endpoints/streaming_query.py

[error] 868-868: pydocstyle D401: First line should be in imperative mood; try rephrasing (found 'Base') in public function 'streaming_query_endpoint_handler_base'.

🪛 GitHub Actions: Pyright

src/app/endpoints/streaming_query_v2.py

[error] 456-456: pyright: No overloads for 'create' match the provided arguments (reportCallIssue) in streaming_query_v2.py:456:22. Step: 'uv run pyright src'.

[error] 460-460: pyright: Argument of type 'str | None' cannot be assigned to parameter 'previous_response_id' of type 'str | NotGiven' in function 'create' in streaming_query_v2.py:460:30. Step: 'uv run pyright src'.

🪛 GitHub Actions: Python linter

src/app/endpoints/streaming_query.py

[error] 699-699: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)

[error] 860-860: pylint: R0917 Too many positional arguments (6/5) (too-many-positional-arguments)

src/app/endpoints/streaming_query_v2.py

[error] 103-103: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)

[error] 137-137: pylint: R0912 Too many branches (17/12) (too-many-branches)

[error] 137-137: pylint: R0915 Too many statements (75/50) (too-many-statements)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: build-pr
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: e2e_tests (azure)
GitHub Check: e2e_tests (ci)

🔇 Additional comments (1)

src/app/endpoints/streaming_query_v2.py (1)

368-397: LGTM!

The endpoint handler correctly follows the FastAPI pattern with proper authorization, dependency injection, and clean delegation to the base handler. The wrapper pattern enables code reuse while keeping v2-specific concerns isolated.

src/app/endpoints/streaming_query_v2.py

src/app/endpoints/streaming_query.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/app/endpoints/streaming_query.py (1)
700-702: Specify the return type more precisely.

The return type Any is overly broad. Consider using a more specific type annotation for better type safety:
from collections.abc import Callable
from typing import AsyncIterator

def create_agent_response_generator(
    context: ResponseGeneratorContext,
) -> Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]:
This clearly documents that the factory returns a callable taking an async iterator of chunks and yielding SSE strings.
src/app/endpoints/streaming_query_v2.py (1)
104-118: Specify the return type more precisely.

Like the v1 factory, the return type Any should be more specific for better type safety:
def create_responses_response_generator(
    context: ResponseGeneratorContext,
) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68dc335 and 5c289ed.

📒 Files selected for processing (3)

src/app/endpoints/streaming_query.py (6 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/models/context.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (7)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py
src/models/context.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py
src/models/context.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py

src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

src/models/context.py

src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

src/models/context.py

🧬 Code graph analysis (3)

src/app/endpoints/streaming_query_v2.py (12)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (451-541)

get_topic_summary (235-274)

prepare_tools_for_responses_api (617-668)

retrieve_response (304-427)

src/app/endpoints/streaming_query.py (6)

format_stream_data (126-137)

stream_end_event (164-220)

stream_start_event (140-161)

streaming_query_endpoint_handler_base (851-983)

response_generator (716-846)

retrieve_response (1018-1139)

src/models/cache_entry.py (1)

CacheEntry (7-24)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/models/requests.py (1)

QueryRequest (73-225)

src/utils/endpoints.py (3)

create_referenced_documents_with_metadata (563-577)

get_system_prompt (126-190)

store_conversation_into_cache (231-251)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

src/app/endpoints/streaming_query.py (6)

src/models/context.py (1)

ResponseGeneratorContext (12-48)

src/app/endpoints/streaming_query_v2.py (1)

response_generator (120-348)

src/utils/types.py (2)

TurnSummary (89-163)

append_tool_calls_from_llama (96-117)

src/utils/endpoints.py (4)

get_system_prompt (126-190)

create_rag_chunks_dict (383-396)

create_referenced_documents_with_metadata (563-577)

store_conversation_into_cache (231-251)

src/utils/token_counter.py (2)

extract_token_usage_from_turn (44-94)

TokenCounter (18-41)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

get_topic_summary (184-214)

persist_user_conversation_details (107-139)

src/models/context.py (1)

src/models/requests.py (1)

QueryRequest (73-225)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: shellcheck
GitHub Check: build-pr
GitHub Check: e2e_tests (azure)
GitHub Check: e2e_tests (ci)

🔇 Additional comments (7)

src/models/context.py (1)

1-48: LGTM! Well-structured context dataclass.

The ResponseGeneratorContext dataclass effectively reduces function parameter counts from 9+ to 1, improving maintainability. The implementation follows all coding guidelines with proper module/class docstrings, complete type annotations, and logical field grouping.

src/app/endpoints/streaming_query.py (2)

851-983: LGTM! Clean dependency injection pattern.

The base handler effectively extracts common streaming logic while accepting API-specific functions via retrieve_response_func and create_response_generator_func. The ResponseGeneratorContext is properly constructed and passed to the generator factory, enabling v1/v2 code reuse.

986-1015: LGTM! Proper v1 endpoint wrapper.

The wrapper cleanly delegates to the base handler while providing Agent API-specific implementations (retrieve_response and create_agent_response_generator). This maintains backward compatibility for the v1 /streaming_query endpoint.

src/app/endpoints/streaming_query_v2.py (4)

1-56: LGTM! Proper module setup.

Module follows coding guidelines with descriptive docstring, proper logger initialization, and absolute imports.

273-349: LGTM! Post-streaming logic correctly implemented.

The finalization sequence properly:

Extracts token usage from the Responses API response object

Yields end event with metadata

Stores transcripts conditionally

Generates topic summaries only for new conversations

Persists cache and conversation details

The TODO comments at lines 296-297 about RAG chunks align with the PR description noting that referenced documents support will come in a follow-up.

455-457: LGTM! Proper handling of delayed conversation ID.

Returning an empty string for conversation_id is correct for the Responses API streaming flow, where the ID arrives in the first response.created chunk (lines 158-163). The comment clearly documents this behavior.

441-452: ****

The concern about passing None for tools is unfounded. Test assertions in the codebase (explicitly verify that tools=None is passed to the API and is accepted). The current code—unconditionally including "tools": toolgroups—is correct and tested. No conditional logic is needed.

Likely an incorrect or invalid review comment.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/app/endpoints/query_v2.py (1)
617-668: Consider adding error handling for vector store retrieval.

The function is well-structured and follows coding guidelines. However, the call to client.vector_stores.list() at line 647 could potentially fail. Consider wrapping it in a try-except block to handle transient errors gracefully, falling back to no RAG tools if the vector store service is unavailable.

Example:
     toolgroups = []
     # Get vector stores for RAG tools
-    vector_store_ids = [
-        vector_store.id for vector_store in (await client.vector_stores.list()).data
-    ]
+    try:
+        vector_store_ids = [
+            vector_store.id for vector_store in (await client.vector_stores.list()).data
+        ]
+    except Exception as e:
+        logger.warning("Failed to retrieve vector stores: %s", e)
+        vector_store_ids = []

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c289ed and dc74af7.

📒 Files selected for processing (8)

src/app/endpoints/query_v2.py (3 hunks)
src/app/endpoints/streaming_query.py (6 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/app/routers.py (2 hunks)
src/models/context.py (1 hunks)
tests/unit/app/endpoints/test_query_v2.py (1 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
tests/unit/app/test_routers.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/app/routers.py

🧰 Additional context used

📓 Path-based instructions (9)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/models/context.py
src/app/endpoints/streaming_query.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/app/endpoints/streaming_query_v2.py
tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
src/app/endpoints/query_v2.py
src/models/context.py
src/app/endpoints/streaming_query.py
tests/unit/app/test_routers.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query_v2.py
src/app/endpoints/streaming_query.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/test_routers.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/test_routers.py

src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

src/models/context.py

src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

src/models/context.py

🧬 Code graph analysis (6)

src/app/endpoints/streaming_query_v2.py (9)

src/app/database.py (1)

get_session (34-40)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (451-541)

get_topic_summary (235-274)

prepare_tools_for_responses_api (617-668)

retrieve_response (304-427)

src/app/endpoints/streaming_query.py (6)

format_stream_data (126-137)

stream_end_event (164-220)

stream_start_event (140-161)

streaming_query_endpoint_handler_base (851-983)

response_generator (716-846)

retrieve_response (1018-1139)

src/models/cache_entry.py (1)

CacheEntry (7-24)

src/models/context.py (1)

ResponseGeneratorContext (12-48)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/utils/endpoints.py (2)

create_referenced_documents_with_metadata (563-577)

get_system_prompt (126-190)

src/utils/transcripts.py (1)

store_transcript (40-99)

tests/unit/app/endpoints/test_query_v2.py (2)

src/models/config.py (1)

ModelContextProtocolServer (169-174)

src/app/endpoints/query_v2.py (1)

get_mcp_tools (575-614)

tests/unit/app/endpoints/test_streaming_query_v2.py (3)

src/models/requests.py (1)

QueryRequest (73-225)

src/models/config.py (3)

config (140-146)

Action (329-375)

ModelContextProtocolServer (169-174)

src/configuration.py (1)

mcp_servers (101-105)

src/app/endpoints/query_v2.py (2)

src/configuration.py (3)

configuration (73-77)

AppConfig (39-181)

mcp_servers (101-105)

src/models/requests.py (1)

QueryRequest (73-225)

src/models/context.py (1)

src/models/requests.py (1)

QueryRequest (73-225)

src/app/endpoints/streaming_query.py (2)

src/models/context.py (1)

ResponseGeneratorContext (12-48)

src/app/endpoints/streaming_query_v2.py (2)

response_generator (120-348)

retrieve_response (385-457)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: build-pr
GitHub Check: e2e_tests (ci)
GitHub Check: e2e_tests (azure)
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🔇 Additional comments (18)

tests/unit/app/test_routers.py (1)

23-23: LGTM! Router registration updates are correct.

The test updates correctly account for the new v2 streaming router: import added, router count incremented to 17, existence assertion added, and prefix verification included.

Also applies to: 69-69, 79-79, 95-95, 105-105

src/app/endpoints/query_v2.py (3)

21-21: LGTM! Import required for new function.

The AppConfig import is needed for the prepare_tools_for_responses_api function signature.

353-355: LGTM! Good refactoring for code reuse.

Extracting tool preparation into prepare_tools_for_responses_api centralizes the logic for both query and streaming v2 endpoints, and correctly passes mcp_headers to enable per-server authentication.

601-614: LGTM! Header merging logic is correct and flexible.

The header construction properly merges token-based authentication with per-server custom headers, allowing server-specific authorization overrides when needed. This addresses the previous review concern about dropped MCP headers.

tests/unit/app/endpoints/test_query_v2.py (1)

60-96: LGTM! Comprehensive test coverage for header merging.

The test thoroughly validates the three key scenarios: per-server headers without token, merged token + per-server headers, and Authorization header override. This ensures the header merging logic in get_mcp_tools works correctly.

src/models/context.py (1)

1-48: LGTM! Well-designed context object.

The ResponseGeneratorContext dataclass is well-structured with complete docstrings and type hints following coding guidelines. It effectively addresses the previous concern about too many positional arguments by bundling related parameters logically. The grouping of fields (conversation/user context, model/provider info, request/timing, dependencies/state) makes the code more maintainable.

tests/unit/app/endpoints/test_streaming_query_v2.py (4)

20-26: LGTM! Well-structured test fixture.

The dummy_request fixture properly sets up a mock FastAPI Request with authorized actions, which is necessary for RBAC checks in the endpoint handler.

29-56: LGTM! Tests validate core tool preparation logic.

Both tests properly verify that retrieve_response builds the correct tools for the Responses API: test_retrieve_response_builds_rag_and_mcp_tools confirms RAG and MCP tools are assembled, and test_retrieve_response_no_tools_passes_none ensures the no_tools flag is respected.

Also applies to: 59-80

83-223: LGTM! Comprehensive integration test for streaming flow.

This test validates the complete streaming lifecycle including SSE event generation, conversation persistence, and metric updates. The fake stream covers all major event types (response.created, output_text.delta, function_call, etc.), and assertions verify both event content and ordering.

226-250: LGTM! Error handling test is appropriate.

The test correctly verifies that API connection errors are caught, logged, return HTTP 500, and increment the failure metric.

src/app/endpoints/streaming_query.py (4)

8-8: LGTM! Necessary imports for refactoring.

The Callable import supports the factory function parameters in the base handler, and ResponseGeneratorContext enables the parameter reduction pattern.

Also applies to: 50-50

700-848: LGTM! Factory function properly encapsulates Agent API streaming logic.

The create_agent_response_generator function effectively bundles the streaming response generation logic, using the ResponseGeneratorContext to access all necessary parameters. The inner generator handles SSE formatting, transcript storage, topic summarization, caching, and persistence correctly.

851-983: LGTM! Base handler provides good separation of concerns.

The streaming_query_endpoint_handler_base extracts all common streaming logic (RBAC, conversation ownership validation, model selection, error handling) while accepting injectable functions for API-specific behavior. This enables code reuse between v1 (Agent API) and v2 (Responses API) endpoints.

986-1015: LGTM! Wrapper delegates to base with Agent API specifics.

The streaming_query_endpoint_handler function is a thin wrapper that supplies Agent API-specific implementations (retrieve_response and create_agent_response_generator) to the base handler, maintaining backward compatibility for the v1 endpoint.

src/app/endpoints/streaming_query_v2.py (4)

1-56: LGTM! Module setup follows coding guidelines.

The module has a descriptive docstring, proper imports, and appropriate configuration. The router is correctly tagged with "streaming_query_v2" and response schemas are well-defined.

104-350: LGTM! Response generator correctly handles Responses API events.

The create_responses_response_generator factory properly processes all Responses API event types:

response.created → start event with conversation ID

response.output_text.delta → token events

response.function_call_arguments.delta → tool_call events

response.completed → turn_complete event

Post-streaming: token usage extraction, transcript storage, topic summary, caching, and persistence

The event handling logic is comprehensive and correctly accumulates state across chunks.

353-382: LGTM! Endpoint wrapper follows established pattern.

The streaming_query_endpoint_handler_v2 function correctly delegates to the base handler with v2-specific implementations (retrieve_response and create_responses_response_generator), maintaining consistency with the v1 endpoint structure.

385-457: LGTM! Retrieve function properly prepares Responses API call.

The retrieve_response function correctly:

Validates attachments

Prepares tools via prepare_tools_for_responses_api (with mcp_headers)

Concatenates attachment content into input text

Conditionally builds create parameters to avoid type errors

Initiates streaming response with stream=True

The conversation ID is returned as empty string since it arrives in the first chunk, which is handled correctly by the generator.

tisnik

Seems legit, but would be nice to have @eranco74 ack too. TYVM

src/app/endpoints/streaming_query_v2.py

eranco74 · 2025-11-06T08:52:16Z

src/app/endpoints/streaming_query_v2.py

+        emitted_turn_complete = False
+
+        # Handle conversation id and start event in-band on response.created
+        conv_id = context.conversation_id


This is always an empty string, right?

right. but I prefer to leave it there as it depends on the retrieve_response implementation (in case it changes in the future). What do you think?

eranco74 · 2025-11-06T08:53:06Z

src/app/endpoints/streaming_query_v2.py

+        with get_session() as session:
+            existing_conversation = (
+                session.query(UserConversation).filter_by(id=conv_id).first()
+            )


This looks like synchronous code in an async function

This is the same in streaming_query.py

eranco74 · 2025-11-06T08:58:34Z

src/app/endpoints/streaming_query_v2.py

+        )
+
+        yield stream_end_event(context.metadata_map, summary, token_usage, media_type)
+


I think you can/should split this logic into two parts the streaming part and the rest (store_transctipt, summary, cache update)
All the database and cache work can happen after the stream is finished.

Generator function: Its only job is to yield SSE events and return the final summary and conv_id.
Cleanup function: A new async function that takes the summary and context and does all the database/cache work.
Also seems that both streaming_query and streaming_query_v2 can use the same Cleanup function

I half addressed that. Made the cleanup function to avoid code duplication for streaming_query and streaming_query_v2, but did not split it out of the generator as that seems a bigger change and was already there before this PR in streaming_query

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1b6f9c and 9cc8028.

📒 Files selected for processing (8)

src/app/endpoints/streaming_query.py (6 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/constants.py (1 hunks)
src/models/context.py (1 hunks)
src/utils/endpoints.py (2 hunks)
tests/unit/app/endpoints/test_query_v2.py (12 hunks)
tests/unit/app/endpoints/test_streaming_query.py (4 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/models/context.py
tests/unit/app/endpoints/test_streaming_query_v2.py

🧰 Additional context used

📓 Path-based instructions (8)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

tests/unit/app/endpoints/test_query_v2.py
src/utils/endpoints.py
tests/unit/app/endpoints/test_streaming_query.py
src/app/endpoints/streaming_query.py
src/constants.py
src/app/endpoints/streaming_query_v2.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/utils/endpoints.py
src/app/endpoints/streaming_query.py
src/constants.py
src/app/endpoints/streaming_query_v2.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

src/constants.py

🧠 Learnings (4)

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/**/*.py : Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Applied to files:

tests/unit/app/endpoints/test_query_v2.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client

Applied to files:

tests/unit/app/endpoints/test_query_v2.py
src/utils/endpoints.py
src/constants.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

src/utils/endpoints.py
src/constants.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/endpoints/**/*.py : In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Applied to files:

src/utils/endpoints.py

🧬 Code graph analysis (5)

tests/unit/app/endpoints/test_query_v2.py (3)

src/models/config.py (1)

ModelContextProtocolServer (169-174)

src/app/endpoints/query_v2.py (1)

get_mcp_tools (575-614)

tests/unit/app/endpoints/test_streaming_query_v2.py (2)

dummy_request (23-28)

_raise (207-208)

src/utils/endpoints.py (6)

src/models/requests.py (1)

QueryRequest (73-225)

src/utils/types.py (1)

TurnSummary (89-163)

src/configuration.py (1)

AppConfig (39-181)

src/app/database.py (1)

get_session (34-40)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/models/cache_entry.py (1)

CacheEntry (7-24)

tests/unit/app/endpoints/test_streaming_query.py (2)

src/utils/types.py (1)

TurnSummary (89-163)

tests/unit/app/endpoints/test_query.py (1)

mock_metrics (71-80)

src/app/endpoints/streaming_query.py (4)

src/models/context.py (1)

ResponseGeneratorContext (12-48)

src/utils/endpoints.py (2)

cleanup_after_streaming (598-711)

get_system_prompt (127-191)

src/utils/types.py (2)

TurnSummary (89-163)

append_tool_calls_from_llama (96-117)

src/utils/token_counter.py (2)

extract_token_usage_from_turn (44-94)

TokenCounter (18-41)

src/app/endpoints/streaming_query_v2.py (12)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (451-541)

get_topic_summary (235-274)

prepare_tools_for_responses_api (617-668)

retrieve_response (304-427)

src/app/endpoints/streaming_query.py (6)

format_stream_data (123-134)

stream_end_event (161-217)

stream_start_event (137-158)

streaming_query_endpoint_handler_base (801-933)

response_generator (713-796)

retrieve_response (968-1089)

src/authentication/__init__.py (1)

get_auth_dependency (14-52)

src/authorization/middleware.py (1)

authorize (111-122)

src/models/config.py (2)

config (140-146)

Action (329-375)

src/models/context.py (1)

ResponseGeneratorContext (12-48)

src/models/requests.py (1)

QueryRequest (73-225)

src/utils/endpoints.py (2)

cleanup_after_streaming (598-711)

get_system_prompt (127-191)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: e2e_tests (ci)
GitHub Check: e2e_tests (azure)
GitHub Check: build-pr

🔇 Additional comments (1)

src/constants.py (1)

5-5: Verify 0.3.0 breaking changes are addressed in the codebase.

The web search reveals Llama Stack 0.3.0 contains breaking API changes: Tools API now uses input_schema/output_schema instead of parameters, Chat/agent API shapes were cleaned up, and routes were renamed. These changes require codebase verification beyond the version constant update.

Additionally, hardcoded version "0.2.22" appears in two locations—tests/integration/endpoints/test_info_integration.py (lines 30, 69) and src/models/responses.py (lines 321, 338)—and should be reviewed for consistency with the new maximum version. Confirm the v2 streaming feature integrates properly with 0.3.0's API changes and that test coverage validates compatibility.

src/app/endpoints/streaming_query_v2.py

coderabbitai · 2025-11-06T13:05:02Z

src/app/endpoints/streaming_query_v2.py

+    create_params: dict[str, Any] = {
+        "input": input_text,
+        "model": model_id,
+        "instructions": system_prompt,
+        "stream": True,
+        "store": True,
+        "tools": toolgroups,
+    }
+    if query_request.conversation_id:
+        create_params["previous_response_id"] = query_request.conversation_id
+


⚠️ Potential issue | 🟠 Major

Only send tools when we actually have any

We always add tools=None to the payload. The Responses API expects the field to be omitted unless we supply a real list; sending null can trigger validation errors. Gate the assignment so we only include the key when toolgroups is truthy.

- create_params: dict[str, Any] = { + create_params: dict[str, Any] = { "input": input_text, "model": model_id, "instructions": system_prompt, "stream": True, "store": True, - "tools": toolgroups, } if query_request.conversation_id: create_params["previous_response_id"] = query_request.conversation_id + if toolgroups: + create_params["tools"] = toolgroups

🤖 Prompt for AI Agents

In src/app/endpoints/streaming_query_v2.py around lines 391 to 401, the payload always includes "tools": toolgroups (which becomes null when empty) causing validation errors; change the construction so the "tools" key is only added to create_params when toolgroups is truthy (e.g., if toolgroups: create_params["tools"] = toolgroups) so the field is omitted when there are no tools.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/utils/endpoints.py (2)

400-416: Fix AnyUrl validation to avoid runtime errors (pydantic v2).

Directly calling AnyUrl(...) is not a supported validation path in pydantic v2 and can raise TypeError; this code path is now exercised by cleanup_after_streaming. Use a TypeAdapter to validate, and broaden exception handling.

Apply:

@@
-from pydantic import AnyUrl, ValidationError
+from pydantic import AnyUrl, ValidationError, TypeAdapter
@@
 logger = get_logger(__name__)
+
+# Adapter for validating URLs in pydantic v2
+_ANYURL = TypeAdapter(AnyUrl)
@@
-            validated_url = AnyUrl(src)
-        except ValidationError:
+            validated_url = _ANYURL.validate_python(src)
+        except (ValidationError, ValueError, TypeError):
             logger.warning("Invalid URL in chunk source: %s", src)
             validated_url = None
@@
-        validated_doc_url = None
-        if doc_url and doc_url.startswith("http"):
-            validated_doc_url = AnyUrl(doc_url)
-    except ValidationError:
+        validated_doc_url = None
+        if doc_url and doc_url.startswith("http"):
+            validated_doc_url = _ANYURL.validate_python(doc_url)
+    except (ValidationError, ValueError, TypeError):
         logger.warning("Invalid URL in metadata: %s", doc_url)
         validated_doc_url = None
@@
-                validated_url = None
-                if doc_url.startswith("http"):
-                    validated_url = AnyUrl(doc_url)
-            except ValidationError:
+                validated_url = None
+                if doc_url.startswith("http"):
+                    validated_url = _ANYURL.validate_python(doc_url)
+            except (ValidationError, ValueError, TypeError):
                 logger.warning("Invalid URL in metadata_map: %s", doc_url)
                 validated_url = None

Also applies to: 443-451, 472-480

356-367: Docstring return description mismatches actual return.

Function returns (agent, session_id, conversation_id). Clarify to prevent misuse.

-    Returns:
-        tuple[AsyncAgent, str]: A tuple containing the agent and session_id.
+    Returns:
+        tuple[AsyncAgent, str, str]: (agent, session_id, conversation_id).

Also applies to: 378-381

♻️ Duplicate comments (1)

src/app/endpoints/streaming_query_v2.py (1)

391-401: Only include “tools” when present to satisfy Responses API type contract.

Passing tools=None can be invalid; gate the key. (Previously flagged.)

-    create_params: dict[str, Any] = {
+    create_params: dict[str, Any] = {
         "input": input_text,
         "model": model_id,
         "instructions": system_prompt,
         "stream": True,
         "store": True,
-        "tools": toolgroups,
     }
     if query_request.conversation_id:
         create_params["previous_response_id"] = query_request.conversation_id
+    if toolgroups:
+        create_params["tools"] = toolgroups

🧹 Nitpick comments (6)

src/utils/endpoints.py (2)
598-616: Add precise Callable type hints for injected helpers in cleanup_after_streaming.

Improves readability and static checking; avoids Any.
@@
-from typing import Any
+from typing import Any, Callable, Awaitable
@@
 async def cleanup_after_streaming(
@@
-    get_topic_summary_func: Any,
-    is_transcripts_enabled_func: Any,
-    store_transcript_func: Any,
-    persist_user_conversation_details_func: Any,
+    get_topic_summary_func: Callable[[str, AsyncLlamaStackClient, str], Awaitable[str]],
+    is_transcripts_enabled_func: Callable[[], bool],
+    store_transcript_func: Callable[..., None],
+    persist_user_conversation_details_func: Callable[
+        [str, str, str, str, str | None], None
+    ],
Also applies to: 5-5

7-7: Use public client import to avoid private API coupling.

Prefer from llama_stack_client import AsyncLlamaStackClient for stability. Based on learnings.
-from llama_stack_client._client import AsyncLlamaStackClient
+from llama_stack_client import AsyncLlamaStackClient
tests/unit/app/endpoints/test_query_v2.py (1)
21-26: Optional: Pre-set authorized_actions in the dummy_request fixture.

Avoids per-test mutation, reduces boilerplate.
 @pytest.fixture
 def dummy_request() -> Request:
     """Create a dummy FastAPI Request object for testing."""
     req = Request(scope={"type": "http"})
-    return req
+    req.state.authorized_actions = set()  # or set(Action) if permissive is desired
+    return req
Also applies to: 391-396
src/app/endpoints/streaming_query_v2.py (2)
211-224: Optional: Drop “role” from tool_call events for parity with Agent API JSON.

OLS JSON tests for v1 omit “role”; consider aligning v2 for consistency.
-                yield format_stream_data(
+                yield format_stream_data(
                     {
                         "event": "tool_call",
                         "data": {
                             "id": chunk_id,
-                            "role": "tool_execution",
                             "token": delta,
                         },
                     }
                 )
342-364: Doc tweak: this is the Responses API path, not agents.

Very minor clarity improvement.
-    Retrieve response from LLMs and agents.
+    Retrieve a streaming response using the Responses API.
@@
-    ID from the Llama Stack agent for a given user query.
+    ID from the Responses API for a given user query.
src/app/endpoints/streaming_query.py (1)
727-735: Optional: Initialize summary with empty string for consistency.

Match the v2 fallback behavior and avoid non-empty default.
-        summary = TurnSummary(llm_response="No response from the model", tool_calls=[])
+        summary = TurnSummary(llm_response="", tool_calls=[])

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cc8028 and 09bec86.

📒 Files selected for processing (7)

src/app/endpoints/streaming_query.py (6 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/models/context.py (1 hunks)
src/utils/endpoints.py (2 hunks)
tests/unit/app/endpoints/test_query_v2.py (12 hunks)
tests/unit/app/endpoints/test_streaming_query.py (4 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/unit/app/endpoints/test_streaming_query_v2.py
src/models/context.py

🧰 Additional context used

📓 Path-based instructions (7)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/utils/endpoints.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/utils/endpoints.py
src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/app/endpoints/test_query_v2.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/streaming_query_v2.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/endpoints/test_streaming_query.py
tests/unit/app/endpoints/test_query_v2.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/endpoints/test_streaming_query.py
tests/unit/app/endpoints/test_query_v2.py

🧠 Learnings (4)

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

src/utils/endpoints.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/endpoints/**/*.py : In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Applied to files:

src/utils/endpoints.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client

Applied to files:

src/utils/endpoints.py
tests/unit/app/endpoints/test_query_v2.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/**/*.py : Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Applied to files:

tests/unit/app/endpoints/test_query_v2.py

🧬 Code graph analysis (5)

src/utils/endpoints.py (6)

src/models/requests.py (1)

QueryRequest (73-225)

src/utils/types.py (1)

TurnSummary (89-163)

src/configuration.py (1)

AppConfig (39-181)

src/app/database.py (1)

get_session (34-40)

src/models/database/conversations.py (1)

UserConversation (11-38)

src/models/cache_entry.py (1)

CacheEntry (7-24)

src/app/endpoints/streaming_query.py (3)

src/models/context.py (1)

ResponseGeneratorContext (12-48)

src/utils/endpoints.py (1)

cleanup_after_streaming (598-711)

src/app/endpoints/streaming_query_v2.py (1)

response_generator (114-298)

src/app/endpoints/streaming_query_v2.py (8)

src/app/endpoints/query.py (3)

is_transcripts_enabled (98-104)

persist_user_conversation_details (107-139)

validate_attachments_metadata (801-830)

src/app/endpoints/query_v2.py (4)

extract_token_usage_from_responses_api (451-541)

get_topic_summary (235-274)

prepare_tools_for_responses_api (617-668)

retrieve_response (304-427)

src/app/endpoints/streaming_query.py (6)

format_stream_data (123-134)

stream_end_event (161-217)

stream_start_event (137-158)

streaming_query_endpoint_handler_base (801-933)

response_generator (713-796)

retrieve_response (968-1089)

src/utils/endpoints.py (2)

cleanup_after_streaming (598-711)

get_system_prompt (127-191)

src/utils/mcp_headers.py (1)

mcp_headers_dependency (15-26)

src/utils/token_counter.py (1)

TokenCounter (18-41)

src/utils/transcripts.py (1)

store_transcript (40-99)

src/utils/types.py (2)

TurnSummary (89-163)

ToolCallSummary (73-86)

tests/unit/app/endpoints/test_streaming_query.py (2)

src/utils/types.py (1)

TurnSummary (89-163)

tests/unit/app/endpoints/test_query.py (1)

mock_metrics (71-80)

tests/unit/app/endpoints/test_query_v2.py (3)

src/models/config.py (1)

ModelContextProtocolServer (169-174)

src/app/endpoints/query_v2.py (1)

get_mcp_tools (575-614)

tests/unit/app/endpoints/test_streaming_query_v2.py (3)

test_retrieve_response_builds_rag_and_mcp_tools (32-60)

dummy_request (23-28)

_raise (207-208)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: build-pr
GitHub Check: e2e_tests (azure)
GitHub Check: e2e_tests (ci)

🔇 Additional comments (3)

tests/unit/app/endpoints/test_query_v2.py (1)

28-59: Good coverage for v2 tools, attachments, usage, and error paths.

Tests clearly assert tool merging (token + per-server), no-tools bypass, attachment inclusion, both usage shapes, and API connection errors. LGTM.

Also applies to: 61-97, 100-133, 135-178, 181-233, 236-274, 277-311, 314-348, 351-388, 391-441, 444-470

tests/unit/app/endpoints/test_streaming_query.py (1)

58-59: Tests updated to the centralized cleanup and generator flow look solid.

Patching cleanup_after_streaming avoids duplicating persistence logic.

Core streaming assertions and OLS-compat tests remain intact.

Also applies to: 76-80, 215-359, 362-368, 391-399, 601-629

src/app/endpoints/streaming_query.py (1)

801-907: Base handler refactor with context + factories is clean and extensible.

The abstraction reduces duplication between v1/v2 and keeps endpoint logic lean. LGTM.

Also applies to: 936-966

src/app/endpoints/streaming_query_v2.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/app/endpoints/streaming_query_v2.py (1)
395-404: Do not send tools=None to the Responses API.

Line 400 always injects "tools": toolgroups, so when no tools are available we emit tools: null. The Responses API expects the field to be omitted unless we have an actual list; sending null trips validation and breaks streaming (this was already called out earlier). Please add the key only when toolgroups is truthy.
-    create_params: dict[str, Any] = {
-        "input": input_text,
-        "model": model_id,
-        "instructions": system_prompt,
-        "stream": True,
-        "store": True,
-        "tools": toolgroups,
-    }
+    create_params: dict[str, Any] = {
+        "input": input_text,
+        "model": model_id,
+        "instructions": system_prompt,
+        "stream": True,
+        "store": True,
+    }
     if query_request.conversation_id:
         create_params["previous_response_id"] = query_request.conversation_id
+    if toolgroups:
+        create_params["tools"] = toolgroups

🧹 Nitpick comments (1)

src/app/endpoints/streaming_query.py (1)
801-934: Consider keyword-only parameters to reduce pylint suppressions.

The base handler properly implements dependency injection for Agent/Responses API flexibility. However, with 6 positional parameters, you're triggering pylint R0913. Consider making some parameters keyword-only by adding * before retrieve_response_func:
 async def streaming_query_endpoint_handler_base(
     request: Request,
     query_request: QueryRequest,
     auth: AuthTuple,
     mcp_headers: dict[str, dict[str, str]],
+    *,
     retrieve_response_func: Callable[..., Any],
     create_response_generator_func: Callable[..., Any],
 ) -> StreamingResponse:
This would eliminate the need for the too-many-positional-arguments disable while maintaining the same functionality.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09bec86 and f3c324e.

📒 Files selected for processing (7)

src/app/endpoints/streaming_query.py (6 hunks)
src/app/endpoints/streaming_query_v2.py (1 hunks)
src/models/context.py (1 hunks)
src/utils/endpoints.py (2 hunks)
tests/unit/app/endpoints/test_query_v2.py (12 hunks)
tests/unit/app/endpoints/test_streaming_query.py (4 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/models/context.py

🧰 Additional context used

📓 Path-based instructions (7)

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

src/app/endpoints/streaming_query_v2.py
src/utils/endpoints.py
src/app/endpoints/streaming_query.py

src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

src/app/endpoints/streaming_query_v2.py
src/utils/endpoints.py
tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/endpoints/test_streaming_query.py
src/app/endpoints/streaming_query.py

src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

src/app/endpoints/streaming_query_v2.py
src/app/endpoints/streaming_query.py

tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/endpoints/test_streaming_query.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

tests/unit/app/endpoints/test_query_v2.py
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/unit/app/endpoints/test_streaming_query.py

🧠 Learnings (4)

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

src/utils/endpoints.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/endpoints/**/*.py : In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Applied to files:

src/utils/endpoints.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client

Applied to files:

src/utils/endpoints.py
tests/unit/app/endpoints/test_query_v2.py

📚 Learning: 2025-09-18T16:46:33.353Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/**/*.py : Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Applied to files:

tests/unit/app/endpoints/test_query_v2.py

🔇 Additional comments (3)

src/app/endpoints/streaming_query.py (3)

8-8: LGTM: Import additions support the refactoring.

The new imports (Callable, ResponseGeneratorContext, cleanup_after_streaming) are properly utilized in the factory-based streaming pattern and successfully address the previous pylint R0917 issue.

Also applies to: 48-48, 54-54

697-799: LGTM: Factory function successfully addresses previous R0917 issue.

The refactoring bundles parameters into ResponseGeneratorContext and properly implements the streaming response generation. The inner generator correctly handles the full lifecycle: start event, chunk processing, token usage extraction, and cleanup operations.

936-966: LGTM: Clean delegation to base handler.

The endpoint handler properly wraps the base handler with Agent API-specific factories (retrieve_response, create_agent_response_generator), maintaining backward compatibility while enabling the v1/v2 pattern.

coderabbitai · 2025-11-06T14:12:25Z

tests/unit/app/endpoints/test_streaming_query_v2.py

+    kwargs = mock_client.responses.create.call_args.kwargs
+    assert kwargs["tools"] is None
+    assert kwargs["stream"] is True


⚠️ Potential issue | 🟠 Major

Adjust assertion: tools key should be absent when no tools.

Line 83 currently enforces kwargs["tools"] is None, so the test locks in passing tools=null to the Responses API. Once we drop that key (required to avoid API validation failures), this assertion should instead verify that the parameter is omitted entirely.

- assert kwargs["tools"] is None + assert "tools" not in kwargs

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

kwargs = mock_client.responses.create.call_args.kwargs

assert kwargs["tools"] is None

assert kwargs["stream"] is True

kwargs = mock_client.responses.create.call_args.kwargs

assert "tools" not in kwargs

assert kwargs["stream"] is True

🤖 Prompt for AI Agents

In tests/unit/app/endpoints/test_streaming_query_v2.py around lines 82 to 84, the test currently asserts kwargs["tools"] is None which assumes the Responses API receives a tools key; update the assertion to verify the key is absent by asserting "tools" not in kwargs (and keep the existing assert kwargs["stream"] is True), so the test expects the parameter to be omitted rather than present with a null value.

luis5tb mentioned this pull request Nov 4, 2025

Responses v2 support #540

Closed

18 tasks

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

src/app/endpoints/query_v2.py Outdated Show resolved Hide resolved

src/app/endpoints/streaming_query_v2.py Show resolved Hide resolved

src/app/endpoints/streaming_query_v2.py Show resolved Hide resolved

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

src/app/endpoints/query_v2.py Show resolved Hide resolved

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

luis5tb force-pushed the responses_v2_streaming branch from dd1cee0 to 68dc335 Compare November 4, 2025 18:29

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

src/app/endpoints/streaming_query_v2.py Outdated Show resolved Hide resolved

src/app/endpoints/streaming_query.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Nov 4, 2025

View reviewed changes

Base implementation of streaming_query for Responses API (v2)

7e2c1b7

luis5tb force-pushed the responses_v2_streaming branch from 5c289ed to dc74af7 Compare November 5, 2025 11:35

luis5tb changed the title ~~[WIP] Base implementation of streaming Responses API~~ Base implementation of streaming Responses API Nov 5, 2025

coderabbitai bot reviewed Nov 5, 2025

View reviewed changes

luis5tb force-pushed the responses_v2_streaming branch from dc74af7 to a1b6f9c Compare November 5, 2025 11:42

tisnik approved these changes Nov 5, 2025

View reviewed changes

luis5tb mentioned this pull request Nov 5, 2025

[WIP] Responses v2 shields #760

Open

18 tasks

eranco74 reviewed Nov 6, 2025

View reviewed changes

luis5tb force-pushed the responses_v2_streaming branch from a1b6f9c to 9cc8028 Compare November 6, 2025 12:58

coderabbitai bot reviewed Nov 6, 2025

View reviewed changes

luis5tb force-pushed the responses_v2_streaming branch from 9cc8028 to 09bec86 Compare November 6, 2025 13:05

coderabbitai bot reviewed Nov 6, 2025

View reviewed changes

src/app/endpoints/streaming_query_v2.py Outdated Show resolved Hide resolved

Fixes to address PR reviews

f3c324e

luis5tb force-pushed the responses_v2_streaming branch from 09bec86 to f3c324e Compare November 6, 2025 14:03

coderabbitai bot reviewed Nov 6, 2025

View reviewed changes

luis5tb requested a review from eranco74 November 7, 2025 06:13

		)

		yield stream_end_event(context.metadata_map, summary, token_usage, media_type)

Base implementation of streaming Responses API #754

Are you sure you want to change the base?

Base implementation of streaming Responses API #754

Uh oh!

Conversation

luis5tb commented Nov 4, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Review ran into problems

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eranco74 Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

luis5tb Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

eranco74 Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

luis5tb Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

eranco74 Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

luis5tb Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

luis5tb commented Nov 4, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 4, 2025 •

edited

Loading