Conversation


@onmete onmete commented Aug 11, 2025

Description

Complete removal of the data collector service and all its references from the codebase. The data collector functionality has been separated into its own service/repository.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added GET /v1/conversations to list a user’s conversations.
    • Introduced database configuration with SQLite (default) and PostgreSQL options.
    • Added top-level authentication configuration.
  • Chores

    • Removed the Data Collector service and related tooling.
  • Documentation

    • Updated getting started, deployment guide, and output to remove Data Collector references.
    • Expanded API docs, including /v1/info, /v1/config, health probes, /authorized, /metrics, and the new /v1/conversations.
    • Updated OpenAPI schemas to include database and conversations models; removed Data Collector model.
  • Tests

    • Removed Data Collector-related tests.


coderabbitai bot commented Aug 11, 2025

Walkthrough

This PR removes the Data Collector feature across code, configuration, and docs; simplifies startup to always run the web service; updates configuration schemas; and expands OpenAPI with a new conversations endpoint and database configuration (sqlite/postgres). Numerous tests and documentation sections related to Data Collector are deleted.

Changes

Cohort / File(s) Summary of changes
Runtime: Data Collector removal
src/services/data_collector.py, src/runners/data_collector.py, src/lightspeed_stack.py, src/constants.py, Makefile, tests/unit/runners/test_data_collector_runner.py, tests/unit/services/test_data_collector.py
Deleted DataCollectorService and runner; removed CLI flag and branch; removed related constants; removed Makefile target run-data-collector; deleted associated unit tests; startup now always launches web service.
Config models update
src/models/config.py, docs/config.puml
Removed DataCollectorConfiguration and UserDataCollection.data_collector; added InferenceConfiguration.check_default_model_and_provider() to diagram; updated relationships accordingly.
API/OpenAPI updates
docs/openapi.json
Added GET /v1/conversations with ConversationsListResponse and ConversationDetails; introduced DatabaseConfiguration with SQLite/PostgreSQL variants; added Configuration.database default; removed DataCollectorConfiguration and its references; enhanced endpoint descriptions.
Docs cleanup
README.md, docs/openapi.md, docs/output.md, docs/testing.md, docs/deployment_guide.md, docs/getting_started.md
Removed Data Collector sections, models, examples, and test references; replaced data_collector examples with authentication: module: "noop"; adjusted unit test structure docs.
Configuration examples
lightspeed-stack.yaml, .github/workflows/e2e_tests.yaml
Removed user_data_collection.data_collector block; added top-level authentication: module: "noop"; e2e-generated config reflects removal.
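As a sketch of the configuration change described above (key names taken from this PR's summary and from the `lightspeed-stack.yaml` / e2e workflow snippets later in this review; the exact `data_collector` sub-keys are illustrative assumptions), the config moves from a nested data collector block to a top-level authentication block:

```yaml
# Before (removed in this PR): nested data collector settings — sub-keys
# shown here are hypothetical placeholders, not the exact removed schema.
# user_data_collection:
#   data_collector:
#     enabled: true

# After: feedback/transcript collection stays, plus a new top-level auth block
user_data_collection:
  feedback_enabled: true
  feedback_storage: "/tmp/data/feedback"
  transcripts_enabled: true
  transcripts_storage: "/tmp/data/transcripts"

authentication:
  module: "noop"
```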

Sequence Diagram(s)

sequenceDiagram
  participant CLI as CLI
  participant App as lightspeed_stack.py
  participant Web as Uvicorn/Web Service

  CLI->>App: main(args)
  App->>App: parse args (no --data-collector)
  alt dump configuration
    App-->>CLI: write config and exit
  else start service
    App->>Web: start_uvicorn()
    Web-->>CLI: service running
  end
sequenceDiagram
  participant Client as Client
  participant API as Web Service
  participant DB as Database (sqlite/postgres)

  Client->>API: GET /v1/conversations
  API->>DB: Query user conversations
  DB-->>API: Conversations list
  API-->>Client: 200 ConversationsListResponse
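The conversations flow above returns a `ConversationsListResponse` of `ConversationDetails` items. A dependency-free sketch of those models, mirroring the field names in this PR's OpenAPI example (the service itself uses Pydantic; plain dataclasses stand in here to keep the sketch self-contained):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ConversationDetails:
    """Sketch of one conversation entry; only conversation_id is required."""
    conversation_id: str                      # UUID string
    created_at: Optional[str] = None          # ISO 8601 timestamp or None
    last_message_at: Optional[str] = None
    message_count: Optional[int] = None
    last_used_model: Optional[str] = None
    last_used_provider: Optional[str] = None

@dataclass
class ConversationsListResponse:
    """Sketch of the GET /v1/conversations response body."""
    conversations: List[ConversationDetails] = field(default_factory=list)

# Build the example payload from the spec's sample data
resp = ConversationsListResponse(conversations=[
    ConversationDetails(
        conversation_id="123e4567-e89b-12d3-a456-426614174000",
        created_at="2024-01-01T00:00:00Z",
        last_message_at="2024-01-01T00:05:00Z",
        last_used_model="gemini/gemini-1.5-flash",
        last_used_provider="gemini",
        message_count=5,
    ),
])
print(len(resp.conversations))  # → 1
```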

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested reviewers

  • tisnik
  • manstis
  • ldjebran

Poem

A whisk of code, a hop, delete—
The collector’s gone on silent feet.
Now servers start without a fork,
And chats can list their tidy work.
SQLite burrows, Postgres peers—
New paths to store our bunny cheers. 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (7)
docs/testing.md (1)

68-71: Align integration test coverage threshold in docs with Makefile

The docs in docs/testing.md (lines 68–71) state that the integration‐test coverage threshold is 60%, but the Makefile actually sets --cov-fail-under=10 for test-integration. Update one or the other to keep them in sync.

• File: docs/testing.md (around line 69)
• Makefile (line 21) sets integration threshold to 10%

Proposed doc fix:

-Currently code coverage threshold for integration tests is set to 60%. This value is specified directly in Makefile, because the coverage threshold is different from threshold required for unit tests.
+Currently code coverage threshold for integration tests is set to 10%. This value is specified directly in Makefile, because the coverage threshold is different from threshold required for unit tests.
.github/workflows/e2e_tests.yaml (1)

85-93: lightspeed-stack.yaml keys inconsistent with repository/docs (feedback_disabled vs feedback_enabled)

Other configs/docs use feedback_enabled/transcripts_enabled. Here the workflow writes feedback_disabled/transcripts_disabled, which may not be recognized by the config model.

Apply this diff to align with the rest of the repo:

             user_data_collection:
-              feedback_disabled: false
-              feedback_storage: "/tmp/data/feedback"
-              transcripts_disabled: false
-              transcripts_storage: "/tmp/data/transcripts"
+              feedback_enabled: true
+              feedback_storage: "/tmp/data/feedback"
+              transcripts_enabled: true
+              transcripts_storage: "/tmp/data/transcripts"

             authentication:
               module: "noop"
src/lightspeed_stack.py (2)

61-64: Avoid logging full configuration objects (risk of leaking secrets like API keys)

logger.info("Configuration: %s", configuration.configuration) likely prints sensitive fields (e.g., llama_stack.api_key). Reduce detail or redact.

Apply a minimal hardening:

-    logger.info("Configuration: %s", configuration.configuration)
-    logger.info(
-        "Llama stack configuration: %s", configuration.llama_stack_configuration
-    )
+    # Avoid dumping full configuration at INFO level to prevent secret leakage
+    logger.debug("Configuration loaded.")
+    logger.debug("Llama stack configuration loaded.")

If you have a redaction-safe serializer, prefer logging that instead.


66-68: Wrap client initialization with error handling

Network or config errors will currently bubble up unhandled. Fail fast with a clear message and non-zero exit.

-    asyncio.run(
-        AsyncLlamaStackClientHolder().load(configuration.configuration.llama_stack)
-    )
+    try:
+        asyncio.run(
+            AsyncLlamaStackClientHolder().load(configuration.configuration.llama_stack)
+        )
+    except Exception as exc:
+        logger.error("Failed to initialize Llama Stack client: %s", exc, exc_info=True)
+        raise SystemExit(1)
src/models/config.py (1)

241-246: Avoid assert for runtime validation in jwk_configuration property

An assert raises AssertionError (inconsistent with the ValueError used just above) and is stripped entirely when Python runs with -O, so the check can silently disappear. Raise a ValueError to keep error semantics consistent.

     def jwk_configuration(self) -> JwkConfiguration:
         """Return JWK configuration if the module is JWK token."""
         if self.module != constants.AUTH_MOD_JWK_TOKEN:
             raise ValueError(
                 "JWK configuration is only available for JWK token authentication module"
             )
-        assert self.jwk_config is not None, "JWK configuration should not be None"
-        return self.jwk_config
+        if self.jwk_config is None:
+            raise ValueError("JWK configuration must be provided for JWK token module")
+        return self.jwk_config
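To make the motivation concrete, this short demo (independent of the project code) shows that the same failing `assert` exits non-zero under a normal interpreter but passes silently under `python -O`:

```python
import subprocess
import sys

# Why `assert` is unsuitable for runtime validation: under `python -O`,
# asserts are compiled out, so the failing check below "passes" silently.
code = "assert False, 'never runs under -O'"

normal = subprocess.run([sys.executable, "-c", code])
optimized = subprocess.run([sys.executable, "-O", "-c", code])

print(normal.returncode, optimized.returncode)  # → 1 0
```

An explicit `raise ValueError(...)` is unaffected by `-O`, which is why the suggested diff prefers it.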
docs/config.puml (1)

12-22: Add DatabaseConfiguration and DB classes to diagram (docs out of sync with code and OpenAPI)

Configuration now includes a database field with SQLite/PostgreSQL options. The diagram should reflect:

  • Class DatabaseConfiguration with sqlite/postgres.
  • Classes SQLiteDatabaseConfiguration and PostgreSQLDatabaseConfiguration.
  • Composition relationships.

Proposed additions:

 class "Configuration" as src.models.config.Configuration {
   authentication : Optional[AuthenticationConfiguration]
   customization : Optional[Customization]
+  database
   inference
   llama_stack
   mcp_servers : list[ModelContextProtocolServer]
   name : str
   service
   user_data_collection
   dump(filename: str) -> None
 }
@@
+class "DatabaseConfiguration" as src.models.config.DatabaseConfiguration {
+  sqlite : Optional[SQLiteDatabaseConfiguration]
+  postgres : Optional[PostgreSQLDatabaseConfiguration]
+  check_database_configuration() -> Self
+  db_type
+  config
+}
+class "SQLiteDatabaseConfiguration" as src.models.config.SQLiteDatabaseConfiguration {
+  db_path : str
+}
+class "PostgreSQLDatabaseConfiguration" as src.models.config.PostgreSQLDatabaseConfiguration {
+  host : str
+  port : int
+  db : str
+  user : str
+  password : str
+  namespace : Optional[str]
+  ssl_mode : str
+  gss_encmode : str
+  ca_cert_path : Optional[FilePath]
+  check_postgres_configuration() -> Self
+}
@@
+src.models.config.DatabaseConfiguration --* src.models.config.Configuration : database
+src.models.config.SQLiteDatabaseConfiguration --* src.models.config.DatabaseConfiguration : sqlite
+src.models.config.PostgreSQLDatabaseConfiguration --* src.models.config.DatabaseConfiguration : postgres

Also applies to: 71-77, 79-85

docs/openapi.json (1)

41-55: Invalid OpenAPI response format – extraneous name/version fields
The spec validator fails because only description, headers, content, and links are allowed at the response object level. Move your sample fields under the media‐type’s example (or examples) property.

Please update docs/openapi.json at the following locations:

  • Lines 41–55 (/v1/info GET response)
  • Lines 67–99
  • Lines 124–137
  • Lines 235–275
  • Lines 388–405
  • Lines 446–463
  • Lines 518–521
  • Lines 555–577
  • Lines 599–609
  • Lines 665–674

Apply a patch similar to:

 "responses": {
   "200": {
     "description": "Successful Response",
     "content": {
-      "application/json": {
-        "schema": { "$ref": "#/components/schemas/InfoResponse" },
-        "name": "Service name",
-        "version": "Service version"
-      }
+      "application/json": {
+        "schema": { "$ref": "#/components/schemas/InfoResponse" },
+        "example": {
+          "name": "Service name",
+          "version": "Service version"
+        }
+      }
     }
   }
 }

After making these changes, rerun your OpenAPI validation to confirm there are no more unevaluated‐properties errors.
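The rule being enforced can be expressed as a tiny lint: OpenAPI 3.x allows only a fixed set of fields on a Response Object (plus `$ref` and `x-` extensions). A sketch checker (hypothetical helper, not part of this repo) that flags the stray keys described above:

```python
# Fixed fields allowed on an OpenAPI 3.x Response Object.
ALLOWED = {"description", "headers", "content", "links"}

def stray_response_keys(response_obj: dict) -> set:
    """Return keys that do not belong at the response-object level."""
    return {
        k for k in response_obj
        if k not in ALLOWED and not k.startswith("x-") and k != "$ref"
    }

# The broken shape from this PR's docs/openapi.json: example data placed
# directly on the response object instead of under the media type's example.
broken = {
    "description": "Successful Response",
    "content": {
        "application/json": {
            "schema": {"$ref": "#/components/schemas/InfoResponse"}
        }
    },
    "name": "Service name",
    "version": "Service version",
}
print(sorted(stray_response_keys(broken)))  # → ['name', 'version']
```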

🧹 Nitpick comments (11)
README.md (1)

45-45: Add a short migration note for Data Collector

Consider adding a “Where did Data Collector go?” note linking to the new repo and minimal migration steps. This will help users who have old configs/scripts.

.github/workflows/e2e_tests.yaml (1)

4-4: Security: pull_request_target runs with secrets

This workflow uses pull_request_target and provides OPENAI_API_KEY as a job-level secret. Even with persist-credentials off, secrets are accessible to steps that operate on PR-provided code/content (e.g., docker compose). Consider moving to pull_request with a hardened approach (or workflow_run from a trusted branch) and using a scoped token or ephemeral secret provisioning.

docs/getting_started.md (1)

267-270: Auth block addition LGTM; add note on precedence with service.auth_enabled

Good addition. Please add a one-liner clarifying how service.auth_enabled interacts with the new top-level authentication (which one takes precedence), to avoid confusion.

lightspeed-stack.yaml (1)

23-26: Top-level authentication block added; clarify/deprecate overlapping auth flags

This is good. Given service.auth_enabled still exists, please document or enforce precedence (e.g., deprecate service.auth_enabled in favor of authentication.module) to avoid dual-source-of-truth confusion.

docs/deployment_guide.md (3)

572-575: Auth config snippet updated — add brief usage note

Consider adding one sentence under this snippet explaining what "noop" does and how to switch providers.


885-887: Auth config snippet updated — keep examples consistent

Looks good. Repeat a short note about auth provider selection here for consistency.


1652-1654: Auth block addition LGTM (container flow)

Consistent with other examples. A single centralized “Authentication” subsection referenced by these snippets could reduce duplication.

src/lightspeed_stack.py (1)

56-59: Wire the --verbose flag to logging level

Verbose currently doesn’t affect log level. Set root and handler levels to DEBUG when requested.

     parser = create_argument_parser()
     args = parser.parse_args()

+    if args.verbose:
+        logging.getLogger().setLevel(logging.DEBUG)
+        for h in logging.getLogger().handlers:
+            h.setLevel(logging.DEBUG)
+
     configuration.load_configuration(args.config_file)
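The wiring suggested above can be factored into a small helper (a sketch; `verbose` stands in for the argparse flag in lightspeed_stack.py):

```python
import logging

def apply_verbosity(verbose: bool) -> None:
    """Flip the root logger and its attached handlers to DEBUG when requested."""
    if verbose:
        root = logging.getLogger()
        root.setLevel(logging.DEBUG)
        # Handlers keep their own thresholds, so they must be lowered too,
        # or DEBUG records would still be filtered out at emit time.
        for handler in root.handlers:
            handler.setLevel(logging.DEBUG)

apply_verbosity(True)
print(logging.getLogger().isEnabledFor(logging.DEBUG))  # → True
```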
src/models/config.py (3)

49-55: Tighten PostgreSQL port validation messages

The logic is correct, but messages can be clearer and aligned with the checks.

-        if self.port <= 0:
-            raise ValueError("Port value should not be negative")
-        if self.port > 65535:
-            raise ValueError("Port value should be less than 65536")
+        if self.port <= 0:
+            raise ValueError("Port must be greater than 0")
+        if self.port > 65535:
+            raise ValueError("Port must be <= 65535")

73-75: Make default SQLite path portable across platforms

Hardcoding /tmp may not be portable (e.g., Windows) and could be restricted in some environments. Use tempfile.gettempdir().

+import tempfile
@@
-            sqlite_file_name = "/tmp/lightspeed-stack.db"
-            self.sqlite = SQLiteDatabaseConfiguration(db_path=sqlite_file_name)
+            tmp_dir = Path(tempfile.gettempdir())
+            sqlite_file_name = tmp_dir / "lightspeed-stack.db"
+            self.sqlite = SQLiteDatabaseConfiguration(db_path=str(sqlite_file_name))
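The point of the suggestion is that `tempfile.gettempdir()` resolves the platform's temp directory (honoring `TMPDIR` and friends), so the default SQLite path also works on Windows and in sandboxed environments. A quick standalone check:

```python
import os
import tempfile
from pathlib import Path

# Portable default path for the SQLite file, instead of a hardcoded /tmp.
db_path = Path(tempfile.gettempdir()) / "lightspeed-stack.db"

# The resolved temp directory exists on every supported platform.
print(os.path.isdir(tempfile.gettempdir()))  # → True
```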

152-160: Nit: unify “Llama” casing in error messages

Maintain consistent casing (“Llama”) across messages.

-                    "LLama stack URL is not specified and library client mode is not specified"
+                    "Llama stack URL is not specified and library client mode is not specified"
@@
-                    "LLama stack URL is not specified and library client mode is not enabled"
+                    "Llama stack URL is not specified and library client mode is not enabled"
@@
-                    "LLama stack library client mode is enabled but a configuration file path is not specified"  # noqa: E501
+                    "Llama stack library client mode is enabled but a configuration file path is not specified"  # noqa: E501

Also applies to: 166-168

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1d48b4 and f961d3f.

📒 Files selected for processing (19)
  • .github/workflows/e2e_tests.yaml (1 hunks)
  • Makefile (1 hunks)
  • README.md (1 hunks)
  • docs/config.puml (2 hunks)
  • docs/deployment_guide.md (4 hunks)
  • docs/getting_started.md (1 hunks)
  • docs/openapi.json (13 hunks)
  • docs/openapi.md (0 hunks)
  • docs/output.md (0 hunks)
  • docs/testing.md (1 hunks)
  • lightspeed-stack.yaml (1 hunks)
  • src/constants.py (0 hunks)
  • src/lightspeed_stack.py (1 hunks)
  • src/models/config.py (1 hunks)
  • src/runners/data_collector.py (0 hunks)
  • src/services/data_collector.py (0 hunks)
  • tests/unit/models/test_config.py (0 hunks)
  • tests/unit/runners/test_data_collector_runner.py (0 hunks)
  • tests/unit/services/test_data_collector.py (0 hunks)
💤 Files with no reviewable changes (8)
  • src/runners/data_collector.py
  • tests/unit/runners/test_data_collector_runner.py
  • src/constants.py
  • docs/openapi.md
  • src/services/data_collector.py
  • docs/output.md
  • tests/unit/models/test_config.py
  • tests/unit/services/test_data_collector.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (8)
docs/testing.md (1)

108-110: Unit test tree update for runners looks correct

Removal of data collector runner tests is reflected here. Matches the PR objective.

Makefile (1)

11-11: Removed run-data-collector target

Target removal aligns with the feature deprecation. README Make targets are consistent.

docs/deployment_guide.md (1)

1497-1499: Auth block addition LGTM

Matches the new configuration surface.

src/lightspeed_stack.py (1)

50-50: All data-collector references successfully removed

Verified across the repository with:

  • rg -n --hidden --glob '!*venv*' 'data_collector|start_data_collector|--data-collector' (no matches found)

No further action required.

src/models/config.py (2)

42-44: PostgreSQL default constants verified
The constants POSTGRES_DEFAULT_SSL_MODE (line 53) and POSTGRES_DEFAULT_GSS_ENCMODE (line 55) are present in src/constants.py. No further action required.


1-305: No Data Collector references detected
A repository-wide search for common Data Collector identifiers (data_collector, DataCollector, ingress_server_url, ingress_server_auth_token, ingress_content_service_name, collection_interval, cleanup_after_send, connection_timeout, DATA_COLLECTOR) returned zero matches. All references have been removed.

docs/config.puml (1)

29-34: LGTM: InferenceConfiguration method surfaced

The addition of check_default_model_and_provider in the diagram matches the model. No issues.

docs/openapi.json (1)

825-832: LGTM: Configuration now includes database with default SQLite

This aligns with code defaults and DatabaseConfiguration schema.

Comment on lines 370 to 425
"/v1/conversations": {
  "get": {
    "tags": [
      "conversations"
    ],
    "summary": "Get Conversations List Endpoint Handler",
    "description": "Handle request to retrieve all conversations for the authenticated user.",
    "operationId": "get_conversations_list_endpoint_handler_v1_conversations_get",
    "responses": {
      "200": {
        "description": "Successful Response",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/ConversationsListResponse"
            }
          }
        },
        "conversations": [
          {
            "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
            "created_at": "2024-01-01T00:00:00Z",
            "last_message_at": "2024-01-01T00:05:00Z",
            "last_used_model": "gemini/gemini-1.5-flash",
            "last_used_provider": "gemini",
            "message_count": 5
          },
          {
            "conversation_id": "456e7890-e12b-34d5-a678-901234567890",
            "created_at": "2024-01-01T01:00:00Z",
            "last_message_at": "2024-01-01T01:02:00Z",
            "last_used_model": "gemini/gemini-2.0-flash",
            "last_used_provider": "gemini",
            "message_count": 2
          }
        ]
      },
      "503": {
        "description": "Service Unavailable",
        "detail": {
          "response": "Unable to connect to Llama Stack",
          "cause": "Connection error."
        }
      }
    }
  }
},

🛠️ Refactor suggestion

Add 400/403 responses to /v1/conversations (endpoint states “authenticated user”)

The description says it returns conversations for the authenticated user, but only 200/503 are documented. Include 400 (missing/invalid credentials) and 403 (unauthorized) like other endpoints.

         "/v1/conversations": {
           "get": {
             "tags": ["conversations"],
             "summary": "Get Conversations List Endpoint Handler",
             "description": "Handle request to retrieve all conversations for the authenticated user.",
             "operationId": "get_conversations_list_endpoint_handler_v1_conversations_get",
             "responses": {
               "200": { ... },
+              "400": {
+                "description": "Missing or invalid credentials provided by client",
+                "content": {
+                  "application/json": {
+                    "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" }
+                  }
+                }
+              },
+              "403": {
+                "description": "User is not authorized",
+                "content": {
+                  "application/json": {
+                    "schema": { "$ref": "#/components/schemas/ForbiddenResponse" }
+                  }
+                }
+              },
               "503": { ... }
             }
           }
         },

Comment on lines 911 to 988
"ConversationDetails": {
  "properties": {
    "conversation_id": {
      "type": "string",
      "title": "Conversation Id"
    },
    "created_at": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Created At"
    },
    "last_message_at": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Last Message At"
    },
    "message_count": {
      "anyOf": [
        { "type": "integer" },
        { "type": "null" }
      ],
      "title": "Message Count"
    },
    "last_used_model": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Last Used Model"
    },
    "last_used_provider": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Last Used Provider"
    }
  },
  "type": "object",
  "required": [
    "conversation_id"
  ],
  "title": "ConversationDetails",
  "description": "Model representing the details of a user conversation.\n\nAttributes:\n    conversation_id: The conversation ID (UUID).\n    created_at: When the conversation was created.\n    last_message_at: When the last message was sent.\n    message_count: Number of user messages in the conversation.\n    model: The model used for the conversation.\n\nExample:\n    ```python\n    conversation = ConversationSummary(\n        conversation_id=\"123e4567-e89b-12d3-a456-426614174000\"\n        created_at=\"2024-01-01T00:00:00Z\",\n        last_message_at=\"2024-01-01T00:05:00Z\",\n        message_count=5,\n        model=\"gemini/gemini-2.0-flash\"\n    )\n    ```"
},

⚠️ Potential issue

Fix ConversationDetails description: use last_used_model/last_used_provider; update examples

The schema fields are last_used_model and last_used_provider, but the description refers to model and examples omit last_used_provider.

   "ConversationDetails": {
     "properties": { ... },
     "type": "object",
     "required": ["conversation_id"],
     "title": "ConversationDetails",
-    "description": "Model representing the details of a user conversation.\n\nAttributes:\n    conversation_id: The conversation ID (UUID).\n    created_at: When the conversation was created.\n    last_message_at: When the last message was sent.\n    message_count: Number of user messages in the conversation.\n    model: The model used for the conversation.\n\nExample:\n    ```python\n    conversation = ConversationSummary(\n        conversation_id=\"123e4567-e89b-12d3-a456-426614174000\"\n        created_at=\"2024-01-01T00:00:00Z\",\n        last_message_at=\"2024-01-01T00:05:00Z\",\n        message_count=5,\n        model=\"gemini/gemini-2.0-flash\"\n    )\n    ```"
+    "description": "Model representing the details of a user conversation.\n\nAttributes:\n    conversation_id: The conversation ID (UUID).\n    created_at: When the conversation was created.\n    last_message_at: When the last message was sent.\n    message_count: Number of user messages in the conversation.\n    last_used_model: The last model used in the conversation.\n    last_used_provider: The provider of the last model.\n\nExample:\n    ```python\n    conversation = ConversationDetails(\n        conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n        created_at=\"2024-01-01T00:00:00Z\",\n        last_message_at=\"2024-01-01T00:05:00Z\",\n        message_count=5,\n        last_used_model=\"gemini/gemini-2.0-flash\",\n        last_used_provider=\"gemini\"\n    )\n    ```"
   },
🤖 Prompt for AI Agents
In docs/openapi.json around lines 911 to 979, the ConversationDetails
description and example refer to "model" and omit "last_used_provider" while the
schema uses last_used_model and last_used_provider; update the description to
mention last_used_model and last_used_provider and change the example to
construct a ConversationDetails object (not ConversationSummary) including both
last_used_model and last_used_provider with appropriate sample values, and
ensure the attribute names in the docstring match the schema exactly.
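A quick sanity check (a hypothetical snippet, not part of this repo) shows how the stale `model` key would be caught if the examples were validated against the declared schema properties:

```python
# Hypothetical check: every key used in a schema example must be a declared property.
schema_properties = {
    "conversation_id", "created_at", "last_message_at",
    "message_count", "last_used_model", "last_used_provider",
}

corrected_example = {
    "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
    "created_at": "2024-01-01T00:00:00Z",
    "last_message_at": "2024-01-01T00:05:00Z",
    "message_count": 5,
    "last_used_model": "gemini/gemini-2.0-flash",
    "last_used_provider": "gemini",
}

# The corrected example uses only declared fields.
unknown = set(corrected_example) - schema_properties
assert not unknown, f"example uses undeclared fields: {unknown}"

# The original example's "model" key is flagged as undeclared.
stale_example = {k: v for k, v in corrected_example.items()
                 if k not in ("last_used_model", "last_used_provider")}
stale_example["model"] = "gemini/gemini-2.0-flash"
assert (set(stale_example) - schema_properties) == {"model"}
```

The same check applies to the `ConversationsListResponse` examples below, which embed the same entries.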

Comment on lines 1024 to 1068
"ConversationsListResponse": {
  "properties": {
    "conversations": {
      "items": {
        "$ref": "#/components/schemas/ConversationDetails"
      },
      "type": "array",
      "title": "Conversations"
    }
  },
  "type": "object",
  "required": [
    "conversations"
  ],
  "title": "ConversationsListResponse",
  "description": "Model representing a response for listing conversations of a user.\n\nAttributes:\n    conversations: List of conversation details associated with the user.\n\nExample:\n    ```python\n    conversations_list = ConversationsListResponse(\n        conversations=[\n            ConversationDetails(\n                conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n                created_at=\"2024-01-01T00:00:00Z\",\n                last_message_at=\"2024-01-01T00:05:00Z\",\n                message_count=5,\n                model=\"gemini/gemini-2.0-flash\"\n            ),\n            ConversationDetails(\n                conversation_id=\"456e7890-e12b-34d5-a678-901234567890\"\n                created_at=\"2024-01-01T01:00:00Z\",\n                message_count=2,\n                model=\"gemini/gemini-2.5-flash\"\n            )\n        ]\n    )\n    ```",
  "examples": [
    {
      "conversations": [
        {
          "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
          "created_at": "2024-01-01T00:00:00Z",
          "last_message_at": "2024-01-01T00:05:00Z",
          "message_count": 5,
          "model": "gemini/gemini-2.0-flash"
        },
        {
          "conversation_id": "456e7890-e12b-34d5-a678-901234567890",
          "created_at": "2024-01-01T01:00:00Z",
          "message_count": 2,
          "model": "gemini/gemini-2.5-flash"
        }
      ]
    }
  ]
},
⚠️ Potential issue

Fix ConversationsListResponse examples: replace model with last_used_model and include last_used_provider

Examples use a non-existent `model` field.

 "ConversationsListResponse": {
   "properties": { ... },
   "type": "object",
   "required": ["conversations"],
   "title": "ConversationsListResponse",
-  "description": "Model representing a response for listing conversations of a user.\n\nAttributes:\n    conversations: List of conversation details associated with the user.\n\nExample:\n    ```python\n    conversations_list = ConversationsListResponse(\n        conversations=[\n            ConversationDetails(\n                conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n                created_at=\"2024-01-01T00:00:00Z\",\n                last_message_at=\"2024-01-01T00:05:00Z\",\n                message_count=5,\n                model=\"gemini/gemini-2.0-flash\"\n            ),\n            ConversationDetails(\n                conversation_id=\"456e7890-e12b-34d5-a678-901234567890\"\n                created_at=\"2024-01-01T01:00:00Z\",\n                message_count=2,\n                model=\"gemini/gemini-2.5-flash\"\n            )\n        ]\n    )\n    ```",
+  "description": "Model representing a response for listing conversations of a user.\n\nAttributes:\n    conversations: List of conversation details associated with the user.\n\nExample:\n    ```python\n    conversations_list = ConversationsListResponse(\n        conversations=[\n            ConversationDetails(\n                conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n                created_at=\"2024-01-01T00:00:00Z\",\n                last_message_at=\"2024-01-01T00:05:00Z\",\n                message_count=5,\n                last_used_model=\"gemini/gemini-2.0-flash\",\n                last_used_provider=\"gemini\"\n            ),\n            ConversationDetails(\n                conversation_id=\"456e7890-e12b-34d5-a678-901234567890\",\n                created_at=\"2024-01-01T01:00:00Z\",\n                message_count=2,\n                last_used_model=\"gemini/gemini-2.5-flash\",\n                last_used_provider=\"gemini\"\n            )\n        ]\n    )\n    ```",
   "examples": [
     {
       "conversations": [
         {
           "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
           "created_at": "2024-01-01T00:00:00Z",
           "last_message_at": "2024-01-01T00:05:00Z",
-          "message_count": 5,
-          "model": "gemini/gemini-2.0-flash"
+          "message_count": 5,
+          "last_used_model": "gemini/gemini-2.0-flash",
+          "last_used_provider": "gemini"
         },
         {
           "conversation_id": "456e7890-e12b-34d5-a678-901234567890",
           "created_at": "2024-01-01T01:00:00Z",
-          "message_count": 2,
-          "model": "gemini/gemini-2.5-flash"
+          "message_count": 2,
+          "last_used_model": "gemini/gemini-2.5-flash",
+          "last_used_provider": "gemini"
         }
       ]
     }
   ]
 }
🤖 Prompt for AI Agents
In docs/openapi.json around lines 1024 to 1059, the ConversationsListResponse
examples and example Python snippet reference a non-existent "model" field;
update both the description/example block and the JSON examples to replace
"model" with "last_used_model" and add "last_used_provider" (e.g.,
last_used_model: "gemini/gemini-2.0-flash", last_used_provider: "gemini") for
each ConversationDetails entry so the examples match the actual schema.

 from typing import Optional
 
-from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl, PositiveInt
+from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl
🛠️ Refactor suggestion

Use Field(default_factory=...) to avoid shared default instances and mutable defaults

Several fields instantiate BaseModel instances or lists as defaults. In pydantic, prefer Field(default_factory=...) to prevent shared state across instances.

Apply:

-from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl
+from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl, Field
@@
-    tls_config: TLSConfiguration = TLSConfiguration()
+    tls_config: TLSConfiguration = Field(default_factory=TLSConfiguration)
@@
-class JwkConfiguration(BaseModel):
+class JwkConfiguration(BaseModel):
@@
-    jwt_configuration: JwtConfiguration = JwtConfiguration()
+    jwt_configuration: JwtConfiguration = Field(default_factory=JwtConfiguration)
@@
 class Configuration(BaseModel):
@@
-    database: DatabaseConfiguration = DatabaseConfiguration()
-    mcp_servers: list[ModelContextProtocolServer] = []
-    authentication: Optional[AuthenticationConfiguration] = (
-        AuthenticationConfiguration()
-    )
+    database: DatabaseConfiguration = Field(default_factory=DatabaseConfiguration)
+    mcp_servers: list[ModelContextProtocolServer] = Field(default_factory=list)
+    authentication: Optional[AuthenticationConfiguration] = Field(
+        default_factory=AuthenticationConfiguration
+    )
     customization: Optional[Customization] = None
-    inference: InferenceConfiguration = InferenceConfiguration()
+    inference: InferenceConfiguration = Field(default_factory=InferenceConfiguration)

Also applies to: 108-109, 199-208, 289-300

🤖 Prompt for AI Agents
In src/models/config.py around lines 6 (and also apply same change at 108-109,
199-208, 289-300), several Pydantic model fields are using mutable default
values (BaseModel instances or lists) directly; replace those direct defaults
with Field(default_factory=...) so each model instance gets its own fresh
object; update the import to include Field if missing and change each offending
field declaration to use Field(default_factory=lambda: <default>) or
Field(default_factory=SomeModel) as appropriate.

@onmete
Copy link
Contributor Author

onmete commented Aug 11, 2025

There is a bigger change in `docs/openapi.json` as a result of `make schema`. I suspect it is not updated automatically.

@onmete onmete force-pushed the remove-data-collector branch from f961d3f to a9c1097 Compare August 12, 2025 07:51

@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit c09653b into lightspeed-core:main Aug 12, 2025
17 checks passed
