9 changes: 1 addition & 8 deletions .github/workflows/e2e_tests.yaml
@@ -87,14 +87,7 @@ jobs:
 feedback_storage: "/tmp/data/feedback"
 transcripts_disabled: false
 transcripts_storage: "/tmp/data/transcripts"
-data_collector:
-enabled: false
-ingress_server_url: null
-ingress_server_auth_token: null
-ingress_content_service_name: null
-collection_interval: 7200 # 2 hours in seconds
-cleanup_after_send: true
-connection_timeout_seconds: 30
+
 authentication:
 module: "noop"

3 changes: 1 addition & 2 deletions Makefile
@@ -8,8 +8,7 @@ PYTHON_REGISTRY = pypi
 run: ## Run the service locally
 uv run src/lightspeed_stack.py

-run-data-collector: ## Run the data collector service locally
-uv run src/lightspeed_stack.py --data-collector
+

 test-unit: ## Run the unit tests
 @echo "Running unit tests..."
50 changes: 1 addition & 49 deletions README.md
@@ -42,10 +42,7 @@ Lightspeed Core Stack (LCS) is an AI-powered assistant that provides answers to
 * [Utility to generate OpenAPI schema](#utility-to-generate-openapi-schema)
 * [Path](#path)
 * [Usage](#usage-1)
-* [Data Collector Service](#data-collector-service)
-* [Features](#features)
-* [Configuration](#configuration-1)
-* [Running the Service](#running-the-service)
+

 <!-- vim-markdown-toc -->

@@ -253,7 +250,6 @@ Usage: make <OPTIONS> ... <TARGETS>
 Available targets are:

 run Run the service locally
-run-data-collector Run the data collector service
 test-unit Run the unit tests
 test-integration Run integration tests tests
 test-e2e Run BDD tests for the service
@@ -421,50 +417,6 @@ This script re-generated OpenAPI schema for the Lightspeed Service REST API.
 make schema
 ```

-## Data Collector Service
-
-The data collector service is a standalone service that runs separately from the main web service. It is responsible for collecting and sending user data including feedback and transcripts to an ingress server for analysis and archival.
-
-### Features
-
-- **Periodic Collection**: Runs at configurable intervals
-- **Data Packaging**: Packages feedback and transcript files into compressed tar.gz archives
-- **Secure Transmission**: Sends data to a configured ingress server with optional authentication
-- **File Cleanup**: Optionally removes local files after successful transmission
-- **Error Handling**: Includes retry logic and comprehensive error handling
-
-### Configuration
-
-The data collector service is configured through the `user_data_collection.data_collector` section in your configuration file:
-
-```yaml
-user_data_collection:
-feedback_enabled: true
-feedback_storage: "/tmp/data/feedback"
-transcripts_enabled: true
-transcripts_storage: "/tmp/data/transcripts"
-data_collector:
-enabled: true
-ingress_server_url: "https://your-ingress-server.com"
-ingress_server_auth_token: "your-auth-token"
-ingress_content_service_name: "lightspeed-team"
-collection_interval: 7200 # 2 hours in seconds
-cleanup_after_send: true
-connection_timeout: 30
-```
-
-### Running the Service
-
-To run the data collector service:
-
-```bash
-# Using Python directly
-uv run src/lightspeed_stack.py --data-collector
-
-# Using Make target
-make run-data-collector
-```
-


 # Project structure
14 changes: 2 additions & 12 deletions docs/config.puml
@@ -26,16 +26,7 @@ class "Customization" as src.models.config.Customization {
 system_prompt_path : Optional[FilePath]
 check_customization_model() -> Self
 }
-class "DataCollectorConfiguration" as src.models.config.DataCollectorConfiguration {
-cleanup_after_send : bool
-collection_interval : Annotated
-connection_timeout : Annotated
-enabled : bool
-ingress_content_service_name : Optional[str]
-ingress_server_auth_token : Optional[str]
-ingress_server_url : Optional[str]
-check_data_collector_configuration() -> Self
-}
+
 class "InferenceConfiguration" as src.models.config.InferenceConfiguration {
 default_model : Optional[str]
 default_provider : Optional[str]
@@ -78,14 +69,13 @@ class "TLSConfiguration" as src.models.config.TLSConfiguration {
 check_tls_configuration() -> Self
 }
 class "UserDataCollection" as src.models.config.UserDataCollection {
-data_collector
 feedback_enabled : bool
 feedback_storage : Optional[str]
 transcripts_enabled : bool
 transcripts_storage : Optional[str]
 check_storage_location_is_set_when_needed() -> Self
 }
-src.models.config.DataCollectorConfiguration --* src.models.config.UserDataCollection : data_collector
+
 src.models.config.InferenceConfiguration --* src.models.config.Configuration : inference
 src.models.config.JwtConfiguration --* src.models.config.JwkConfiguration : jwt_configuration
 src.models.config.LlamaStackConfiguration --* src.models.config.Configuration : llama_stack
18 changes: 2 additions & 16 deletions docs/deployment_guide.md
@@ -1099,14 +1099,7 @@ user_data_collection:
 feedback_storage: "/tmp/data/feedback"
 transcripts_enabled: true
 transcripts_storage: "/tmp/data/transcripts"
-data_collector:
-enabled: false
-ingress_server_url: null
-ingress_server_auth_token: null
-ingress_content_service_name: null
-collection_interval: 7200 # 2 hours in seconds
-cleanup_after_send: true
-connection_timeout_seconds: 30
+
 authentication:
 module: "noop"
 ```
@@ -1261,14 +1254,7 @@ user_data_collection:
 feedback_storage: "/tmp/data/feedback"
 transcripts_enabled: true
 transcripts_storage: "/tmp/data/transcripts"
-data_collector:
-enabled: false
-ingress_server_url: null
-ingress_server_auth_token: null
-ingress_content_service_name: null
-collection_interval: 7200 # 2 hours in seconds
-cleanup_after_send: true
-connection_timeout_seconds: 30
+
 authentication:
 module: "noop"
 ```
9 changes: 1 addition & 8 deletions docs/getting_started.md
@@ -264,14 +264,7 @@ user_data_collection:
 feedback_storage: "/tmp/data/feedback"
 transcripts_enabled: true
 transcripts_storage: "/tmp/data/transcripts"
-data_collector:
-enabled: false
-ingress_server_url: null
-ingress_server_auth_token: null
-ingress_content_service_name: null
-collection_interval: 7200 # 2 hours in seconds
-cleanup_after_send: true
-connection_timeout_seconds: 30
+
 authentication:
 module: "noop"
 ```
62 changes: 9 additions & 53 deletions docs/openapi.json
@@ -1101,67 +1101,32 @@
"title": "Customization",
"description": "Service customization."
 },
-"DataCollectorConfiguration": {
+"DatabaseConfiguration": {
 "properties": {
-"enabled": {
-"type": "boolean",
-"title": "Enabled",
-"default": false
-},
-"ingress_server_url": {
-"anyOf": [
-{
-"type": "string"
-},
-{
-"type": "null"
-}
-],
-"title": "Ingress Server Url"
-},
-"ingress_server_auth_token": {
+"sqlite": {
 "anyOf": [
 {
-"type": "string"
+"$ref": "#/components/schemas/SQLiteDatabaseConfiguration"
 },
 {
 "type": "null"
 }
-],
-"title": "Ingress Server Auth Token"
+]
 },
-"ingress_content_service_name": {
+"postgres": {
 "anyOf": [
 {
-"type": "string"
+"$ref": "#/components/schemas/PostgreSQLDatabaseConfiguration"
 },
 {
 "type": "null"
 }
-],
-"title": "Ingress Content Service Name"
-},
-"collection_interval": {
-"type": "integer",
-"exclusiveMinimum": 0.0,
-"title": "Collection Interval",
-"default": 7200
-},
-"cleanup_after_send": {
-"type": "boolean",
-"title": "Cleanup After Send",
-"default": true
-},
-"connection_timeout": {
-"type": "integer",
-"exclusiveMinimum": 0.0,
-"title": "Connection Timeout",
-"default": 30
+]
 }
 },
 "type": "object",
-"title": "DataCollectorConfiguration",
-"description": "Data collector configuration for sending data to ingress server."
+"title": "DatabaseConfiguration",
+"description": "Database configuration."
 },
 "DatabaseConfiguration": {
 "properties": {
@@ -2122,15 +2087,6 @@
 }
 ],
 "title": "Transcripts Storage"
-},
-"data_collector": {
-"$ref": "#/components/schemas/DataCollectorConfiguration",
-"default": {
-"enabled": false,
-"collection_interval": 7200,
-"cleanup_after_send": true,
-"connection_timeout": 30
-}
 }
 },
 "type": "object",
16 changes: 0 additions & 16 deletions docs/openapi.md
@@ -577,21 +577,6 @@ Service customization.
 | system_prompt | | |


-## DataCollectorConfiguration
-
-
-Data collector configuration for sending data to ingress server.
-
-
-| Field | Type | Description |
-|-------|------|-------------|
-| enabled | boolean | |
-| ingress_server_url | | |
-| ingress_server_auth_token | | |
-| ingress_content_service_name | | |
-| collection_interval | integer | |
-| cleanup_after_send | boolean | |
-| connection_timeout | integer | |


 ## DatabaseConfiguration
@@ -1026,7 +1011,6 @@ User data collection configuration.
 | feedback_storage | | |
 | transcripts_enabled | boolean | |
 | transcripts_storage | | |
-| data_collector | | |


 ## ValidationError
17 changes: 0 additions & 17 deletions docs/output.md
@@ -577,22 +577,6 @@ Service customization.
 | system_prompt | | |


-## DataCollectorConfiguration
-
-
-Data collector configuration for sending data to ingress server.
-
-
-| Field | Type | Description |
-|-------|------|-------------|
-| enabled | boolean | |
-| ingress_server_url | | |
-| ingress_server_auth_token | | |
-| ingress_content_service_name | | |
-| collection_interval | integer | |
-| cleanup_after_send | boolean | |
-| connection_timeout | integer | |
-

 ## DatabaseConfiguration

@@ -1016,7 +1000,6 @@ User data collection configuration.
 | feedback_storage | | |
 | transcripts_enabled | boolean | |
 | transcripts_storage | | |
-| data_collector | | |


 ## ValidationError
6 changes: 2 additions & 4 deletions docs/testing.md
@@ -105,11 +105,9 @@ As specified in Definition of Done, new changes need to be covered by tests.
 │   ├── test_requests.py
 │   └── test_responses.py
 ├── runners
-│   ├── __init__.py
-│   ├── test_data_collector_runner.py
-│   └── test_uvicorn_runner.py
+│ ├── __init__.py
+│ └── test_uvicorn_runner.py
 ├── services
-│   └── test_data_collector.py
 ├── test_client.py
 ├── test_configuration.py
 ├── test_lightspeed_stack.py
9 changes: 1 addition & 8 deletions lightspeed-stack.yaml
@@ -20,13 +20,6 @@ user_data_collection:
 feedback_storage: "/tmp/data/feedback"
 transcripts_enabled: true
 transcripts_storage: "/tmp/data/transcripts"
-data_collector:
-enabled: false
-ingress_server_url: null
-ingress_server_auth_token: null
-ingress_content_service_name: null
-collection_interval: 7200 # 2 hours in seconds
-cleanup_after_send: true
-connection_timeout_seconds: 30
+
 authentication:
 module: "noop"
4 changes: 0 additions & 4 deletions src/constants.py
@@ -47,10 +47,6 @@
 DEFAULT_JWT_UID_CLAIM = "user_id"
 DEFAULT_JWT_USER_NAME_CLAIM = "username"

-# Data collector constants
-DATA_COLLECTOR_COLLECTION_INTERVAL = 7200 # 2 hours in seconds
-DATA_COLLECTOR_CONNECTION_TIMEOUT = 30
-DATA_COLLECTOR_RETRY_INTERVAL = 300 # 5 minutes in seconds

 # PostgreSQL connection constants
 # See: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-SSLMODE