Conversation

@radofuchs (Contributor) commented Aug 25, 2025

Description

Added E2E tests for health endpoints

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue: LCORE-490
  • Closes: LCORE-490

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Tests
    • Expanded end-to-end REST/health coverage with scenarios simulating a disrupted backend service.
    • Readiness now validated to return 503 with a clear reason when providers are unhealthy; liveness remains 200.
    • Reorganized and clarified scenario names, descriptions, and formatting.
    • Implemented automated disruption and robust environment restoration after disruptive scenarios (service stop/start attempts, health polling, timed retries, and error-safe handling) to reduce flakiness.

coderabbitai bot (Contributor) commented Aug 25, 2025

Walkthrough

Restores a disrupted llama-stack container after scenarios: the disruption step stops the container and records prior state; after_scenario restarts the container when needed and polls its /v1/health endpoint until healthy; health feature expectations adjusted for readiness (503) and liveness (200) under disruption.

Changes

Cohort / File(s) — Summary

E2E feature file — tests/e2e/features/health.feature
Renames and rewords scenarios; adjusts the readiness expectation to 503 with a minimal body when llama-stack is disrupted; asserts liveness remains 200 with {"alive": true} under disruption; minor formatting edits.

Health step implementations — tests/e2e/features/steps/health.py
Implements llama_stack_connection_broken: inspects the container's running state, stores it on context.llama_stack_was_running, stops llama-stack when running, waits ~2s, logs outcomes, and handles subprocess.CalledProcessError; adds subprocess and time imports.

Environment hooks for restore — tests/e2e/features/environment.py
Extends after_scenario to, when context.llama_stack_was_running is true, run docker start llama-stack, wait ~5s, poll /v1/health via docker exec ... curl -f up to 6 attempts at 5s intervals, log outcomes, and catch subprocess.CalledProcessError; adds subprocess and time imports.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Tester as Test Runner
  participant Step as Step: llama_stack_connection_broken
  participant Docker as Docker Engine
  Note over Tester,Step: Disrupt llama-stack before readiness/liveness checks
  Tester->>Step: Execute disruption step
  Step->>Docker: docker inspect --format='{{.State.Running}}' llama-stack
  alt Container running
    Step->>Docker: docker stop llama-stack
    Note right of Step: wait ~2s to ensure disruption
    Step-->>Tester: context.llama_stack_was_running = True
  else Not running
    Step-->>Tester: context.llama_stack_was_running = False
  end
sequenceDiagram
  autonumber
  participant Hooks as after_scenario Hook
  participant Docker as Docker Engine
  Note over Hooks,Docker: Restore llama-stack only if it was running before scenario
  Hooks->>Hooks: if context.llama_stack_was_running
  alt Was running
    Hooks->>Docker: docker start llama-stack
    Note right of Hooks: wait ~5s
    loop Up to 6 attempts (5s interval)
      Hooks->>Docker: docker exec llama-stack curl -f http://localhost:8321/v1/health
      alt Health OK
        Docker-->>Hooks: 200 OK -> stop polling
      else Timeout/Failure
        Docker-->>Hooks: failure/timeout -> log and sleep 5s
      end
    end
    Note over Hooks: log warning if still unhealthy after attempts
  else Was not running
    Note right of Hooks: no action
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • tisnik

Poem

I hopped into tests with a curious tap,
Paused a stack’s hum with a gentle clap,
I poked with curl and waited with cheer,
Restarted the beat till the health did appear,
A rabbit’s small nudge — containers back near. 🐇✨

@radofuchs radofuchs requested a review from tisnik August 25, 2025 08:18
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (6)
tests/e2e/features/steps/health.py (3)

29-31: Replace fixed sleep with a short poll to confirm the container actually stopped.

A static 2s sleep is brittle and can still race under load. Polling for the container to be not-running is more reliable.

Apply this diff:

-            # Wait a moment for the connection to be fully disrupted
-            time.sleep(2)
+            # Wait until the container is confirmed not running (max ~5s)
+            for _ in range(10):
+                chk = subprocess.run(
+                    [
+                        "docker", "ps",
+                        "--filter", "name=^/llama-stack$",
+                        "--filter", "status=running",
+                        "--format", "{{.ID}}",
+                    ],
+                    capture_output=True,
+                    text=True,
+                    check=False,
+                )
+                if not chk.stdout.strip():
+                    break
+                time.sleep(0.5)

36-37: Broaden exception handling to cover missing docker binary and avoid noisy stacktraces.

If docker is unavailable, FileNotFoundError is raised. Catch it explicitly, and avoid leaving llama_stack_was_running True on failure.

Apply this diff:

-    except subprocess.CalledProcessError as e:
-        print(f"Warning: Could not disrupt Llama Stack connection: {e}")
+    except (subprocess.CalledProcessError, FileNotFoundError) as e:
+        context.llama_stack_was_running = False
+        print(f"Warning: Could not disrupt Llama Stack connection: {e}")
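A standalone illustration of the two failure modes the comment distinguishes. The `safe_run` wrapper and its return shape are hypothetical, not part of the PR; the point is only that a missing binary raises `FileNotFoundError` while a failing command raises `CalledProcessError`:

```python
import subprocess

def safe_run(cmd: list[str]):
    """Run `cmd`, distinguishing a missing binary from a failing command."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return ("ok", proc)
    except FileNotFoundError as exc:              # binary not on PATH
        return ("missing", exc)
    except subprocess.CalledProcessError as exc:  # binary ran, exited non-zero
        return ("failed", exc)
```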

41-45: Unimplemented step: either implement or delete to prevent accidental use.

This placeholder can cause confusion later. If not needed, remove; otherwise implement a minimal stop using docker/compose.

Example minimal implementation (only if truly needed):

 def stop_service(context: Context) -> None:
-    """Stop service."""
-    # TODO: add step implementation
-    assert context is not None
+    """Stop REST API service container."""
+    try:
+        subprocess.run(["docker", "stop", "lightspeed-rest"], check=True, capture_output=True)
+    except (subprocess.CalledProcessError, FileNotFoundError) as e:
+        raise AssertionError(f"Failed to stop service: {e}")

If you use docker compose, adjust to docker compose stop <service>. I can wire this to your actual service name if you confirm it.

tests/e2e/features/environment.py (1)

68-76: Ensure the wait messaging and final warning always execute on failed attempts.

Given the fix above, keep the backoff semantics intact.

No code change required beyond the previous diff; comment retained to acknowledge the now-correct control flow.

tests/e2e/features/health.feature (2)

11-12: Polish scenario titles for grammar and clarity.

Minor grammar fixes to make titles read cleanly. No functional impact.

Apply this diff:

-  Scenario: Check if service report proper readiness state
+  Scenario: Check if service reports the proper readiness state
@@
-  Scenario: Check if service report proper liveness state
+  Scenario: Check if service reports the proper liveness state
@@
-  Scenario: Check if service report proper readiness state when llama stack is not available
+  Scenario: Check if service reports the proper readiness state when the llama-stack is not available
@@
-  Scenario: Check if service report proper liveness state even when llama stack is not available
+  Scenario: Check if service reports the proper liveness state even when the llama-stack is not available

Also applies to: 29-30, 45-46, 56-57


45-46: Tag disruptive scenarios so they can be included/excluded explicitly in CI runs.

Adding a tag lets you run these disruption tests conditionally (e.g., nightly) and makes their impact obvious in reports.

Apply this diff:

-  Scenario: Check if service reports the proper readiness state when the llama-stack is not available
+  @llama_disruption
+  Scenario: Check if service reports the proper readiness state when the llama-stack is not available
@@
-  Scenario: Check if service reports the proper liveness state even when the llama-stack is not available
+  @llama_disruption
+  Scenario: Check if service reports the proper liveness state even when the llama-stack is not available

Also applies to: 56-57

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b3351ca and 4ecf9dc.

📒 Files selected for processing (3)
  • tests/e2e/features/environment.py (2 hunks)
  • tests/e2e/features/health.feature (1 hunks)
  • tests/e2e/features/steps/health.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/e2e/features/steps/health.py (1)

15-27: Fix running-state check; current logic treats "false" as truthy and stops a non-running container

docker inspect -f '{{.State.Running}}' returns "true" or "false". Checking only for non-empty output will set llama_stack_was_running = True and attempt docker stop even when the container is already stopped. Also prefer check=False for the inspect call and derive is_running from both return code and stdout. Add a small stop timeout to speed CI.

Apply:

-        result = subprocess.run(
-            ["docker", "inspect", "-f", "{{.State.Running}}", "llama-stack"],
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-
-        if result.stdout.strip():
+        result = subprocess.run(
+            ["docker", "inspect", "-f", "{{.State.Running}}", "llama-stack"],
+            capture_output=True,
+            text=True,
+            check=False,
+        )
+
+        is_running = (result.returncode == 0 and result.stdout.strip().lower() == "true")
+        if is_running:
             context.llama_stack_was_running = True
             subprocess.run(
-                ["docker", "stop", "llama-stack"], check=True, capture_output=True
+                ["docker", "stop", "--time=5", "llama-stack"], check=True, capture_output=True
             )
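The bug the diff above fixes comes down to string truthiness: `docker inspect -f '{{.State.Running}}'` prints "true" or "false", and both are non-empty strings. A minimal demonstration (the `is_running` helper is illustrative):

```python
def is_running(inspect_stdout: str) -> bool:
    """Interpret `docker inspect -f '{{.State.Running}}'` output correctly."""
    return inspect_stdout.strip().lower() == "true"

# The buggy check: any non-empty string is truthy, including "false".
assert bool("false\n".strip()) is True
# The fixed check compares the printed value instead:
assert is_running("true\n") is True
assert is_running("false\n") is False
```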
🧹 Nitpick comments (4)
tests/e2e/features/steps/health.py (4)

29-31: Remove unnecessary sleep after docker stop

docker stop is synchronous; it returns after the container has stopped. The extra 2s sleep just slows the suite.

-            # Wait a moment for the connection to be fully disrupted
-            time.sleep(2)
+            # `docker stop` is synchronous; no extra wait needed

36-37: Handle missing Docker and surface stderr for easier triage

Catching only CalledProcessError misses environments without Docker (raises FileNotFoundError). Also print captured stderr to help debugging failures.

-    except subprocess.CalledProcessError as e:
-        print(f"Warning: Could not disrupt Llama Stack connection: {e}")
+    except FileNotFoundError:
+        print("Warning: Docker not found in PATH; cannot disrupt Llama Stack connection.")
+    except subprocess.CalledProcessError as e:
+        err = getattr(e, "stderr", "") or ""
+        print(f"Warning: Could not disrupt Llama Stack connection: {e}. Stderr: {err}")

3-4: Parameterize container name via environment to ease local/CI variability

Compose setups often alter container names. Let the name be overridden with LLAMA_STACK_CONTAINER, defaulting to llama-stack.

+import os
 import subprocess
 import time
+CONTAINER_NAME = os.environ.get("LLAMA_STACK_CONTAINER", "llama-stack")
@@
-        result = subprocess.run(
-            ["docker", "inspect", "-f", "{{.State.Running}}", "llama-stack"],
+        result = subprocess.run(
+            ["docker", "inspect", "-f", "{{.State.Running}}", CONTAINER_NAME],
@@
-            subprocess.run(
-                ["docker", "stop", "--time=5", "llama-stack"], check=True, capture_output=True
+            subprocess.run(
+                ["docker", "stop", "--time=5", CONTAINER_NAME], check=True, capture_output=True
             )

Also applies to: 17-18, 25-27


40-45: No Active References for “the service is stopped”

The only occurrences of this step are commented out in tests/e2e/features/conversations.feature (lines 29 and 51), so there are no active scenarios depending on it. You can choose to:

  • Remove or update the commented lines if you don’t plan to use this step.
  • Implement the stop_service step now (e.g., invoke the shared stop routine or the llama-stack disruptor) so it’s ready once these scenarios are re-enabled.

Locations found:

  • tests/e2e/features/conversations.feature:29
  • tests/e2e/features/conversations.feature:51

Let me know if you’d like help wiring up the actual stop logic.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between d7955ce and c05ebdc.

📒 Files selected for processing (1)
  • tests/e2e/features/steps/health.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests
  • GitHub Check: build-pr

@radofuchs force-pushed the LCORE_490_Health_endpoints_e2e branch from c05ebdc to f92755d on August 25, 2025 11:08
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/e2e/features/environment.py (2)

37-44: Reset the disruption flag in a finally block to avoid cross-scenario leakage; also handle missing Docker binary.

Behave’s Context persists across scenarios; without clearing context.llama_stack_was_running, unrelated scenarios may attempt a restart. Add a finally to always clear the flag and broaden exception handling to include FileNotFoundError (e.g., when docker isn’t available).

Apply this diff:

-    if hasattr(context, "llama_stack_was_running") and context.llama_stack_was_running:
-        try:
+    if hasattr(context, "llama_stack_was_running") and context.llama_stack_was_running:
+        try:
             # Start the llama-stack container again
             subprocess.run(
                 ["docker", "start", "llama-stack"], check=True, capture_output=True
             )
             ...
-        except subprocess.CalledProcessError as e:
-            print(f"Warning: Could not restore Llama Stack connection: {e}")
+        except (subprocess.CalledProcessError, FileNotFoundError) as e:
+            print(f"Warning: Could not restore Llama Stack connection: {e}")
+        finally:
+            # Prevent leakage of the flag into subsequent scenarios
+            if hasattr(context, "llama_stack_was_running"):
+                delattr(context, "llama_stack_was_running")

Also applies to: 81-82


50-66: Fix health polling: check=True aborts on first non-200; use check=False and keep retrying.

With check=True, curl returns non-zero for 5xx and raises CalledProcessError, exiting the loop before retries. The subsequent returncode check is moot. Switch to check=False, add curl’s -s/-S and a per-request cap, and keep catching TimeoutExpired.

Apply this diff:

-            for attempt in range(6):  # Try for 30 seconds
+            for attempt in range(6):  # Try for ~30s of polling (per-attempt timeout)
                 try:
                     result = subprocess.run(
                         [
-                            "docker",
-                            "exec",
-                            "llama-stack",
-                            "curl",
-                            "-f",
-                            "http://localhost:8321/v1/health",
+                            "docker", "exec", "llama-stack",
+                            "curl", "-sfS", "--max-time", "4",
+                            "http://localhost:8321/v1/health",
                         ],
-                        capture_output=True,
-                        timeout=5,
-                        check=True,
+                        capture_output=True,
+                        text=True,
+                        timeout=5,
+                        check=False,
                     )
                     if result.returncode == 0:
                         print("✓ Llama Stack connection restored successfully")
                         break
                 except subprocess.TimeoutExpired:
                     print(f"⏱Health check timed out on attempt {attempt + 1}/6")
 
                 if attempt < 5:
                     print(
                         f"Waiting for Llama Stack to be healthy... (attempt {attempt + 1}/6)"
                     )
                     time.sleep(5)

Also applies to: 69-75
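The retry loop the diff above corrects can be factored into a generic helper, which also makes the backoff logic testable without Docker. This sketch is illustrative; the name `poll_until` and the injectable `sleep` parameter are assumptions, not part of the PR:

```python
import time

def poll_until(check, attempts: int = 6, interval: float = 5.0,
               sleep=time.sleep) -> bool:
    """Call `check()` up to `attempts` times; sleep `interval` between failures."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no sleep after the final attempt
            sleep(interval)
    return False
```

In the hook, `check` would wrap the `docker exec ... curl` call with `check=False` and return whether the exit code was zero.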

🧹 Nitpick comments (1)
tests/e2e/features/environment.py (1)

45-48: Optional: Remove the unconditional 5s sleep and start polling immediately.

The initial fixed wait slows teardown. Polling already has timeouts and backoff; you can start attempts right away.

-            print("Restoring Llama Stack connection...")
-            time.sleep(5)
+            print("Restoring Llama Stack connection...")
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c05ebdc and f92755d.

📒 Files selected for processing (3)
  • tests/e2e/features/environment.py (2 hunks)
  • tests/e2e/features/health.feature (1 hunks)
  • tests/e2e/features/steps/health.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/e2e/features/health.feature
  • tests/e2e/features/steps/health.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T09:11:38.678Z
Learnt from: radofuchs
PR: lightspeed-core/lightspeed-stack#445
File: tests/e2e/features/environment.py:0-0
Timestamp: 2025-08-25T09:11:38.678Z
Learning: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.

Applied to files:

  • tests/e2e/features/environment.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests
  • GitHub Check: build-pr
🔇 Additional comments (1)
tests/e2e/features/environment.py (1)

10-11: LGTM: Minimal, targeted imports.

The added imports are scoped to the new teardown logic and look appropriate.

@tisnik (Contributor) left a comment

LGTM

@tisnik tisnik merged commit 053d49f into lightspeed-core:main Aug 25, 2025
18 checks passed