@atchernych atchernych commented Jul 24, 2025

Overview:

DYN-704

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Introduced a complete "hello world" example demonstrating a simple distributed service using Dynamo's runtime, including backend and client scripts.
    • Added step-by-step documentation for setup, execution, and deployment, including Kubernetes deployment instructions with resource and health probe configurations.
    • Provided sample code for streaming responses and client integration.
  • Documentation

    • Added a comprehensive README detailing architecture, prerequisites, usage instructions, and deployment steps for the example.

@atchernych atchernych requested review from a team, nnshah1 and whoisj as code owners July 24, 2025 17:26
@github-actions github-actions bot added the feat label Jul 24, 2025
@atchernych atchernych changed the title feat: hello world deploy example draft: hello world deploy example Jul 24, 2025
@atchernych atchernych marked this pull request as draft July 24, 2025 17:31
@atchernych atchernych changed the title draft: hello world deploy example feat: hello world deploy example Jul 24, 2025
coderabbitai bot commented Jul 24, 2025

Walkthrough

This change introduces a new "hello world" example for Dynamo's distributed runtime, including a backend service, a client script, Kubernetes deployment configuration, and comprehensive documentation. The additions demonstrate a streaming endpoint, client-server interaction, deployment instructions, and health/resource management for both local and Kubernetes environments.

Changes

File(s) | Change Summary
examples/runtime/hello_world/README.md | Added detailed documentation for the "hello world" Dynamo runtime example, including architecture, usage, and deployment.
examples/runtime/hello_world/hello_world.py | Introduced async backend service with a streaming endpoint and worker setup using Dynamo runtime.
examples/runtime/hello_world/client.py | Added async client worker script that interacts with the backend service and prints streamed responses.
examples/runtime/hello_world/deploy/hello_world.yaml | Added a Kubernetes custom resource YAML (DynamoGraphDeployment) for deploying frontend and worker services with health probes and resource configuration.

Sequence Diagram(s)

sequenceDiagram
    participant ClientWorker
    participant DynamoRuntime
    participant BackendService

    ClientWorker->>DynamoRuntime: Connect and discover backend endpoint
    DynamoRuntime->>BackendService: Route request to content_generator
    ClientWorker->>BackendService: Send request ("world,sun,moon,star")
    loop For each word in request
        BackendService-->>ClientWorker: Stream greeting ("Hello, word!")
    end
    ClientWorker->>ClientWorker: Print each streamed greeting
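The streaming flow in the diagram can be sketched without the Dynamo runtime at all. The following is a minimal, hypothetical illustration using a plain async generator: `content_generator` borrows its name from the diagram, while `collect` and the exact greeting format are assumptions, since the PR's `hello_world.py` is not reproduced in this conversation.

```python
import asyncio
from typing import AsyncIterator


async def content_generator(request: str) -> AsyncIterator[str]:
    """Yield one greeting per comma-separated word in the request."""
    for word in request.split(","):
        # Stand-in for any per-item delay the real backend may simulate.
        await asyncio.sleep(0)
        yield f"Hello, {word.strip()}!"


async def collect(request: str) -> list[str]:
    """Consume the stream item by item, as a client would (hypothetical helper)."""
    return [greeting async for greeting in content_generator(request)]


if __name__ == "__main__":
    for line in asyncio.run(collect("world,sun,moon,star")):
        print(line)
```

The point of the generator form is that each greeting is available to the consumer as soon as it is produced, rather than after the whole request has been processed.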

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

In the meadow of code where the streams flow free,
A hello to the world, as bright as can be!
With clients and workers, in clusters they sing,
Kubernetes and YAML—what joy they bring!
Each greeting a hop, each stream a delight,
The rabbit applauds: the future is bright! 🐇✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
examples/runtime/hello_world/client.py (1)

23-37: Well-implemented client worker with suggested error handling improvement

The implementation correctly follows the Dynamo runtime patterns:

  • Proper use of @dynamo_worker() decorator
  • Correct endpoint access pattern matching the backend service
  • Appropriate async streaming handling

Consider adding error handling around client operations for production robustness:

 @dynamo_worker()
 async def worker(runtime: DistributedRuntime):
     # Get endpoint
     endpoint = (
         runtime.namespace("hello_world").component("backend").endpoint("generate")
     )

-    # Create client and wait for service to be ready
-    client = await endpoint.client()
-    await client.wait_for_instances()
-
-    # Issue request and process the stream
-    stream = await client.generate("world,sun,moon,star")
-    async for response in stream:
-        print(response.data())
+    try:
+        # Create client and wait for service to be ready
+        client = await endpoint.client()
+        await client.wait_for_instances()
+
+        # Issue request and process the stream
+        stream = await client.generate("world,sun,moon,star")
+        async for response in stream:
+            print(response.data())
+    except Exception as e:
+        print(f"Error connecting to or processing stream: {e}")
+        raise
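The log-then-re-raise pattern suggested above can be tried in isolation. This is a sketch under stated assumptions: `flaky_stream` and `consume` are hypothetical stand-ins, and no Dynamo client API is used.

```python
import asyncio
from typing import AsyncIterator


async def flaky_stream() -> AsyncIterator[str]:
    """Hypothetical stream that yields one item and then fails."""
    yield "Hello, world!"
    raise ConnectionError("backend went away")


async def consume(stream: AsyncIterator[str], sink: list[str]) -> None:
    """Append streamed items to `sink`; log and re-raise on any failure."""
    try:
        async for item in stream:
            sink.append(item)
    except Exception as exc:
        # Report the failure, then re-raise so the caller (and the process
        # exit code) still reflects the error.
        print(f"Error connecting to or processing stream: {exc}")
        raise


def run_demo() -> tuple[list[str], bool]:
    """Run the consumer against the failing stream; return (items, raised)."""
    received: list[str] = []
    try:
        asyncio.run(consume(flaky_stream(), received))
    except ConnectionError:
        return received, True
    return received, False
```

Re-raising matters here: swallowing the exception would let a failed client exit with status 0, which would hide the failure from whatever supervises the process.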
examples/runtime/hello_world/README.md (2)

49-79: Clear setup instructions with minor formatting fix needed

The prerequisites and setup instructions are comprehensive and accurate.

Consider fixing the markdown list indentation:

- Before running this example, ensure you have the following services running:
-
- - **etcd**: A distributed key-value store used for service discovery and metadata storage
- - **NATS**: A high-performance message broker for inter-component communication
+ Before running this example, ensure you have the following services running:
+
+- **etcd**: A distributed key-value store used for service discovery and metadata storage
+- **NATS**: A high-performance message broker for inter-component communication

104-113: Good deployment instructions with minor heading fix

The Kubernetes deployment instructions are clear and follow best practices with environment variables.

Fix the heading punctuation:

-## Deployment to Kubernetes.
+## Deployment to Kubernetes
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2874fd and 49b2fa1.

📒 Files selected for processing (4)
  • examples/runtime/hello_world/README.md (1 hunks)
  • examples/runtime/hello_world/client.py (1 hunks)
  • examples/runtime/hello_world/deploy/hello_world.yaml (1 hunks)
  • examples/runtime/hello_world/hello_world.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
examples/runtime/hello_world/client.py (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
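
The injection behavior described in this learning can be illustrated with a plain decorator. This is a hypothetical sketch, not the `dynamo_worker` implementation: `FakeRuntime` and `inject_runtime` are invented names, and only the `get_metrics(runtime, log_dir)` signature comes from the learning itself.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


class FakeRuntime:
    """Hypothetical stand-in for DistributedRuntime."""

    name = "fake-runtime"


def inject_runtime() -> Callable[..., Any]:
    """Decorator factory whose wrapper passes a runtime as the first argument."""

    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Assumption: a real decorator would construct or look up a
            # real runtime at this point.
            runtime = FakeRuntime()
            return await func(runtime, *args, **kwargs)

        return wrapper

    return decorator


@inject_runtime()
async def get_metrics(runtime: FakeRuntime, log_dir: str) -> str:
    return f"{runtime.name}:{log_dir}"
```

With this shape, callers write `asyncio.run(get_metrics("/tmp/logs"))` and never pass `runtime` themselves.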

examples/runtime/hello_world/README.md (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.

examples/runtime/hello_world/hello_world.py (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.

examples/runtime/hello_world/deploy/hello_world.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

🧬 Code Graph Analysis (1)
examples/runtime/hello_world/client.py (3)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (30-53)
lib/bindings/python/src/dynamo/runtime/__init__.py (1)
  • dynamo_worker (34-60)
examples/runtime/hello_world/hello_world.py (1)
  • worker (37-51)
🪛 LanguageTool
examples/runtime/hello_world/README.md

[style] ~36-~36: Consider using “who” when you are referring to a person instead of an object.
Context: ...ated word - Client: A Dynamo worker that connects to and sends requests to the b...

(THAT_WHO)


[style] ~98-~98: Consider using “who” when you are referring to a person instead of an object.
Context: ...eetings - worker: A dynamo worker that sets up the service, creates the endpoi...

(THAT_WHO)


[style] ~102-~102: Consider using “who” when you are referring to a person instead of an object.
Context: ...t.py) - **worker`**: A dynamo worker that connects to the backend service and pro...

(THAT_WHO)

🪛 markdownlint-cli2 (0.17.2)
examples/runtime/hello_world/README.md

55-55: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


56-56: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


104-104: Trailing punctuation in heading
Punctuation: '.'

(MD026, no-trailing-punctuation)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (14)
examples/runtime/hello_world/client.py (3)

1-15: LGTM - Standard license header

The SPDX license header follows project conventions correctly.


16-21: Good import choices

The imports are well-organized and appropriate. Using uvloop is a good performance optimization for async applications.


40-42: Correct async execution pattern

The main block properly installs uvloop before running the async worker, following best practices for high-performance async applications.

examples/runtime/hello_world/hello_world.py (5)

1-15: LGTM - Standard license header

Consistent license header following project conventions.


16-26: Well-organized imports and logging setup

Good use of Dynamo's logging configuration with a specific service name for better observability.


28-33: Excellent streaming endpoint implementation

The endpoint correctly demonstrates:

  • Proper @dynamo_endpoint(str, str) decorator usage with type annotations
  • Async generator pattern for streaming responses
  • Good logging for observability
  • Realistic delay simulation for demonstration purposes

36-51: Proper service setup and endpoint serving

The worker correctly implements the Dynamo service lifecycle:

  • Proper namespace/component/endpoint hierarchy
  • Good logging for debugging and monitoring
  • Correct use of serve_endpoint with the generator function

54-56: Consistent async execution pattern

Matches the client implementation with proper uvloop installation and async execution.

examples/runtime/hello_world/README.md (5)

1-16: Standard license header for markdown

Proper license header format for documentation files.


18-31: Clear introduction and architecture overview

The architecture diagram effectively illustrates the simple client-backend relationship and correctly identifies the streaming endpoint.


33-48: Comprehensive coverage of key concepts

The implementation details section effectively covers all the important Dynamo runtime concepts demonstrated in the example code.


81-91: Accurate expected output for user verification

The output example correctly matches what the streaming service would produce, helping users verify their setup.


93-103: Accurate code structure documentation

The descriptions of the backend and client components are precise and help users understand the codebase organization.

examples/runtime/hello_world/deploy/hello_world.yaml (1)

1-8: Proper CRD definition and metadata

The license header and Kubernetes resource definition follow correct conventions for DynamoGraphDeployment.

Comment on lines 9 to 47
services:
  Frontend:
    livenessProbe:
      httpGet:
        path: /health
        port: 8000
      initialDelaySeconds: 60
      periodSeconds: 60
      timeoutSeconds: 30
      failureThreshold: 10
    readinessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
      initialDelaySeconds: 60
      periodSeconds: 60
      timeoutSeconds: 30
      failureThreshold: 10
    dynamoNamespace: hello-world
    componentType: main
    replicas: 1
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "1"
        memory: "2Gi"
    extraPodSpec:
      mainContainer:
        image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-233.17
        workingDir: /workspace/examples/runtime/hello_world/
        command:
          - /bin/sh
          - -c
        args:
          - "python3 client.py"

⚠️ Potential issue

Fix probe configuration for client service

The Frontend service configuration has probe mismatches with the actual client implementation:

  1. Client doesn't expose HTTP endpoints: The client.py script is a simple worker that connects to the backend, processes responses, and exits. It doesn't run an HTTP server.

  2. Inconsistent probe types: Liveness probe uses httpGet while readiness probe uses exec with curl - both targeting the same non-existent endpoint.

Consider this configuration for a client that runs once and exits:

   Frontend:
-    livenessProbe:
-      httpGet:
-        path: /health
-        port: 8000
-      initialDelaySeconds: 60
-      periodSeconds: 60
-      timeoutSeconds: 30
-      failureThreshold: 10
-    readinessProbe:
-      exec:
-        command:
-          - /bin/sh
-          - -c
-          - 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
-      initialDelaySeconds: 60
-      periodSeconds: 60
-      timeoutSeconds: 30
-      failureThreshold: 10
+    # Client runs once and exits - using exec probes for basic health
+    livenessProbe:
+      exec:
+        command:
+          - /bin/sh
+          - -c
+          - "pgrep -f 'python3 client.py' || exit 1"
+      initialDelaySeconds: 10
+      periodSeconds: 30
+      timeoutSeconds: 10
+      failureThreshold: 3
+    readinessProbe:
+      exec:
+        command:
+          - /bin/sh
+          - -c
+          - "pgrep -f 'python3 client.py' || exit 1"
+      initialDelaySeconds: 5
+      periodSeconds: 10
+      timeoutSeconds: 5
+      failureThreshold: 3
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
services:
  Frontend:
    # Client runs once and exits – using exec probes for basic health
    livenessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - "pgrep -f 'python3 client.py' || exit 1"
      initialDelaySeconds: 10
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3
    readinessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - "pgrep -f 'python3 client.py' || exit 1"
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    dynamoNamespace: hello-world
    componentType: main
    replicas: 1
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "1"
        memory: "2Gi"
    extraPodSpec:
      mainContainer:
        image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-233.17
        workingDir: /workspace/examples/runtime/hello_world/
        command:
          - /bin/sh
          - -c
        args:
          - "python3 client.py"
🤖 Prompt for AI Agents
In examples/runtime/hello_world/deploy/hello_world.yaml from lines 9 to 47, the
Frontend service probes are incorrectly configured to use HTTP GET and curl
commands targeting a non-existent health endpoint, but the client.py script does
not run an HTTP server. To fix this, remove or disable the liveness and
readiness probes since the client is a one-time worker process that exits after
execution, or replace them with appropriate exec probes that check the process
status if needed.
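
To make the effect of the probe timing values in this comment concrete, here is a rough arithmetic sketch. It assumes that a probe's verdict arrives after roughly `periodSeconds × failureThreshold` of consecutive failures and ignores `timeoutSeconds` and scheduling jitter; the `Probe` class and its method are hypothetical helpers, not a Kubernetes API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def approx_detection_seconds(self) -> int:
        """Rough estimate: failed checks required times the check interval."""
        return self.period_seconds * self.failure_threshold


# Values taken from the YAML under review and from the suggested diff.
original = Probe(initial_delay_seconds=60, period_seconds=60, failure_threshold=10)
suggested_liveness = Probe(initial_delay_seconds=10, period_seconds=30, failure_threshold=3)
suggested_readiness = Probe(initial_delay_seconds=5, period_seconds=10, failure_threshold=3)

if __name__ == "__main__":
    for label, probe in [
        ("original", original),
        ("suggested liveness", suggested_liveness),
        ("suggested readiness", suggested_readiness),
    ]:
        print(f"{label}: ~{probe.approx_detection_seconds()} s to a failure verdict")
```

Under this estimate the original settings tolerate about ten minutes of consecutive failures, while the suggested liveness and readiness settings react in about 90 and 30 seconds respectively.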

Comment on lines 48 to 88
  HelloWorldWorker:
    livenessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - "exit 0"
      periodSeconds: 60
      timeoutSeconds: 30
      failureThreshold: 10
    readinessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - 'grep "Hello" /tmp/hello_world.log'
      initialDelaySeconds: 60
      periodSeconds: 60
      timeoutSeconds: 30
      failureThreshold: 10
    dynamoNamespace: hello-world
    componentType: worker
    replicas: 1
    resources:
      requests:
        cpu: "10"
        memory: "20Gi"
        gpu: "1"
      limits:
        cpu: "10"
        memory: "20Gi"
        gpu: "1"
    extraPodSpec:
      mainContainer:
        image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-233.17
        workingDir: /workspace/examples/runtime/hello_world/
        command:
          - /bin/sh
          - -c
        args:
          - python3 hello_world.py 2>&1 | tee /tmp/hello_world.log

⚠️ Potential issue

Fix probe configuration and resource allocation for backend service

The HelloWorldWorker service has several configuration issues:

  1. Ineffective liveness probe: Using exit 0 always succeeds, providing no actual health checking.

  2. Incorrect readiness probe: Checking for "Hello" in logs will fail until the first client request is processed, but the service should be ready when it starts serving the endpoint.

  3. Excessive resource allocation: 10 CPU, 20Gi memory, and 1 GPU is overkill for a simple greeting service that doesn't use GPU.

Consider this improved configuration:

   HelloWorldWorker:
     livenessProbe:
       exec:
         command:
           - /bin/sh
           - -c
-          - "exit 0"
+          - "pgrep -f 'python3 hello_world.py'"
-      periodSeconds: 60
-      timeoutSeconds: 30
-      failureThreshold: 10
+      initialDelaySeconds: 30
+      periodSeconds: 30
+      timeoutSeconds: 10
+      failureThreshold: 3
     readinessProbe:
       exec:
         command:
           - /bin/sh
           - -c
-          - 'grep "Hello" /tmp/hello_world.log'
+          - 'grep -q "Serving endpoint generate" /tmp/hello_world.log'
-      initialDelaySeconds: 60
-      periodSeconds: 60
-      timeoutSeconds: 30
-      failureThreshold: 10
+      initialDelaySeconds: 15
+      periodSeconds: 10
+      timeoutSeconds: 5
+      failureThreshold: 3
     resources:
       requests:
-        cpu: "10"
-        memory: "20Gi"
-        gpu: "1"
+        cpu: "1"
+        memory: "2Gi"
       limits:
-        cpu: "10"
-        memory: "20Gi"
-        gpu: "1"
+        cpu: "2"
+        memory: "4Gi"
🤖 Prompt for AI Agents
In examples/runtime/hello_world/deploy/hello_world.yaml lines 48 to 88, fix the
HelloWorldWorker configuration by replacing the liveness probe command with a
meaningful health check that verifies the service is responsive instead of
always exiting 0. Change the readiness probe to check the actual service
endpoint readiness rather than searching logs for "Hello" to ensure it reflects
true readiness from startup. Reduce resource requests and limits to minimal CPU
and memory values appropriate for a simple greeting service and remove the GPU
allocation since it is not used. Adjust these settings to provide accurate
health checks and efficient resource usage.
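
The readiness point in this comment can be reproduced with a small log-marker check. This is a sketch only: `is_ready` is a hypothetical helper, and the marker string "Serving endpoint generate" is the one proposed in the diff above, so whether the backend actually logs that exact text is an assumption.

```python
import tempfile
from pathlib import Path


def is_ready(log_path: Path, marker: str) -> bool:
    """Return True once `marker` appears anywhere in the log file."""
    try:
        return marker in log_path.read_text(errors="replace")
    except FileNotFoundError:
        # No log yet means the service has not started writing output.
        return False


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "hello_world.log"
    # Simulate a backend that has started serving but has not yet
    # handled any client request.
    log.write_text("Serving endpoint generate\n")

    ready_on_greeting = is_ready(log, "Hello")
    ready_on_startup = is_ready(log, "Serving endpoint generate")
    missing_log = is_ready(Path(tmp) / "absent.log", "Hello")
```

In this simulated state a check keyed on "Hello" reports not ready even though the service is up, while a check keyed on a startup message reports ready, which is the distinction the comment draws.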

@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Aug 24, 2025
@github-actions

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

@github-actions github-actions bot closed this Aug 29, 2025
@github-actions github-actions bot deleted the DYN-704-hello-world branch August 29, 2025 09:34