feat: hello world deploy example #2094
Walkthrough
This change introduces a new "hello world" example for Dynamo's distributed runtime, including a backend service, a client script, Kubernetes deployment configuration, and comprehensive documentation. The additions demonstrate a streaming endpoint, client-server interaction, deployment instructions, and health/resource management for both local and Kubernetes environments.
Sequence Diagram(s)
sequenceDiagram
participant ClientWorker
participant DynamoRuntime
participant BackendService
ClientWorker->>DynamoRuntime: Connect and discover backend endpoint
DynamoRuntime->>BackendService: Route request to content_generator
ClientWorker->>BackendService: Send request ("world,sun,moon,star")
loop For each word in request
BackendService-->>ClientWorker: Stream greeting ("Hello, word!")
end
ClientWorker->>ClientWorker: Print each streamed greeting
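The streaming loop in the diagram can be sketched without Dynamo at all. The following is a minimal illustration using only `asyncio`; the names `greet_stream` and `collect` are hypothetical and are not part of the PR or of the Dynamo API. It assumes the backend splits the request on commas and yields one greeting per word.

```python
import asyncio
from typing import AsyncIterator, List


async def greet_stream(request: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the backend: one greeting per comma-separated word."""
    for word in request.split(","):
        await asyncio.sleep(0)  # placeholder for per-item work; yields control to the loop
        yield f"Hello, {word}!"


async def collect(request: str) -> List[str]:
    """Consume the stream item by item, as the client loop does, and return the items."""
    return [greeting async for greeting in greet_stream(request)]


if __name__ == "__main__":
    for line in asyncio.run(collect("world,sun,moon,star")):
        print(line)
```

In the actual example the stream crosses the runtime between two processes; this sketch only shows the async-generator shape on each side.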
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/runtime/hello_world/client.py (1)
23-37: Well-implemented client worker with suggested error handling improvement
The implementation correctly follows the Dynamo runtime patterns:
- Proper use of @dynamo_worker() decorator
- Correct endpoint access pattern matching the backend service
- Appropriate async streaming handling
Consider adding error handling around client operations for production robustness:
 @dynamo_worker()
 async def worker(runtime: DistributedRuntime):
     # Get endpoint
     endpoint = (
         runtime.namespace("hello_world").component("backend").endpoint("generate")
     )

-    # Create client and wait for service to be ready
-    client = await endpoint.client()
-    await client.wait_for_instances()
-
-    # Issue request and process the stream
-    stream = await client.generate("world,sun,moon,star")
-    async for response in stream:
-        print(response.data())
+    try:
+        # Create client and wait for service to be ready
+        client = await endpoint.client()
+        await client.wait_for_instances()
+
+        # Issue request and process the stream
+        stream = await client.generate("world,sun,moon,star")
+        async for response in stream:
+            print(response.data())
+    except Exception as e:
+        print(f"Error connecting to or processing stream: {e}")
+        raise
examples/runtime/hello_world/README.md (2)
49-79: Clear setup instructions with minor formatting fix needed
The prerequisites and setup instructions are comprehensive and accurate.
Consider fixing the markdown list indentation:
- Before running this example, ensure you have the following services running:
-
- - **etcd**: A distributed key-value store used for service discovery and metadata storage
- - **NATS**: A high-performance message broker for inter-component communication
+ Before running this example, ensure you have the following services running:
+
+- **etcd**: A distributed key-value store used for service discovery and metadata storage
+- **NATS**: A high-performance message broker for inter-component communication
104-113: Good deployment instructions with minor heading fix
The Kubernetes deployment instructions are clear and follow best practices with environment variables.
Fix the heading punctuation:
-## Deployment to Kubernetes.
+## Deployment to Kubernetes
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- examples/runtime/hello_world/README.md (1 hunks)
- examples/runtime/hello_world/client.py (1 hunks)
- examples/runtime/hello_world/deploy/hello_world.yaml (1 hunks)
- examples/runtime/hello_world/hello_world.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
examples/runtime/hello_world/client.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
examples/runtime/hello_world/README.md (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
examples/runtime/hello_world/hello_world.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
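The injection behaviour described in this learning can be illustrated with a small, self-contained sketch. This is not the Dynamo implementation: `inject_runtime` and `FakeRuntime` are hypothetical names invented here, and the real decorator presumably does more (such as creating and shutting down an actual runtime). The sketch only shows how a wrapper can prepend an argument so callers omit it.

```python
import asyncio
import functools


class FakeRuntime:
    """Hypothetical stand-in for a runtime object; not the Dynamo class."""

    name = "fake-runtime"


def inject_runtime():
    """Hypothetical decorator factory that prepends a runtime argument."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            runtime = FakeRuntime()  # created by the wrapper, never by the caller
            return await func(runtime, *args, **kwargs)

        return wrapper

    return decorator


@inject_runtime()
async def get_metrics(runtime, log_dir):
    # The function body sees both arguments; the caller supplied only log_dir.
    return f"{runtime.name}:{log_dir}"


if __name__ == "__main__":
    # Called with log_dir only; the wrapper supplies runtime.
    print(asyncio.run(get_metrics("/tmp/logs")))
```

This is why `worker(runtime)` in the example files can be invoked as `worker()` from the main block.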
examples/runtime/hello_world/deploy/hello_world.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
🧬 Code Graph Analysis (1)
examples/runtime/hello_world/client.py (3)
lib/bindings/python/src/dynamo/_core.pyi (1)
DistributedRuntime (30-53)
lib/bindings/python/src/dynamo/runtime/__init__.py (1)
dynamo_worker (34-60)
examples/runtime/hello_world/hello_world.py (1)
worker (37-51)
🪛 LanguageTool
examples/runtime/hello_world/README.md
[style] ~36-~36: Consider using “who” when you are referring to a person instead of an object.
Context: ...ated word - Client: A Dynamo worker that connects to and sends requests to the b...
(THAT_WHO)
[style] ~98-~98: Consider using “who” when you are referring to a person instead of an object.
Context: ...eetings - worker: A dynamo worker that sets up the service, creates the endpoi...
(THAT_WHO)
[style] ~102-~102: Consider using “who” when you are referring to a person instead of an object.
Context: ...t.py) - **worker`**: A dynamo worker that connects to the backend service and pro...
(THAT_WHO)
🪛 markdownlint-cli2 (0.17.2)
examples/runtime/hello_world/README.md
55-55: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
56-56: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
104-104: Trailing punctuation in heading
Punctuation: '.'
(MD026, no-trailing-punctuation)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (14)
examples/runtime/hello_world/client.py (3)
1-15: LGTM - Standard license header
The SPDX license header follows project conventions correctly.
16-21: Good import choices
The imports are well-organized and appropriate. Using uvloop is a good performance optimization for async applications.
40-42: Correct async execution pattern
The main block properly installs uvloop before running the async worker, following best practices for high-performance async applications.
examples/runtime/hello_world/hello_world.py (5)
1-15: LGTM - Standard license header
Consistent license header following project conventions.
16-26: Well-organized imports and logging setup
Good use of Dynamo's logging configuration with a specific service name for better observability.
28-33: Excellent streaming endpoint implementation
The endpoint correctly demonstrates:
- Proper @dynamo_endpoint(str, str) decorator usage with type annotations
- Async generator pattern for streaming responses
- Good logging for observability
- Realistic delay simulation for demonstration purposes
36-51: Proper service setup and endpoint serving
The worker correctly implements the Dynamo service lifecycle:
- Proper namespace/component/endpoint hierarchy
- Good logging for debugging and monitoring
- Correct use of serve_endpoint with the generator function
54-56: Consistent async execution pattern
Matches the client implementation with proper uvloop installation and async execution.
examples/runtime/hello_world/README.md (5)
1-16: Standard license header for markdown
Proper license header format for documentation files.
18-31: Clear introduction and architecture overview
The architecture diagram effectively illustrates the simple client-backend relationship and correctly identifies the streaming endpoint.
33-48: Comprehensive coverage of key concepts
The implementation details section effectively covers all the important Dynamo runtime concepts demonstrated in the example code.
81-91: Accurate expected output for user verification
The output example correctly matches what the streaming service would produce, helping users verify their setup.
93-103: Accurate code structure documentation
The descriptions of the backend and client components are precise and help users understand the codebase organization.
examples/runtime/hello_world/deploy/hello_world.yaml (1)
1-8: Proper CRD definition and metadata
The license header and Kubernetes resource definition follow correct conventions for DynamoGraphDeployment.
  services:
    Frontend:
      livenessProbe:
        httpGet:
          path: /health
          port: 8000
        initialDelaySeconds: 60
        periodSeconds: 60
        timeoutSeconds: 30
        failureThreshold: 10
      readinessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
        initialDelaySeconds: 60
        periodSeconds: 60
        timeoutSeconds: 30
        failureThreshold: 10
      dynamoNamespace: hello-world
      componentType: main
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-233.17
          workingDir: /workspace/examples/runtime/hello_world/
          command:
            - /bin/sh
            - -c
          args:
            - "python3 client.py"
Fix probe configuration for client service
The Frontend service configuration has probe mismatches with the actual client implementation:
- Client doesn't expose HTTP endpoints: The client.py script is a simple worker that connects to the backend, processes responses, and exits. It doesn't run an HTTP server.
- Inconsistent probe types: Liveness probe uses httpGet while readiness probe uses exec with curl, both targeting the same non-existent endpoint.
Consider this configuration for a client that runs once and exits:
     Frontend:
-      livenessProbe:
-        httpGet:
-          path: /health
-          port: 8000
-        initialDelaySeconds: 60
-        periodSeconds: 60
-        timeoutSeconds: 30
-        failureThreshold: 10
-      readinessProbe:
-        exec:
-          command:
-            - /bin/sh
-            - -c
-            - 'curl -s http://localhost:8000/health | jq -e ".status == \"healthy\""'
-        initialDelaySeconds: 60
-        periodSeconds: 60
-        timeoutSeconds: 30
-        failureThreshold: 10
+      # Client runs once and exits - using exec probes for basic health
+      livenessProbe:
+        exec:
+          command:
+            - /bin/sh
+            - -c
+            - "pgrep -f 'python3 client.py' || exit 0"
+        initialDelaySeconds: 10
+        periodSeconds: 30
+        timeoutSeconds: 10
+        failureThreshold: 3
+      readinessProbe:
+        exec:
+          command:
+            - /bin/sh
+            - -c
+            - "pgrep -f 'python3 client.py' || exit 0"
+        initialDelaySeconds: 5
+        periodSeconds: 10
+        timeoutSeconds: 5
+        failureThreshold: 3
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  services:
    Frontend:
      # Client runs once and exits – using exec probes for basic health
      livenessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - "pgrep -f 'python3 client.py' || exit 1"
        initialDelaySeconds: 10
        periodSeconds: 30
        timeoutSeconds: 10
        failureThreshold: 3
      readinessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - "pgrep -f 'python3 client.py' || exit 1"
        initialDelaySeconds: 5
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      dynamoNamespace: hello-world
      componentType: main
      replicas: 1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-233.17
          workingDir: /workspace/examples/runtime/hello_world/
          command:
            - /bin/sh
            - -c
          args:
            - "python3 client.py"
🤖 Prompt for AI Agents
In examples/runtime/hello_world/deploy/hello_world.yaml from lines 9 to 47, the
Frontend service probes are incorrectly configured to use HTTP GET and curl
commands targeting a non-existent health endpoint, but the client.py script does
not run an HTTP server. To fix this, remove or disable the liveness and
readiness probes since the client is a one-time worker process that exits after
execution, or replace them with appropriate exec probes that check the process
status if needed.
    HelloWorldWorker:
      livenessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - "exit 0"
        periodSeconds: 60
        timeoutSeconds: 30
        failureThreshold: 10
      readinessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - 'grep "Hello" /tmp/hello_world.log'
        initialDelaySeconds: 60
        periodSeconds: 60
        timeoutSeconds: 30
        failureThreshold: 10
      dynamoNamespace: hello-world
      componentType: worker
      replicas: 1
      resources:
        requests:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
        limits:
          cpu: "10"
          memory: "20Gi"
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-233.17
          workingDir: /workspace/examples/runtime/hello_world/
          command:
            - /bin/sh
            - -c
          args:
            - python3 hello_world.py 2>&1 | tee /tmp/hello_world.log
Fix probe configuration and resource allocation for backend service
The HelloWorldWorker service has several configuration issues:
- Ineffective liveness probe: Using exit 0 always succeeds, providing no actual health checking.
- Incorrect readiness probe: Checking for "Hello" in logs will fail until the first client request is processed, but the service should be ready when it starts serving the endpoint.
- Excessive resource allocation: 10 CPU, 20Gi memory, and 1 GPU is overkill for a simple greeting service that doesn't use GPU.
Consider this improved configuration:
     HelloWorldWorker:
       livenessProbe:
         exec:
           command:
             - /bin/sh
             - -c
-            - "exit 0"
+            - "pgrep -f 'python3 hello_world.py'"
-        periodSeconds: 60
-        timeoutSeconds: 30
-        failureThreshold: 10
+        initialDelaySeconds: 30
+        periodSeconds: 30
+        timeoutSeconds: 10
+        failureThreshold: 3
       readinessProbe:
         exec:
           command:
             - /bin/sh
             - -c
-            - 'grep "Hello" /tmp/hello_world.log'
+            - 'grep -q "Serving endpoint generate" /tmp/hello_world.log'
-        initialDelaySeconds: 60
-        periodSeconds: 60
-        timeoutSeconds: 30
-        failureThreshold: 10
+        initialDelaySeconds: 15
+        periodSeconds: 10
+        timeoutSeconds: 5
+        failureThreshold: 3
       resources:
         requests:
-          cpu: "10"
-          memory: "20Gi"
-          gpu: "1"
+          cpu: "1"
+          memory: "2Gi"
         limits:
-          cpu: "10"
-          memory: "20Gi"
-          gpu: "1"
+          cpu: "2"
+          memory: "4Gi"
🤖 Prompt for AI Agents
In examples/runtime/hello_world/deploy/hello_world.yaml lines 48 to 88, fix the
HelloWorldWorker configuration by replacing the liveness probe command with a
meaningful health check that verifies the service is responsive instead of
always exiting 0. Change the readiness probe to check the actual service
endpoint readiness rather than searching logs for "Hello" to ensure it reflects
true readiness from startup. Reduce resource requests and limits to minimal CPU
and memory values appropriate for a simple greeting service and remove the GPU
allocation since it is not used. Adjust these settings to provide accurate
health checks and efficient resource usage.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.
Overview:
DYN-704
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation