- 
                Notifications
    You must be signed in to change notification settings 
- Fork 663
feat: Base metrics: add generic ingress handler metrics #2090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Base metrics: add generic ingress handler metrics #2090
Conversation
| WalkthroughThis set of changes introduces a generic Prometheus profiling system for ingress handlers, refactors endpoint and router wiring to enable automatic metrics integration, and adds new metrics types and accessors. The system_metrics example is overhauled to showcase these capabilities, and new HTTP server info accessors are added to the runtime. Documentation and configuration updates accompany the code changes. Changes
 Sequence Diagram(s)sequenceDiagram
    participant Client
    participant Ingress
    participant Metrics
    participant Handler
    Client->>Ingress: Send request payload
    Ingress->>Metrics: Increment request counters, start timer
    Ingress->>Handler: Process request
    Handler-->>Ingress: Stream response(s)
    Ingress->>Metrics: Record bytes, errors, durations
    Ingress-->>Client: Send response(s)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~35 minutes Possibly related PRs
 Poem
 Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit: 
 SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
 Other keywords and placeholders
 Documentation and Community
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
deploy/metrics/prometheus.yml (1)
37-41: Update the comment to reflect the new port number.The comment on line 37 still references port 8000, but the actual target port has been changed to 8080. This inconsistency could confuse users setting up the demo service.
- # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp + # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8080/tcplib/runtime/src/pipeline/network.rs (1)
379-398: Consider more graceful error handling inwith_metricsThe implementation is solid, but the
unwrap()on line 381 could panic if metrics are somehow already set. While this is unlikely during normal initialization, consider usingexpect()with a descriptive message or returning a Result.- pub fn with_metrics(metrics: IngressMetrics) -> Arc<Self> { - let ingress = Self::new(); - ingress.metrics.set(Arc::new(metrics)).unwrap(); - ingress - } + pub fn with_metrics(metrics: IngressMetrics) -> Arc<Self> { + let ingress = Self::new(); + ingress.metrics.set(Arc::new(metrics)) + .expect("Metrics should not be already set when creating new Ingress"); + ingress + }lib/runtime/examples/system_metrics/src/lib.rs (2)
32-35: Consider documenting the purpose of MyStats structThe
MyStatsstruct appears to be used for the stats handler but its purpose and the meaning of thevalfield are unclear.-#[derive(serde::Serialize, serde::Deserialize, Debug, Clone)] -pub struct MyStats { - pub val: i32, -} +/// Stats structure returned by the endpoint's stats handler +#[derive(serde::Serialize, serde::Deserialize, Debug, Clone)] +pub struct MyStats { + /// Example value for demonstration purposes + pub val: i32, +}
117-122: Consider making the stats handler more meaningfulThe stats handler currently returns a fixed value which doesn't demonstrate real metrics. Consider returning actual system stats or metrics data to make the example more realistic.
- .stats_handler(|_stats| { - println!("Stats handler called with stats: {:?}", _stats); - let stats = MyStats { val: 10 }; - serde_json::to_value(stats).unwrap() - }) + .stats_handler(|stats| { + println!("Stats handler called with stats: {:?}", stats); + // Return actual runtime stats or a more meaningful example + let my_stats = MyStats { + val: stats.get("request_count").and_then(|v| v.as_i64()).unwrap_or(0) as i32 + }; + serde_json::to_value(my_stats).unwrap() + })
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- lib/runtime/examples/Cargo.lockis excluded by- !**/*.lock
📒 Files selected for processing (14)
- components/router/src/main.rs(1 hunks)
- deploy/metrics/prometheus.yml(1 hunks)
- lib/llm/src/kv_router/publisher.rs(2 hunks)
- lib/runtime/examples/system_metrics/Cargo.toml(2 hunks)
- lib/runtime/examples/system_metrics/README.md(1 hunks)
- lib/runtime/examples/system_metrics/src/bin/system_client.rs(2 hunks)
- lib/runtime/examples/system_metrics/src/bin/system_server.rs(2 hunks)
- lib/runtime/examples/system_metrics/src/lib.rs(1 hunks)
- lib/runtime/src/distributed.rs(4 hunks)
- lib/runtime/src/http_server.rs(1 hunks)
- lib/runtime/src/lib.rs(3 hunks)
- lib/runtime/src/metrics.rs(5 hunks)
- lib/runtime/src/pipeline/network.rs(3 hunks)
- lib/runtime/src/pipeline/network/ingress/push_handler.rs(7 hunks)
🧰 Additional context used
🧠 Learnings (10)
lib/runtime/src/lib.rs (4)
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
components/router/src/main.rs (2)
Learnt from: alec-flowers
PR: #1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions create_stored_blocks and create_stored_block_from_parts are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.
lib/runtime/src/distributed.rs (5)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/runtime/examples/system_metrics/src/bin/system_server.rs (3)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/runtime/examples/system_metrics/src/bin/system_client.rs (3)
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.
Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
lib/runtime/src/http_server.rs (1)
Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/runtime/examples/system_metrics/Cargo.toml (1)
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/kv_router/publisher.rs (7)
Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.
Learnt from: alec-flowers
PR: #1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions create_stored_blocks and create_stored_block_from_parts are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
Learnt from: oandreeva-nv
PR: #1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets None. The start_batching_publisher function in lib/llm/tests/block_manager.rs demonstrates this pattern: when the KVBMDynamoRuntimeComponent is dropped, its batch_tx sender is dropped, causing rx.recv() to return None, which triggers cleanup and task termination.
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: PeaBrane
PR: #1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
lib/runtime/examples/system_metrics/src/lib.rs (3)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/runtime/src/pipeline/network/ingress/push_handler.rs (3)
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::error::RequestErrorKind::NoResponders. Use err.downcast_ref::<async_nats::error::RequestError>() and then check req_err.kind() against RequestErrorKind::NoResponders to handle this error properly.
Learnt from: t-ob
PR: #1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang async_encode method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.
Learnt from: oandreeva-nv
PR: #1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets None. The start_batching_publisher function in lib/llm/tests/block_manager.rs demonstrates this pattern: when the KVBMDynamoRuntimeComponent is dropped, its batch_tx sender is dropped, causing rx.recv() to return None, which triggers cleanup and task termination.
🧬 Code Graph Analysis (4)
lib/runtime/examples/system_metrics/src/bin/system_server.rs (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
DistributedRuntime(30-53)lib/runtime/examples/system_metrics/src/lib.rs (1)
backend(95-128)
lib/runtime/examples/system_metrics/src/bin/system_client.rs (2)
lib/runtime/src/component.rs (5)
component(346-348)
component(489-495)
namespace(209-211)
namespace(498-505)
client(423-429)lib/runtime/src/distributed.rs (1)
namespace(173-175)
lib/runtime/src/http_server.rs (1)
lib/runtime/src/distributed.rs (1)
new(46-141)
lib/runtime/src/pipeline/network.rs (5)
lib/runtime/src/distributed.rs (1)
new(46-141)lib/runtime/examples/system_metrics/src/lib.rs (2)
from_endpoint(44-56)
with_metrics(65-69)lib/bindings/python/src/dynamo/_core.pyi (1)
Endpoint(212-236)lib/runtime/src/pipeline/nodes/sinks/segment.rs (1)
attach(24-28)lib/runtime/tests/common/mock.rs (1)
segment(305-309)
🔇 Additional comments (31)
lib/runtime/examples/system_metrics/src/bin/system_client.rs (2)
17-17: Good refactoring to use centralized constants.The addition of
DEFAULT_COMPONENTandDEFAULT_ENDPOINTto the imports improves maintainability by avoiding magic strings.
34-36: LGTM: Consistent use of configuration constants.Replacing hardcoded strings with the imported constants (
DEFAULT_COMPONENTandDEFAULT_ENDPOINT) improves maintainability and reduces the risk of typos.lib/runtime/examples/system_metrics/Cargo.toml (2)
25-27: LGTM: Well-documented feature flags.The feature separation allows for running tests without external dependencies, while the
integrationfeature clearly indicates NATS requirement.
39-42: Appropriate dev dependencies for testing.The selected dev dependencies (
reqwestwith JSON support,tokio-test, andrand) are well-suited for testing HTTP endpoints and async code.lib/runtime/src/lib.rs (3)
23-23: LGTM: Adding OnceLock for lazy initialization.The addition of
OnceLocksupports the new HTTP server info storage pattern.
40-40: Good API design: Re-exporting HttpServerInfo.Making
HttpServerInfoavailable at the crate root improves the public API ergonomics.
90-90: Consistent pattern: HTTP server field mirrors TCP server design.The
http_serverfield follows the sameArc<OnceLock<Arc<...>>>pattern as the existingtcp_serverfield, ensuring consistency in the codebase architecture.components/router/src/main.rs (1)
69-82: Excellent integration of ingress metrics functionality.The refactoring correctly integrates the new metrics capabilities by:
- Creating the endpoint first
- Using
Ingress::for_engine_with_metricsinstead of the basicfor_enginemethod- Passing the endpoint reference to enable automatic metrics collection
This change aligns perfectly with the PR objectives of adding generic ingress handler metrics.
lib/runtime/src/distributed.rs (4)
27-27: LGTM: Standard library importThe import of
std::sync::OnceLockis correctly added to support the new HTTP server information storage functionality.
95-95: LGTM: Proper initialization of thread-safe storageThe field initialization using
Arc::new(OnceLock::new())follows the same pattern as the existingtcp_serverfield and provides thread-safe, single-initialization semantics.
119-130: LGTM: Correct HTTP server information storageThe server information storage logic properly:
- Captures both the socket address and task handle from
spawn_http_server- Creates an
HttpServerInfoinstance with the captured data- Uses
OnceLock::set()with appropriate error handling for single initialization- Maintains consistency with the existing server management patterns
218-221: LGTM: Clean accessor method implementationThe
http_server_info()method provides a clean API to access the stored HTTP server information, returningOption<Arc<HttpServerInfo>>which properly handles the case where the HTTP server may not be initialized.lib/runtime/src/pipeline/network/ingress/push_handler.rs (4)
26-32: LGTM: Proper request metrics initializationThe metrics collection is correctly initialized at the start of request processing, capturing:
- Request start time for duration calculation
- Request counter increment
- Concurrent request tracking
- Request byte size measurement
The conditional check ensures metrics are only collected when available.
52-56: Excellent error categorization for observabilityThe error metrics implementation provides detailed categorization:
deserialization: JSON parsing failures
invalid_message: Unexpected message format
response_stream: TCP connection issues
generate: Engine generation failuresThis granular labeling will enable precise monitoring and debugging of different failure modes.
Also applies to: 66-70, 89-93, 105-108
138-140: LGTM: Response metrics and error trackingThe response handling correctly tracks:
- Response byte sizes for both regular responses and completion markers
- Publish failures with appropriate error labels (
publish_response,publish_final)This provides complete visibility into the data flow and potential transmission issues.
Also applies to: 145-149, 160-162, 168-170
174-178: LGTM: Proper request completion metricsThe request completion logic correctly:
- Records the total request duration using the captured start time
- Decrements the concurrent request counter
- Maintains proper cleanup regardless of success/failure
This ensures accurate duration tracking and prevents metric leaks.
lib/runtime/src/http_server.rs (2)
22-22: LGTM: Required import for task handleThe import of
tokio::task::JoinHandleis correctly added to support the optional task handle field inHttpServerInfo.
26-61: Well-designed HTTP server information structThe
HttpServerInfoimplementation is clean and comprehensive:Strengths:
- Encapsulates both socket address and optional task handle
- Constructor properly wraps
JoinHandleinArcfor shared ownership- Provides convenient accessor methods for address components
- Manual
Cloneimplementation correctly handles theArc<JoinHandle<()>>fieldDesign considerations:
- The optional handle allows flexibility for different server management scenarios
- The struct provides a clean abstraction for HTTP server runtime information
- Consistent with the broader server management patterns in the codebase
lib/llm/src/kv_router/publisher.rs (3)
502-504: LGTM: Efficient endpoint reuse and metrics integrationThe implementation correctly:
- Obtains the endpoint once and reuses it for both handler creation and ingress setup
- Integrates with the new metrics system using
Ingress::for_engine_with_metrics- Passes the endpoint to both the handler constructor and the ingress factory
This is more efficient than the previous approach of calling
component.endpoint()multiple times.
506-506: LGTM: Consistent endpoint usageThe endpoint builder correctly uses the pre-obtained endpoint, maintaining consistency with the metrics-enabled ingress handler.
520-521: LGTM: Future-proofed handler designThe addition of the
endpointfield toKvLoadEndpoingHanderis well-designed:
- The
#[allow(dead_code)]annotation acknowledges the field isn't used yet- The constructor properly accepts and stores the endpoint
- This prepares the handler for potential future endpoint-based functionality
Also applies to: 525-530
lib/runtime/examples/system_metrics/src/bin/system_server.rs (2)
16-17: LGTM: Simplified imports reflect refactored architectureThe imports are now minimal and focused:
- Standard runtime components are imported directly
- The
backendfunction is imported from thesystem_metricscrate library- Removes all the custom metrics and handler imports that were moved to the library
27-27: Excellent demonstration of the simplified APIThe single call to
backend(distributed, None)beautifully demonstrates the new generic profiling system:
- No manual metrics setup required
- No custom handler implementation needed
- The complexity is properly abstracted away in the library
- Shows how easy it is to add comprehensive metrics to an ingress handler
This exemplifies the PR's goal of providing "automatic profiling capabilities" for ingress handlers.
lib/runtime/examples/system_metrics/README.md (1)
1-147: Excellent documentation for the new ingress metrics system!The README provides comprehensive coverage of the new
IngressMetricsfeature with clear examples, benefits, and usage instructions. The structure is logical and the examples effectively demonstrate both automatic profiling and custom metrics integration.lib/runtime/src/pipeline/network.rs (2)
48-128: Well-structured metrics implementation with comprehensive coverage!The
IngressMetricsstruct provides a clean abstraction for ingress profiling with all essential metrics (requests, duration, concurrency, bytes, errors). Thefrom_endpointfactory method elegantly leverages the endpoint's built-in labeling system.
425-437: Clean and intuitive API for metrics-enabled ingress creation!The
for_engine_with_metricsmethod provides an excellent developer experience by combining engine setup with automatic metrics configuration in a single call. The helpermetrics()accessor is also well-designed for optional metrics access.lib/runtime/src/metrics.rs (3)
123-136: Consistent implementation of IntCounterVec support!The
PrometheusMetrictrait implementation forIntCounterVeccorrectly follows the established pattern used by other vector metric types.
285-299: Proper integration of IntCounterVec in the metric creation flow!The handling of
IntCounterVecincreate_metriccorrectly validates parameters and follows the established pattern for vector metric types.
402-417: Breaking change:create_countervecsignature updatedThe
create_countervecmethod now requires two separate parameters—
const_labels: &[&str]
const_label_values: &[(&str, &str)]This improves consistency with other vector metrics but is a breaking API change for any external callers. Our search found only the definition and an internal example in
lib/runtime/src/metrics.rs, so there are no remaining internal call sites to update:
- Definition: lib/runtime/src/metrics.rs (lines 402–417)
- Internal example call: lib/runtime/src/metrics.rs (example in docs/tests)
Please review and update any external consumers of this crate to match the new signature.
lib/runtime/examples/system_metrics/src/lib.rs (2)
37-57: Well-designed custom metrics struct!The
MySystemStatsMetricsimplementation effectively demonstrates how to create custom metrics using the endpoint's built-in metric registry. The use ofArc<IntCounter>ensures thread-safe access.
72-91: Excellent example of integrating custom metrics into a handler!The
RequestHandlerimplementation clearly demonstrates how to incorporate custom metrics tracking alongside the automatic ingress metrics. The optional metrics pattern allows the handler to work with or without metrics.
ff666f5    to
    d9d9420      
    Compare
  
    | great work! hope the examples can have it soon! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM at a high level - would benefit from a more detailed review on the code changes. CC @paulhendricks @ryanolson @kthui @oandreeva-nv
23909fb    to
    3b3b874      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall - great addition! this will be really useful. thank you!
let's continue to discuss the default naming conventions and how users can select different variants.
it would be a good place to really start solidifying our configuration details using figment to parse, merge and join files/envs/cli-args, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some test code survived and should be removed before this is merged.
| Lgtm | 
Overview:
Adds generic ingress handler metrics with automatic profiling for request count, duration, bytes, and errors.
Details:
In Progress (not in this PR):
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
DIS-299
Summary by CodeRabbit
New Features
IntCounterVec.Improvements
Bug Fixes
Documentation
Refactor