-
Couldn't load subscription status.
- Fork 8
agents: add runtime control for asset collection #377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: node-v22.x-nsolid-v6.x
Are you sure you want to change the base?
Conversation
WalkthroughAdds an assetsEnabled runtime flag across proto, agents (gRPC/ZMQ), core, and JS; introduces an EAssetsDisabled error; adds enable/disable helpers for assets and traces; gates asset collection/startup when disabled; and extends tests and test clients to exercise toggles. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant JS as lib/nsolid.js
participant Core as nsolid_api
participant Agent as Agent (gRPC/ZMQ)
participant Prof as Profiler
rect rgb(250,250,255)
note over User,JS: Start/update with assetsEnabled
User->>JS: nsolid.start / nsolid.config (assetsEnabled=X)
JS->>Core: UpdateConfig({assetsEnabled:X})
Core->>Agent: Reconfigure(assetsEnabled:X)
Agent->>Agent: assets_enabled_ = X
end
rect rgb(245,255,245)
note over User,Prof: Asset operation request
User->>JS: heapProfile()/heapSampling()/snapshot()
JS->>Agent: request operation
alt Agent.assets_enabled_ == false
Agent-->>JS: Error (EAssetsDisabled / 500)
JS-->>User: Error
else
Agent->>Prof: start/init
Prof-->>Agent: data/result
Agent-->>JS: Result
JS-->>User: Result
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Points to review closely:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
🔇 Additional comments (2)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agents/grpc/src/grpc_agent.cc (1)
1710-1760: Propagate assetsEnabled via reconfigure().
reconfigure()ignores the new proto field, so gRPC-driven config updates can’t flip the assets gate and the feature regresses. Please write the field into the outbound JSON before callingUpdateConfig().if (body.has_contcpuprofile()) { out["contCpuProfile"] = body.contcpuprofile(); } + + if (body.has_assetsenabled()) { + out["assetsEnabled"] = body.assetsenabled(); + }
🧹 Nitpick comments (4)
test/parallel/test-nsolid-heap-sampling-stream.js (2)
191-224: Consider refactoring nested callbacks to improve readability.The test uses four levels of nested
setImmediate()callbacks, making the flow difficult to follow. Consider refactoring to async/await or helper functions to flatten the structure.Example refactor using helper functions:
function runHeapSamplingStreamAssetsToggleTests() { // Disable assets through config update nsolid.start({ command: 'localhost:9001', data: 'localhost:9002', assetsEnabled: false, }); setImmediate(() => { testDisabled(() => { testEnabled(() => { testDisabled(() => { testFinalEnabled(); }); }); }); }); } function testDisabled(callback) { assert.throws( () => nsolid.heapSamplingStream(0, 1000), { message: 'Heap sampling could not be started' } ); callback(); } function testEnabled(callback) { nsolid.enableAssets(); setImmediate(() => { let profile = ''; const stream = nsolid.heapSamplingStream(0, 1200); stream.on('data', (chunk) => { profile += chunk; }); stream.on('end', common.mustCall(() => { assert(JSON.parse(profile)); testProfileSchema(JSON.parse(profile)); callback(); })); }); } function testFinalEnabled() { nsolid.enableAssets(); setImmediate(() => { const stream = nsolid.heapSamplingStream(0, 1200); stream.resume(); stream.on('end', common.mustCall(() => { assert.ok(true); })); }); }
193-199: Consider adding memory allocations for consistency.The initial test (lines 128-131) performs allocations to ensure there's data to sample. The subsequent heap sampling calls don't do this, which might lead to less deterministic results. While the tests may still pass, adding allocations would make the behavior more consistent.
Add allocations before each heap sampling operation:
nsolid.enableAssets(); setImmediate(() => { + // Do some allocations to ensure sampling data + const arr = []; + for (let i = 0; i < 1000; i++) { + arr.push(new Array(1000)); + } + let profile = ''; const enabledStream = nsolid.heapSamplingStream(0, 1200);Apply the same pattern before line 216.
Also applies to: 216-221
test/parallel/test-nsolid-profile.js (1)
120-178: Reduce reliance on fixed sleeps; gate on observed config stateThe toggle flow is correct, but hardcoded 100ms delays can flake under slow CI or propagation lag. After each start()/enable/disable step, prefer waiting until nsolid.config.assetsEnabled reflects the expected value before proceeding (or assert it directly), instead of fixed timeouts.
Example pattern:
function waitForAssetsState(expected, deadline = Date.now() + 2000) { return new Promise((resolve, reject) => { (function poll() { if (nsolid.config.assetsEnabled === expected) return resolve(); if (Date.now() > deadline) return reject(new Error('assetsEnabled not applied')); setImmediate(poll); })(); }); } // Usage: nsolid.start({ /* … */ assetsEnabled: false }); await waitForAssetsState(false); // then assert throws, etc.test/parallel/test-nsolid-heap-profile.js (1)
133-200: Consider using async/await instead of nested setTimeout calls.The nested setTimeout pattern (lines 141, 167, 173, 193) makes the test flow harder to follow and maintain. Consider refactoring to use async/await with timer promises:
-function runHeapProfileAssetsToggleTests() { +async function runHeapProfileAssetsToggleTests() { // Disable assets via config update nsolid.start({ command: 'localhost:9001', data: 'localhost:9002', assetsEnabled: false, }); - setTimeout(() => { + await setTimeout(100); assert.throws( () => { nsolid.heapProfile(); }, { message: 'Heap profile could not be started' } ); // ... rest of assertions - }, 100); }This would improve readability and make the test more maintainable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
agents/grpc/proto/reconfigure.proto(1 hunks)agents/grpc/src/grpc_agent.cc(4 hunks)agents/grpc/src/grpc_agent.h(1 hunks)agents/grpc/src/grpc_errors.h(1 hunks)agents/grpc/src/proto/reconfigure.pb.cc(18 hunks)agents/grpc/src/proto/reconfigure.pb.h(4 hunks)agents/zmq/src/zmq_agent.cc(2 hunks)agents/zmq/src/zmq_agent.h(1 hunks)lib/nsolid.js(8 hunks)src/nsolid/nsolid_api.cc(1 hunks)test/agents/test-grpc-continuous-profile.mjs(2 hunks)test/agents/test-grpc-heap-profile.mjs(1 hunks)test/agents/test-grpc-heap-sampling.mjs(1 hunks)test/agents/test-grpc-profile.mjs(1 hunks)test/agents/test-grpc-snapshot.mjs(1 hunks)test/agents/test-grpc-tracing.mjs(3 hunks)test/agents/test-zmq-heap-profile.mjs(1 hunks)test/agents/test-zmq-heap-sampling.mjs(1 hunks)test/agents/test-zmq-profile.mjs(1 hunks)test/agents/test-zmq-snapshot.mjs(1 hunks)test/agents/test-zmq-tracing.mjs(2 hunks)test/common/nsolid-grpc-agent/client.js(2 hunks)test/common/nsolid-grpc-agent/index.js(1 hunks)test/common/nsolid-zmq-agent/client.js(1 hunks)test/common/nsolid-zmq-agent/index.js(1 hunks)test/parallel/test-nsolid-config-tracing.js(1 hunks)test/parallel/test-nsolid-heap-profile-stream.js(1 hunks)test/parallel/test-nsolid-heap-profile.js(1 hunks)test/parallel/test-nsolid-heap-sampling-stream.js(2 hunks)test/parallel/test-nsolid-heap-sampling.js(1 hunks)test/parallel/test-nsolid-profile.js(1 hunks)test/parallel/test-nsolid-snapshot.js(1 hunks)
🔇 Additional comments (32)
test/parallel/test-nsolid-heap-sampling-stream.js (1)
179-226: Verify synchronization guarantees for config changes.The test uses
setImmediate()to sequence operations after callingnsolid.start(),enableAssets(), anddisableAssets(). However, there's no explicit guarantee that config changes have fully propagated before the next assertion executes. This could lead to flaky test failures if the timing assumptions don't hold in all environments.Consider one of the following approaches:
- If nsolid provides synchronous config updates or completion callbacks, use those instead of
setImmediate().- Add explicit verification that the config state has changed before proceeding (e.g., check a status flag).
- Document the timing assumptions if
setImmediate()is sufficient based on the nsolid implementation.Run the following script to check if other nsolid tests use similar patterns:
agents/zmq/src/zmq_agent.h (1)
731-731: LGTM! Atomic flag added for thread-safe asset control.The
assets_enabled_flag is appropriately declared asstd::atomic<bool>for thread-safe access and is correctly placed in the profiling section alongside related state variables.agents/grpc/src/grpc_errors.h (1)
12-13: LGTM! New error type correctly defined.The
EAssetsDisablederror entry is properly added with:
- Sequential error code (1008)
- Appropriate HTTP status (500) for internal/configuration errors
- Clear error message that aligns with test expectations
agents/zmq/src/zmq_agent.cc (2)
1282-1287: LGTM! Configuration reading follows established pattern.The
assetsEnabledconfiguration is correctly read from the config and stored in the atomic flag, following the same pattern used for other configuration fields.
2115-2117: LGTM! Asset gating correctly implemented.The early return when
assets_enabled_is false properly prevents profiling operations when assets are disabled. TheUV_EINVALerror code maps to "Invalid arguments" (code 422) increate_command_error, which is appropriate for this validation failure.test/agents/test-grpc-tracing.mjs (3)
5-5: LGTM! Import added for async delays in test flow.The
delayimport fromnode:timers/promisesis appropriately added to support the new phase-based test flow that requires waiting for spans to flush between enable/disable cycles.
205-247: LGTM! Comprehensive phase-based test for HTTP tracing.The test correctly implements a multi-phase flow to verify:
- Initial span collection works
- Disabling traces prevents span collection (verified with delay and assertion)
- Re-enabling traces restores span collection
The phase state machine and early return on 'done' prevent race conditions and ensure proper test sequencing.
273-315: LGTM! Comprehensive phase-based test for custom tracing.The custom tracing test mirrors the HTTP tracing test structure, providing consistent coverage for the disable/enable functionality across different tracing types.
lib/nsolid.js (8)
133-136: LGTM! Public API surface extended with asset and trace controls.The four new functions (
enableAssets,disableAssets,enableTraces,disableTraces) are appropriately exported and provide runtime control over asset collection and tracing.
195-206: LGTM! Getters provide appropriate defaults.The new getters correctly:
- Default
assetsEnabledtotrue(assets enabled by default)- Default
tracingEnabledtofalse(tracing disabled by default)- Use nullish coalescing (
??) to handle undefined config values
413-415: LGTM! Assets guard correctly prevents heap profiling when disabled.The early check throws
ERR_NSOLID_HEAP_PROFILE_STARTwhen assets are disabled, preventing the operation before any profiling setup occurs.
497-498: LGTM! Assets guard correctly prevents heap sampling when disabled.The early check throws
ERR_NSOLID_HEAP_SAMPLING_STARTwhen assets are disabled, consistent with the heap profiling guard.
604-633: LGTM! Enable/disable functions correctly check state before updating.All four functions follow a consistent pattern:
- Check current state to avoid unnecessary updates
- Call
updateConfigwith the new valueThis prevents redundant configuration updates and ensures idempotent behavior.
761-765: LGTM! Config update handling uses normalization helper.The
assetsEnabledconfiguration is correctly normalized usingoptionToBoolbefore storing, which handles various input types (boolean, number, string) consistently.
947-956: LGTM! Config initialization with proper defaults.The
assetsEnabledfield is correctly initialized with:
- Priority order: environment variable > package.json > default (true)
- Proper normalization using
optionToBool- Sensible default of
truewhen neither source provides a value
1106-1116: LGTM! Flexible normalization helper for boolean configuration.The
optionToBoolfunction appropriately handles:
undefined/null→undefined(allows detecting absence of value)- Boolean → direct return
- Number → falsy check (0 → false, non-zero → true)
- String → delegates to
envToBoolfor "0"/"false" handling- Other types →
undefined(graceful handling)agents/grpc/proto/reconfigure.proto (1)
20-20: LGTM! Protocol buffer field correctly defined.The
assetsEnabledfield is properly added with:
- Sequential field number (13)
- Optional modifier for backward compatibility
- Appropriate boolean type
- Consistent camelCase naming
test/common/nsolid-zmq-agent/client.js (1)
141-148: LGTM! IPC message handlers correctly wire up runtime controls.The four new message handlers properly:
- Call the corresponding
nsolidfunctions to enable/disable assets and traces- Follow the fire-and-forget pattern (no response sent)
- Are consistent with the existing message handler structure
agents/grpc/src/grpc_agent.h (1)
352-352: LGTM! Atomic flag added for thread-safe asset control.The
assets_enabled_flag is appropriately declared asstd::atomic<bool>for thread-safe access and is correctly placed in the profiling section alongside related profiling state.test/parallel/test-nsolid-config-tracing.js (1)
24-29: LGTM: runtime tracing helpers keep config in syncThe toggles and corresponding assertions look correct and consistent with earlier start() checks.
test/agents/test-zmq-profile.mjs (1)
120-177: LGTM: ZMQ profile respects assetsEnabled and event flowGood coverage of disabled → 422 path, enabled happy-path, and final config verification. Resolve guard avoids double-resolve.
test/parallel/test-nsolid-heap-profile-stream.js (1)
68-70: LGTM: deterministic heap profile stream togglingNice sequencing around ‘end’ to avoid races; assertions match the API’s disabled vs enabled behavior.
Also applies to: 72-126
test/agents/test-zmq-heap-profile.mjs (1)
342-408: LGTM: ZMQ heap-profile honors assetsEnabled and validates payloadCovers error path, success path, and final config sync. Bootstrap opts prevent stray events. Looks good.
test/agents/test-zmq-snapshot.mjs (1)
235-296: LGTM: ZMQ snapshot gating via assetsEnabledError and success paths validated with proper event sequencing and final config check. All good.
test/parallel/test-nsolid-profile.js (1)
107-111: LGTM: assertion formattingNo behavior change; readability improved.
test/agents/test-zmq-heap-sampling.mjs (1)
135-201: LGTM! Comprehensive test for assetsEnabled toggling.The test properly validates:
- Disabling assets via config results in error code 422
- Re-enabling assets allows profiling to proceed
- Final config state is verified
The use of the
resolvedflag (lines 170-171) prevents duplicate event processing, which is a good defensive pattern.test/agents/test-grpc-heap-sampling.mjs (1)
248-378: LGTM! Comprehensive coverage of asset control methods.The three new tests thoroughly validate asset gating across different control surfaces:
- Environment variable (NSOLID_ASSETS_ENABLED)
- Configuration API (child.config())
- Helper methods (enableAssets/disableAssets)
Each test properly verifies:
- Error 500 with message "Assets collection disabled(1008)" when disabled
- Successful profiling when enabled
- Final config state consistency
The pattern is consistent and maintainable.
test/agents/test-grpc-snapshot.mjs (1)
246-367: LGTM! Consistent test coverage for snapshot operations.The tests mirror the heap sampling pattern and properly validate snapshot behavior with asset controls. The consistency across test files makes the test suite easier to understand and maintain.
test/agents/test-grpc-heap-profile.mjs (1)
259-389: LGTM! Complete test coverage for heap profiling with asset controls.The tests maintain consistency with the other asset control tests and properly validate heap profiling behavior across all three control surfaces.
test/common/nsolid-zmq-agent/index.js (1)
187-233: LGTM! Clean helper methods for runtime control.The four new methods (
enableTraces,disableTraces,enableAssets,disableAssets) follow the established async pattern in the TestClient class. The implementation is straightforward and consistent.test/common/nsolid-grpc-agent/index.js (1)
346-401: LGTM! Excellent refactoring with the #sendAndWait helper.The introduction of the
#sendAndWaithelper method (lines 389-401) is a good abstraction that:
- Reduces code duplication
- Provides a consistent pattern for message handling
- Makes the new helper methods (
enableAssets,disableAssets,enableTraces,disableTraces) more concise and maintainableThe refactored
config()method now uses this helper, demonstrating its utility.src/nsolid/nsolid_api.cc (1)
1161-1186: Verify continuous profiler is disabled when assetsEnabled is false.The logic correctly gates
contCpuProfileandcontCpuProfileIntervalreads behind theassetsEnabledcheck. However, ensure thatupdate_continuous_profilerproperly handles the case when both flags are false (which happens whenassetsEnabledis false).Run the following to verify the continuous profiler handling:
b3ccdf4 to
74e0d6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
test/parallel/test-nsolid-heap-sampling-stream.js (1)
179-187: Also assert the error code for gating failuresWhen assets are disabled, heapSamplingStream throws ERR_NSOLID_HEAP_SAMPLING_START synchronously. Assert the code property too for stronger checks.
assert.throws( () => { nsolid.heapSamplingStream(0, 1000); }, { - message: 'Heap sampling could not be started' + message: 'Heap sampling could not be started', + code: 'ERR_NSOLID_HEAP_SAMPLING_START' } );Apply the same change to Lines 204‑211.
Also applies to: 204-211
🧹 Nitpick comments (10)
test/parallel/test-nsolid-heap-sampling-stream.js (1)
173-177: Avoid starting ZMQ in this unit testNo need to pass command/data to nsolid.start() here. It can introduce unwanted ZMQ side-effects. Set only assetsEnabled.
- nsolid.start({ - command: 'localhost:9001', - data: 'localhost:9002', - assetsEnabled: false, - }); + nsolid.start({ assetsEnabled: false });lib/nsolid.js (3)
413-416: Validate args before gating; remove unused parameter
- Move the assetsEnabled check after validateNumber/validateBoolean to preserve argument error semantics consistently.
- The options parameter in heapProfileStream is unused; drop it to avoid confusion.
-function heapProfileStream(threadId, duration, trackAllocations, options) { - if (getConfig('assetsEnabled') === false) - throw new ERR_NSOLID_HEAP_PROFILE_START(); +function heapProfileStream(threadId, duration, trackAllocations) { + // Validate arguments first. validateNumber(threadId, 'threadId'); validateNumber(duration, 'duration'); validateBoolean(trackAllocations, 'trackAllocations'); + // Gate after validation. + if (getConfig('assetsEnabled') === false) + throw new ERR_NSOLID_HEAP_PROFILE_START();
947-957: Initialize assetsEnabled from env/pkg with sane defaultSolid defaulting logic. Be mindful that empty-string env values may not normalize to boolean (see optionToBool suggestion below).
Consider how NSOLID_ASSETS_ENABLED="" should behave. If it should be treated as “unset”, ensure optionToBool handles empty strings accordingly (see proposed change on Lines 1106‑1116).
1106-1116: optionToBool: handle empty strings explicitly and return strict booleansIf a string is empty, current envToBool returns a falsy value (empty string), not a strict boolean. When stored in config, later strict comparisons (=== false) can misbehave.
function optionToBool(value) { if (value === undefined || value === null) return undefined; if (typeof value === 'boolean') return value; if (typeof value === 'number') return value !== 0; - if (typeof value === 'string') - return envToBool(value); + if (typeof value === 'string') { + if (value === '') + return undefined; // treat empty string as “unset” + return value !== '0' && value.toLowerCase() !== 'false'; + } return undefined; }test/parallel/test-nsolid-profile.js (1)
120-182: Reduce flakiness: replace fixed 100ms delays with platform-aware timeoutsUse common.platformTimeout(100) for the nested setTimeout calls to better tolerate slow CI hosts.
Apply this diff within runAssetsToggleTests():
- }, 100); + }, common.platformTimeout(100)); ... - }, 100); + }, common.platformTimeout(100)); ... - }, 100); + }, common.platformTimeout(100)); ... - }, 100); + }, common.platformTimeout(100));test/agents/test-zmq-heap-profile.mjs (1)
342-409: Confirm error semantics across protocolsHere ZMQ returns 422 "Invalid arguments" when assets are disabled, while gRPC tests assert 500 "Assets collection disabled(1008)". Is this divergence intentional? If not, consider aligning error codes/messages for consistency across agents.
test/parallel/test-nsolid-heap-sampling.js (1)
177-248: Harden timeouts against CI varianceUse common.platformTimeout(100) instead of raw 100ms in nested setTimeout calls to reduce flakiness.
Suggested changes within runHeapSamplingAssetsToggleTests():
- }, 100); + }, common.platformTimeout(100)); - }, 100); + }, common.platformTimeout(100)); - }, 100); + }, common.platformTimeout(100)); - }, 100); + }, common.platformTimeout(100));test/parallel/test-nsolid-heap-profile.js (1)
133-205: Use platform-aware timeouts to avoid flakesReplace fixed 100ms in nested setTimeout calls with common.platformTimeout(100).
Apply within runHeapProfileAssetsToggleTests():
- }, 100); + }, common.platformTimeout(100)); - }, 100); + }, common.platformTimeout(100)); - }, 100); + }, common.platformTimeout(100)); - }, 100); + }, common.platformTimeout(100));test/agents/test-grpc-snapshot.mjs (1)
282-324: Reconfigure propagation timingAfter child.config({ assetsEnabled: false/true }), the test immediately calls heapSnapshot(). If reconfigure is eventually consistent, add a small wait or poll child.config() until it reflects the new value before proceeding to avoid rare races.
test/agents/test-grpc-continuous-profile.mjs (1)
275-432: Consider removing or documenting the commented-out code.This ~160-line commented-out section contains three test cases. If these tests are obsolete, remove them to improve code clarity. If they're deferred for future work, consider opening an issue to track them and adding a brief comment explaining why they're commented out.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
agents/grpc/proto/reconfigure.proto(1 hunks)agents/grpc/src/grpc_agent.cc(4 hunks)agents/grpc/src/grpc_agent.h(1 hunks)agents/grpc/src/grpc_errors.h(1 hunks)agents/grpc/src/proto/reconfigure.pb.cc(18 hunks)agents/grpc/src/proto/reconfigure.pb.h(4 hunks)agents/zmq/src/zmq_agent.cc(2 hunks)agents/zmq/src/zmq_agent.h(1 hunks)lib/nsolid.js(8 hunks)src/nsolid/nsolid_api.cc(1 hunks)test/agents/test-grpc-continuous-profile.mjs(2 hunks)test/agents/test-grpc-heap-profile.mjs(1 hunks)test/agents/test-grpc-heap-sampling.mjs(1 hunks)test/agents/test-grpc-profile.mjs(1 hunks)test/agents/test-grpc-snapshot.mjs(1 hunks)test/agents/test-grpc-tracing.mjs(3 hunks)test/agents/test-zmq-heap-profile.mjs(1 hunks)test/agents/test-zmq-heap-sampling.mjs(1 hunks)test/agents/test-zmq-profile.mjs(1 hunks)test/agents/test-zmq-snapshot.mjs(1 hunks)test/agents/test-zmq-tracing.mjs(2 hunks)test/common/nsolid-grpc-agent/client.js(2 hunks)test/common/nsolid-grpc-agent/index.js(1 hunks)test/common/nsolid-zmq-agent/client.js(1 hunks)test/common/nsolid-zmq-agent/index.js(1 hunks)test/parallel/test-nsolid-config-tracing.js(1 hunks)test/parallel/test-nsolid-heap-profile-stream.js(1 hunks)test/parallel/test-nsolid-heap-profile.js(1 hunks)test/parallel/test-nsolid-heap-sampling-stream.js(2 hunks)test/parallel/test-nsolid-heap-sampling.js(1 hunks)test/parallel/test-nsolid-profile.js(1 hunks)test/parallel/test-nsolid-snapshot.js(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (17)
- test/common/nsolid-zmq-agent/client.js
- agents/grpc/src/grpc_agent.cc
- test/agents/test-zmq-snapshot.mjs
- agents/grpc/src/grpc_agent.h
- test/parallel/test-nsolid-snapshot.js
- test/agents/test-zmq-tracing.mjs
- test/agents/test-grpc-profile.mjs
- agents/zmq/src/zmq_agent.h
- agents/zmq/src/zmq_agent.cc
- src/nsolid/nsolid_api.cc
- test/parallel/test-nsolid-config-tracing.js
- test/agents/test-grpc-heap-profile.mjs
- test/common/nsolid-grpc-agent/index.js
- agents/grpc/src/grpc_errors.h
- test/agents/test-zmq-heap-sampling.mjs
- test/parallel/test-nsolid-heap-profile-stream.js
- agents/grpc/src/proto/reconfigure.pb.h
🔇 Additional comments (21)
lib/nsolid.js (7)
133-137: Public toggles for assets/traces exposedGood addition. Clear, minimal API with enable/disable helpers.
195-207: Config getters for assetsEnabled/tracingEnabledGood defaults and exposure via nsolid.config and getters.
497-499: Gating for heapSamplingStreamLooks correct; early throw prevents stream creation when assets disabled.
761-765: assetsEnabled in updateConfigNormalization via optionToBool and conditional assignment is correct.
604-618: enableAssets/disableAssets helpersSimple, idempotent toggles are good.
620-634: enableTraces/disableTraces helpersConsistent with assets toggles.
637-638: Doc nitMinor doc change acknowledged. No issues.
agents/grpc/proto/reconfigure.proto (1)
20-20: Add assetsEnabled to ReconfigureBodyLooks good. Field number 13 follows existing numbering without conflict.
test/agents/test-zmq-profile.mjs (1)
120-177: ZMQ profile test for assetsEnabled toggleGood coverage: disables via config, asserts 422, re-enables and validates profile flow; also verifies config state.
test/common/nsolid-grpc-agent/client.js (2)
64-69: Guard otel.register against duplicate registrationGood defensive handling; ignores expected already-registered error.
124-137: Support runtime toggle messages for assets/tracesHandlers invoke nsolid toggles and send acks. Looks correct.
test/parallel/test-nsolid-profile.js (1)
107-112: LGTM on assertion formatting and flowMulti-line strictEqual improves readability, and invoking runAssetsToggleTests() extends coverage.
test/parallel/test-nsolid-heap-sampling.js (1)
164-168: ** LGTM on assertion formatting**Message split across lines is fine and consistent with other tests.
test/parallel/test-nsolid-heap-profile.js (1)
124-124: LGTMRunning runHeapProfileAssetsToggleTests() extends coverage for assets toggling.
test/agents/test-grpc-snapshot.mjs (2)
246-280: LGTM: env-based gating coverageGood validation of NSOLID_ASSETS_ENABLED=0 yielding 500 "Assets collection disabled(1008)".
325-368: LGTM: helper-based togglingdisableAssets()/enableAssets() paths look correct; asserting currentConfig.assetsEnabled ensures state consistency.
test/agents/test-grpc-heap-sampling.mjs (1)
248-378: LGTM! Comprehensive test coverage for assets gating.The three new test cases thoroughly validate the assetsEnabled feature across different configuration methods (environment variable, config API, and helper methods). The tests correctly verify that heap sampling returns a 500 error when assets are disabled and succeeds when re-enabled.
test/agents/test-grpc-tracing.mjs (1)
5-5: LGTM! Well-structured phase-based testing for tracing toggle.The phase-based state machine ('initial' → 'disabled' → 'reenabled' → 'done') effectively validates that tracing can be toggled at runtime. The tests correctly verify that no spans are emitted when tracing is disabled and that spans resume after re-enabling.
Also applies to: 205-247, 273-315
test/agents/test-grpc-continuous-profile.mjs (1)
4-4: LGTM! Thorough validation of continuous profiling with assets gating.The new and modified tests effectively verify that continuous CPU profiling respects the assetsEnabled flag across different configuration methods. The use of profile counters and timing delays provides clear validation of the gating behavior.
Also applies to: 100-224
test/common/nsolid-zmq-agent/index.js (1)
187-233: LGTM! Helper methods follow established patterns.The four new methods (enableTraces, disableTraces, enableAssets, disableAssets) correctly follow the existing IPC messaging pattern used throughout the TestClient class. The implementation is consistent with other helper methods.
agents/grpc/src/proto/reconfigure.pb.cc (1)
1-1082: Auto-generated protobuf code.This file contains auto-generated code from the protocol buffer compiler. The changes mechanically add the assetsEnabled field (field 13) to ReconfigureBody with appropriate initialization, serialization, and accessor logic. No manual review required.
74e0d6b to
de122d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
test/agents/test-zmq-heap-sampling.mjs (1)
135-201: LGTM! Completes consistent coverage across asset types.The test maintains the established pattern for heap sampling, with appropriate sampling-specific validations (lines 189-191). The consistent implementation across all three asset types (snapshot, heap profile, heap sampling) demonstrates thorough test coverage for the
assetsEnabledfeature.Note: The three tests share a nearly identical structure. While this duplication is acceptable and even beneficial for test clarity and isolation, consider extracting a reusable test helper if additional asset types are introduced in the future. For example:
async function testAssetsEnabledToggle(playground, startAssetFn, validateAssetFn) { // Common toggle logic }However, this refactor is entirely optional given the current scope.
test/parallel/test-nsolid-snapshot.js (3)
102-104: Confirm mustSucceed() without handler; otherwise pass a noop for safetyIf common.mustSucceed() requires a function, this could throw. Passing a noop keeps semantics identical and avoids coupling to helper internals.
- nsolid.snapshot(common.mustSucceed()); + nsolid.snapshot(common.mustSucceed(() => {}));
60-66: Re-invoking nsolid.start: verify idempotency and side-effectsStarting twice in the same process may reinitialize state/transport. If not guaranteed safe, prefer a single start then drive behavior via enableAssets()/disableAssets() and disableSnapshots flags, or reconfigure if supported.
68-76: Strengthen assertions: check the specific error code when assets are disabledThis PR introduces EAssetsDisabled. Asserting the code (in addition to message) avoids conflating “assets disabled” with other failure causes.
Example tweaks:
- assert.throws( + assert.throws( () => { nsolid.snapshot(); }, { - message: 'Heap snapshot could not be generated' + message: 'Heap snapshot could not be generated', + code: 'EAssetsDisabled' } );- assert.notStrictEqual(err.code, 0); - assert.strictEqual(err.message, 'Heap snapshot could not be generated'); + assert.strictEqual(err.code, 'EAssetsDisabled'); + assert.strictEqual(err.message, 'Heap snapshot could not be generated');Apply similarly to the later assert. If codes can vary by transport, consider allowing either the new code or falling back to the message match.
Also applies to: 78-81, 90-97
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
agents/grpc/proto/reconfigure.proto(1 hunks)agents/grpc/src/grpc_agent.cc(4 hunks)agents/grpc/src/grpc_agent.h(1 hunks)agents/grpc/src/grpc_errors.h(1 hunks)agents/grpc/src/proto/reconfigure.pb.cc(18 hunks)agents/grpc/src/proto/reconfigure.pb.h(4 hunks)agents/zmq/src/zmq_agent.cc(2 hunks)agents/zmq/src/zmq_agent.h(1 hunks)lib/nsolid.js(8 hunks)src/nsolid/nsolid_api.cc(1 hunks)test/agents/test-grpc-continuous-profile.mjs(2 hunks)test/agents/test-grpc-heap-profile.mjs(1 hunks)test/agents/test-grpc-heap-sampling.mjs(1 hunks)test/agents/test-grpc-profile.mjs(1 hunks)test/agents/test-grpc-snapshot.mjs(1 hunks)test/agents/test-grpc-tracing.mjs(3 hunks)test/agents/test-zmq-heap-profile.mjs(1 hunks)test/agents/test-zmq-heap-sampling.mjs(1 hunks)test/agents/test-zmq-profile.mjs(1 hunks)test/agents/test-zmq-snapshot.mjs(1 hunks)test/agents/test-zmq-tracing.mjs(2 hunks)test/common/nsolid-grpc-agent/client.js(2 hunks)test/common/nsolid-grpc-agent/index.js(1 hunks)test/common/nsolid-zmq-agent/client.js(1 hunks)test/common/nsolid-zmq-agent/index.js(1 hunks)test/parallel/test-nsolid-config-tracing.js(1 hunks)test/parallel/test-nsolid-heap-profile-stream.js(1 hunks)test/parallel/test-nsolid-heap-profile.js(1 hunks)test/parallel/test-nsolid-heap-sampling-stream.js(2 hunks)test/parallel/test-nsolid-heap-sampling.js(1 hunks)test/parallel/test-nsolid-profile.js(1 hunks)test/parallel/test-nsolid-snapshot.js(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (18)
- agents/zmq/src/zmq_agent.h
- test/agents/test-zmq-tracing.mjs
- test/agents/test-zmq-profile.mjs
- agents/grpc/src/grpc_agent.h
- agents/grpc/src/grpc_agent.cc
- test/agents/test-grpc-heap-profile.mjs
- test/parallel/test-nsolid-heap-profile-stream.js
- test/common/nsolid-grpc-agent/client.js
- agents/grpc/proto/reconfigure.proto
- test/common/nsolid-zmq-agent/client.js
- test/parallel/test-nsolid-profile.js
- agents/zmq/src/zmq_agent.cc
- src/nsolid/nsolid_api.cc
- test/parallel/test-nsolid-heap-sampling-stream.js
- agents/grpc/src/grpc_errors.h
- test/common/nsolid-zmq-agent/index.js
- test/agents/test-grpc-snapshot.mjs
- test/agents/test-grpc-profile.mjs
🔇 Additional comments (36)
test/parallel/test-nsolid-heap-profile.js (2)
124-124: LGTM!The placement and invocation of the new test function is appropriate, ensuring it runs after the existing test flow completes.
133-204: Verify enableAssets/disableAssets return values and consider timing robustness.The test provides good coverage of asset toggle scenarios. However, consider these improvements:
Missing verification: The calls to
enableAssets()anddisableAssets()(lines 166, 172, 193) don't verify success. If these operations can fail or return status, add assertions to confirm the state changes actually occurred.Timing robustness: The test relies on multiple hardcoded 100ms timeouts (lines 141, 167, 173, 194). In slow CI environments or under load, these may be insufficient and cause flakiness. Consider whether these delays can be replaced with event-driven synchronization or if they need to be more generous.
Callback nesting: The deeply nested structure (5 levels deep starting at line 169) reduces readability. While it follows the existing pattern in this file, consider whether this could be flattened in a future refactor.
To verify the reliability of the timing assumptions, you could run this test repeatedly under load:
test/agents/test-zmq-snapshot.mjs (1)
235-296: LGTM! Well-structured test for runtime asset toggling.The test correctly validates the
assetsEnabledruntime control:
- Properly disables assets and verifies the expected 422 error when attempting a snapshot
- Re-enables assets and confirms the snapshot succeeds
- Verifies the final configuration state
The defensive
resolvedflag (lines 242, 269-270) appropriately prevents duplicate event processing after test completion.test/agents/test-zmq-heap-profile.mjs (1)
342-408: LGTM! Consistent implementation for heap profiling.The test follows the same proven pattern as the snapshot test and includes appropriate heap-profile-specific validations:
- Lines 396-398 correctly verify that
trackAllocations: falseresults in empty trace data- Runtime toggling flow is properly implemented with correct error handling
- Final state verification confirms
assetsEnabledpersists after operation completionagents/grpc/src/proto/reconfigure.pb.cc (6)
49-50: LGTM! Generated field initialization is correct.The new
assetsenabled_field is properly initialized tofalsein the constexpr constructor, following the same pattern as other boolean fields likecontcpuprofile_.
107-133: LGTM! Has-bit layout correctly updated.The has-bit index offset (line 107) and field offsets (lines 120, 133) are properly updated to accommodate the new
assetsenabled_field. The hasbit index of 11 for the new field is consistent with it being the 12th optional field (0-indexed).
243-245: LGTM! Memory copy ranges correctly extended.The copy constructor (lines 243-245) and
SharedCtor(lines 263-265) memory ranges are properly updated to include the newassetsenabled_field as the last member in the contiguous memory block.Also applies to: 263-265
465-468: LGTM! Clear and serialization logic properly updated.The
Clearoperation (lines 465-468) now uses the correct has-bits mask0x00000f00U(covering bits 8-11), and the serialization code (lines 581-586) correctly emits the new field with tag 13 when present.Also applies to: 581-586
624-624: LGTM! Size calculation and merge logic correctly updated.Line 624: The has-bits mask in
ByteSizeLongis correctly updated from0x000005e0Uto0x00000de0U, adding bit 11 for the new field.Lines 702-714: The
MergeImplfunction properly handles merging the newassetsenabled_field when has-bit0x00000800Uis set.Also applies to: 702-714
739-740: LGTM! Swap logic correctly updated.The
InternalSwaprange (lines 739-740) now correctly usesassetsenabled_as the end marker since it's the last field in the contiguous memory block, replacing the previous end markercontcpuprofile_.agents/grpc/src/proto/reconfigure.pb.h (4)
233-233: LGTM! Field number and parse table correctly defined.Line 233: The new field number
kAssetsEnabledFieldNumber = 13is correctly defined in the enum.Line 408: The
TcParseTabletemplate parameter is properly updated from<4, 12, ...>to<4, 13, ...>to reflect the increased field count.Also applies to: 408-408
392-401: LGTM! Public accessor declarations are complete.Lines 392-401 declare the standard protobuf accessors for the
assetsEnabledfield:
- Public:
has_assetsenabled(),clear_assetsenabled(),assetsenabled(),set_assetsenabled()- Private:
_internal_assetsenabled(),_internal_set_assetsenabled()All follow the standard protobuf API patterns.
442-442: LGTM! Field added to internal struct.Line 442: The
assetsenabled_field is correctly added as the last boolean member in theImpl_struct, maintaining consistency with the implementation file.
1175-1201: LGTM! Accessor implementations are complete and correct.Lines 1175-1201 provide the inline implementations for all
assetsEnabledaccessors. The implementation:
- Uses the correct has-bit mask
0x00000800U(bit 11)- Includes proper TSan annotations for thread safety
- Follows the exact pattern of other boolean fields
The generated code is mechanically correct and consistent.
test/parallel/test-nsolid-config-tracing.js (1)
23-29: LGTM!The test properly validates that the new runtime tracing toggle helpers (
disableTraces()andenableTraces()) correctly update theconfig.tracingEnabledstate.test/agents/test-grpc-tracing.mjs (2)
205-247: Well-structured phase-based testing approach.The introduction of a phase state machine (
'initial'→'disabled'→'reenabled'→'done') properly orchestrates the multi-step tracing toggle test. The 200ms delays ensure spans have time to emit (or not emit when disabled), and the phase guards prevent race conditions after test completion.
273-315: LGTM!The custom tracing test follows the same solid phase-based pattern as the HTTP tracing test, ensuring consistent validation of the trace enable/disable functionality.
test/parallel/test-nsolid-heap-sampling.js (2)
164-169: LGTM!The multi-line assertion formatting improves readability, and the subsequent call to
runHeapSamplingAssetsToggleTests()properly integrates the new test flow.
177-248: Comprehensive asset toggle testing.The test function thoroughly validates heap sampling behavior when assets are toggled:
- Disables assets and verifies errors are thrown
- Re-enables assets and confirms sampling works
- Disables again and re-validates error conditions
The timing and sequencing properly ensure each operation completes before the next begins.
test/agents/test-grpc-heap-sampling.mjs (3)
248-285: LGTM!This test properly validates that heap sampling respects the
NSOLID_ASSETS_ENABLEDenvironment variable, returning the expected error when assets are disabled.
287-331: LGTM!The test correctly validates toggling
assetsEnabledviansolid.start()configuration, verifying both the disabled error state and the successful profile data collection when re-enabled.
333-378: LGTM!This test properly exercises the runtime helper methods (
disableAssets()andenableAssets()), confirming they gate heap sampling operations correctly and update the configuration state.test/agents/test-grpc-continuous-profile.mjs (4)
100-135: LGTM!The test properly validates that continuous profiling is suppressed when
NSOLID_ASSETS_ENABLED='0', waiting an appropriate duration to confirm no profiles are emitted.
137-180: LGTM!This test correctly validates that continuous profiling stops/resumes when
assetsEnabledis toggled via configuration, with appropriate delays to observe profile emission patterns.
182-224: LGTM!The test properly exercises the runtime helpers (
disableAssets()/enableAssets()) with continuous profiling, confirming that profile emission is gated correctly and configuration state is updated.
275-432: Verify the intent of the extensive commented code.There are 157 lines of commented-out test code. Please confirm whether:
- This code is intentionally disabled for future use
- These tests are superseded by the active tests
- The commented code should be removed
If this is intentional technical debt or work-in-progress, consider adding a comment explaining the rationale.
test/common/nsolid-grpc-agent/index.js (2)
346-387: Well-designed API extensions for runtime testing.The new TestClient methods provide a clean interface for testing asset and tracing toggles:
- Consistent early returns when
#childis null- Unified
#sendAndWaithelper eliminates code duplication- The
config()method's result mapping properly extracts the payload
389-401: LGTM!The private
#sendAndWaithelper properly centralizes the request/response pattern with:
- Configurable message type expectations
- Optional result mapping
- Proper listener cleanup after response
lib/nsolid.js (7)
133-136: LGTM!The new runtime toggle functions (
enableAssets,disableAssets,enableTraces,disableTraces) are properly exported on the public API surface.
195-206: LGTM!The new getters expose the current asset and tracing states with appropriate defaults (
assetsEnabled: true,tracingEnabled: false), using nullish coalescing to handle missing config values.
413-416: Proper runtime gating for asset collection.Both
heapProfileStreamandheapSamplingStreamcorrectly checkassetsEnabledand throw the appropriate error when disabled, preventing expensive profiling operations at runtime.Also applies to: 496-499
604-633: Well-implemented toggle functions.All four toggle functions (
enableAssets,disableAssets,enableTraces,disableTraces) follow a consistent pattern:
- Check current state to avoid unnecessary updates
- Call
updateConfigwith the new stateThis ensures idempotent behavior and proper config propagation.
761-765: LGTM!The
updateConfigfunction properly handles theassetsEnabledoption usingoptionToBoolfor normalization, ensuring only valid boolean values are assigned.
947-956: Proper configuration initialization for assetsEnabled.The initialization correctly prioritizes environment variables over package.json settings, with a sensible default of
truewhen neither is specified. This follows the established configuration precedence pattern.
1106-1116: Robust boolean normalization helper.The
optionToBoolfunction properly handles various input types:
- Returns
undefinedfornull/undefined(preserving absence)- Passes through booleans unchanged
- Converts numbers to boolean (0 = false, non-zero = true)
- Uses
envToBoolfor string conversionThis provides consistent boolean normalization across configuration sources.
test/parallel/test-nsolid-snapshot.js (1)
82-107: Good fix: toggles are now chained to snapshot completion, avoiding racesSequencing enable/disable after the respective snapshot callbacks resolves the prior “in‑flight snapshot” race. This should make the test deterministic.
de122d2 to
22ac97c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (8)
test/parallel/test-nsolid-profile.js (1)
120-182: Consider checking for asset-disabled-specific error codes or messages.The test comprehensively exercises the asset toggle functionality, but all error assertions verify the generic message "CPU profile could not be started" without distinguishing whether the failure is specifically due to disabled assets versus other causes. Since the PR introduces an
EAssetsDisablederror, consider adding more specific assertions to improve test precision and debugging.For example, check for a specific error code when assets are disabled:
assert.throws( () => { nsolid.profile(); }, { + code: 'ERR_NSOLID_ASSETS_DISABLED', // or whatever the actual code is message: 'CPU profile could not be started' } );This would make test failures easier to diagnose and catch regressions where the wrong error type is returned.
test/parallel/test-nsolid-heap-profile-stream.js (2)
94-96: Minor: Inconsistent stream consumption pattern.Lines 94-96 consume the stream by listening to 'data' events, while line 117 uses
.resume()to consume the stream without listening to data. This inconsistency makes the test harder to follow.Consider using a consistent pattern. If you don't need the profile data in the final test, use
.resume()consistently:- let profile = ''; const enabledStream = nsolid.heapProfileStream(0, 1200, true); - enabledStream.on('data', (chunk) => { - profile += chunk; - }); + enabledStream.resume(); enabledStream.on('end', common.mustCall(() => { - assert(JSON.parse(profile));Also applies to: 117-117
119-119: Minor: Weak assertion.
assert.ok(true)is a tautology that doesn't validate any meaningful outcome. The test should either verify that the stream completed successfully (whichcommon.mustCallon the 'end' handler already does) or be removed entirely.Remove the redundant assertion:
finalStream.on('end', common.mustCall(() => { - assert.ok(true); }));Alternatively, if you want to verify stream completion, the presence of
common.mustCallon the 'end' event handler is already sufficient validation.test/parallel/test-nsolid-heap-sampling-stream.js (3)
192-200: Consider adding allocation activity before heap sampling.The original test (lines 128-131) creates allocations to ensure heap sampling has data to sample. The re-enabled heap sampling tests might produce minimal or empty profiles without similar allocation activity.
If you want to ensure non-empty profiles in these tests, apply this diff:
nsolid.enableAssets(); setImmediate(() => { + const arr = []; + for (let i = 0; i < 1000; i++) { + arr.push(new Array(1000)); + } let profile = '';
221-221: Remove the meaningless assertion.The
assert.ok(true)assertion doesn't validate anything meaningful. Thecommon.mustCallwrapper already ensures the callback is invoked.Apply this diff to remove the redundant assertion:
- finalStream.on('end', common.mustCall(() => { - assert.ok(true); - })); + finalStream.on('end', common.mustCall());
180-188: Consider extracting the repeated error assertion pattern.The error assertion pattern is duplicated twice. Extracting it into a helper function would reduce duplication and improve maintainability.
Consider refactoring like this:
function assertHeapSamplingDisabled() { assert.throws( () => { nsolid.heapSamplingStream(0, 1000); }, { code: 'ERR_NSOLID_HEAP_SAMPLING_START', message: 'Heap sampling could not be started' } ); }Then replace both assertion blocks with calls to
assertHeapSamplingDisabled().Also applies to: 205-213
test/parallel/test-nsolid-heap-profile.js (2)
130-132: Tight 100 ms waits may flake; prefer longer or state-driven waitsCI can be slower; 100 ms can race with config propagation. Consider increasing timeouts (or polling for state) to reduce flakes.
Apply this minimal diff to relax timing:
- }, common.platformTimeout(100)); -}, common.platformTimeout(100)); + }, common.platformTimeout(250)); +}, common.platformTimeout(250));Optional: replace fixed sleeps with a small poll for the desired outcome to avoid magic delays:
async function waitForHeapProfileStartFailure() { // Poll until calling heapProfile() throws synchronously as expected. for (let i = 0; i < 25; i++) { try { nsolid.heapProfile(); } catch (e) { assert.strictEqual(e.message, 'Heap profile could not be started'); return; } await new Promise((r) => setTimeout(r, common.platformTimeout(20))); } assert.fail('heapProfile() did not start failing within the expected time'); }Please verify on slow builders whether 100 ms is sufficient or adopt one of the above.
133-204: Solid toggle tests; add callback end-error checks and (optionally) “disable while active” caseGreat coverage of disabled/enabled transitions. Two small gaps:
- Also assert the callback form of heapProfileEnd() fails while disabled (mirrors start path).
- Optionally, validate that disabling while an active profile is running still allows ending successfully (important runtime contract per PR).
Apply these small insertions to cover the callback end-error while disabled:
@@ assert.throws( () => { nsolid.heapProfileEnd(); }, { message: 'Heap profile could not be stopped' } ); + // Callback variant should also surface the disabled state + nsolid.heapProfileEnd(common.mustCall((err) => { + assert.notStrictEqual(err.code, 0); + assert.strictEqual(err.message, 'Heap profile could not be stopped'); + })); @@ assert.throws( () => { nsolid.heapProfileEnd(); }, { message: 'Heap profile could not be stopped' } ); + // Callback variant should also surface the disabled state + nsolid.heapProfileEnd(common.mustCall((err) => { + assert.notStrictEqual(err.code, 0); + assert.strictEqual(err.message, 'Heap profile could not be stopped'); + }));Optional addition to assert “disable while active” still allows end to succeed:
// Start a profile, disable assets while active, ensure end still succeeds. nsolid.enableAssets(); setTimeout(() => { nsolid.heapProfile(common.mustSucceed(() => { nsolid.disableAssets(); nsolid.heapProfileEnd(common.mustSucceed(() => { // Re-enable for any subsequent tests nsolid.enableAssets(); })); })); }, common.platformTimeout(100));If the implementation surfaces a specific disabled error code (e.g., EAssetsDisabled), consider asserting it in these disabled paths to strengthen the check. Based on PR objectives.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
agents/grpc/proto/reconfigure.proto(1 hunks)agents/grpc/src/grpc_agent.cc(4 hunks)agents/grpc/src/grpc_agent.h(1 hunks)agents/grpc/src/grpc_errors.h(1 hunks)agents/grpc/src/proto/reconfigure.pb.cc(18 hunks)agents/grpc/src/proto/reconfigure.pb.h(4 hunks)agents/zmq/src/zmq_agent.cc(2 hunks)agents/zmq/src/zmq_agent.h(1 hunks)lib/nsolid.js(8 hunks)src/nsolid/nsolid_api.cc(1 hunks)test/agents/test-grpc-continuous-profile.mjs(6 hunks)test/agents/test-grpc-heap-profile.mjs(1 hunks)test/agents/test-grpc-heap-sampling.mjs(1 hunks)test/agents/test-grpc-profile.mjs(1 hunks)test/agents/test-grpc-snapshot.mjs(1 hunks)test/agents/test-grpc-tracing.mjs(3 hunks)test/agents/test-zmq-heap-profile.mjs(1 hunks)test/agents/test-zmq-heap-sampling.mjs(1 hunks)test/agents/test-zmq-profile.mjs(1 hunks)test/agents/test-zmq-snapshot.mjs(1 hunks)test/agents/test-zmq-tracing.mjs(2 hunks)test/common/nsolid-grpc-agent/client.js(2 hunks)test/common/nsolid-grpc-agent/index.js(1 hunks)test/common/nsolid-zmq-agent/client.js(1 hunks)test/common/nsolid-zmq-agent/index.js(1 hunks)test/parallel/test-nsolid-config-tracing.js(1 hunks)test/parallel/test-nsolid-heap-profile-stream.js(1 hunks)test/parallel/test-nsolid-heap-profile.js(1 hunks)test/parallel/test-nsolid-heap-sampling-stream.js(2 hunks)test/parallel/test-nsolid-heap-sampling.js(1 hunks)test/parallel/test-nsolid-profile.js(1 hunks)test/parallel/test-nsolid-snapshot.js(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (19)
- agents/grpc/proto/reconfigure.proto
- test/agents/test-grpc-profile.mjs
- test/agents/test-zmq-profile.mjs
- agents/grpc/src/grpc_agent.cc
- test/parallel/test-nsolid-config-tracing.js
- test/common/nsolid-zmq-agent/client.js
- src/nsolid/nsolid_api.cc
- test/agents/test-zmq-tracing.mjs
- test/agents/test-zmq-heap-profile.mjs
- test/common/nsolid-zmq-agent/index.js
- test/agents/test-grpc-snapshot.mjs
- test/agents/test-zmq-heap-sampling.mjs
- test/agents/test-grpc-heap-sampling.mjs
- test/common/nsolid-grpc-agent/client.js
- test/agents/test-zmq-snapshot.mjs
- agents/grpc/src/grpc_agent.h
- lib/nsolid.js
- agents/zmq/src/zmq_agent.h
- agents/zmq/src/zmq_agent.cc
🔇 Additional comments (24)
test/parallel/test-nsolid-profile.js (2)
107-111: LGTM!The assertion formatting improves readability, and the integration of asset toggle testing at this point ensures all baseline profiling tests complete before exercising the new asset control surface.
117-118: LGTM!Wrapping timeouts with
common.platformTimeout()is a best practice that accounts for varying platform speeds and CI environments, reducing test flakiness.test/parallel/test-nsolid-snapshot.js (3)
56-57: LGTM! Proper test integration.The new test function is correctly integrated into the existing test flow, ensuring it runs after the previous assertions complete.
60-76: LGTM! Clear test setup and validation.The test correctly configures nsolid with assets disabled but snapshots enabled, then validates that snapshot operations fail with the appropriate error message when assets are disabled.
78-108: LGTM! Race condition properly addressed.The test correctly chains callbacks to ensure snapshots complete before toggling asset states, preventing the "Snapshot already in progress" race condition flagged in the previous review. The sequence properly:
- Waits for the error callback when assets are disabled (line 78)
- Enables assets only after the error is confirmed (line 83)
- Waits for the successful snapshot to complete before disabling assets (lines 86-88)
- Verifies the expected error after disabling (lines 90-97)
- Waits for the final snapshot to complete (line 103)
The use of
common.mustSucceed()ensures all expected callbacks are invoked, providing deterministic test behavior.test/parallel/test-nsolid-heap-profile-stream.js (1)
72-126: Major: Timing/synchronization concerns with nested setImmediate.The test relies on nested
setImmediatecallbacks to ensure config changes (nsolid.start(),enableAssets(),disableAssets()) take effect before attempting operations. If these configuration updates are truly asynchronous or involve internal state propagation, a singlesetImmediatetick might be insufficient, leading to race conditions and flaky test behavior.Verify whether the configuration methods are synchronous and whether
setImmediateprovides adequate synchronization:test/parallel/test-nsolid-heap-sampling.js (4)
164-167: LGTM: Formatting improvement for readability.The multi-line formatting of the error message assertion improves readability without changing the test logic.
174-175: Good: Consistent use of platform timeouts.Using
common.platformTimeout()ensures the test behaves correctly across different platform speeds and CI environments.
168-168: Good integration point for the new test.Calling
runHeapSamplingAssetsToggleTests()after the existing error path tests is a logical placement that extends the test coverage naturally.
177-248: Well-structured asset toggle test with timing considerations.The test comprehensively covers the asset enable/disable scenarios for heap sampling:
- Tests error paths when assets are disabled (both sync and async)
- Properly sequences operations to avoid mid-sampling disruption (line 216 only disables after sampling completes)
- Verifies successful sampling after re-enabling assets
However, the test relies heavily on
setTimeoutdelays (100ms) for configuration changes to propagate. Whilecommon.platformTimeout()helps, this could still be a source of flakiness on heavily loaded systems.Additionally, the test checks
err.code !== 0but doesn't verify the specific error code for the "assets disabled" scenario. According to the PR summary, anEAssetsDisablederror was added. Consider verifying the specific error code to make the test more precise and easier to debug if the wrong error is returned.Run the following script to check if there's a specific error code constant for assets disabled that should be verified:
test/parallel/test-nsolid-heap-sampling-stream.js (1)
138-138: LGTM!The integration point for the asset toggle tests is well-placed after validating the initial profile.
test/agents/test-grpc-tracing.mjs (2)
205-247: LGTM! Well-structured phase-based testing.The multi-phase flow (initial → disabled → reenabled → done) effectively validates that tracing can be toggled at runtime and that spans are properly suppressed when tracing is disabled.
273-315: LGTM! Consistent test coverage.The custom tracing test mirrors the http tracing test structure, ensuring comprehensive coverage of the enableTraces/disableTraces API across different span types.
test/agents/test-grpc-heap-profile.mjs (3)
259-296: LGTM! Proper error handling verification.The test correctly validates that heap profiling returns a 500 error with code 1008 when assets collection is disabled via the environment variable.
298-342: LGTM! Comprehensive config toggle validation.The test properly verifies that assetsEnabled can be toggled dynamically via nsolid.start() configuration and that the state changes affect profiling behavior as expected.
344-389: LGTM! Helper API coverage is complete.The test validates the enableAssets/disableAssets convenience methods and confirms the final state with a config check, ensuring the API surface is working correctly.
agents/grpc/src/grpc_errors.h (1)
12-13: LGTM! Properly defined error type.The new EAssetsDisabled error is correctly added to the GRPC_ERRORS macro with appropriate HTTP status code (500), message, and runtime code (1008) that aligns with the test expectations.
test/agents/test-grpc-continuous-profile.mjs (3)
121-156: LGTM! Proper verification of disabled continuous profiling.The test correctly validates that continuous CPU profiles are not emitted when assets collection is disabled via environment variable, using an appropriate wait period to confirm no emissions.
158-201: LGTM! Dynamic toggle behavior is well-tested.The test properly validates that continuous profiling can be stopped and restarted by toggling assetsEnabled via configuration, with appropriate tolerance for profiles in flight during state transitions.
203-245: LGTM! Helper methods are properly validated.The test confirms that the disableAssets/enableAssets convenience methods correctly control continuous profiling behavior and verifies the final configuration state.
test/common/nsolid-grpc-agent/index.js (1)
346-401: LGTM! Well-structured test client API.The new methods provide a clean interface for controlling assets and tracing at runtime. The centralized
#sendAndWaithelper promotes consistency and reduces duplication, with appropriate null checks for the child process.agents/grpc/src/proto/reconfigure.pb.cc (1)
1-1082: Auto-generated protobuf code.This file is automatically generated by the protocol buffer compiler from the proto definition. The changes correctly add support for the new
assetsEnabledfield (field 13) throughout the serialization, deserialization, copy, merge, and swap operations. The field is properly initialized tofalseand integrated into the has-bits mechanism.agents/grpc/src/proto/reconfigure.pb.h (1)
233-233: Auto-generated protobuf header.This file is automatically generated by the protocol buffer compiler. The changes correctly add the
assetsEnabledfield (field 13) with:
- Public field number enum constant
kAssetsEnabledFieldNumber = 13- Public accessors:
has_assetsenabled(),clear_assetsenabled(),assetsenabled(),set_assetsenabled()- Private internal helpers:
_internal_assetsenabled(),_internal_set_assetsenabled()- Private member variable
assetsenabled_in theImpl_structThe inline implementations follow standard protobuf patterns for optional boolean fields with proper has-bit management.
Also applies to: 392-401, 442-442, 1175-1201
test/parallel/test-nsolid-heap-profile.js (1)
124-124: Good placement for asset-toggle coverageRunning the toggle suite after the error-path chain is complete is sensible and avoids overlapping async state. LGTM.
Add comprehensive runtime control for asset collection (CPU/heap
profiling, snapshots) with multiple configuration methods.
Implements assetsEnabled flag with full control surface:
- NSOLID_ASSETS_ENABLED environment variable
- nsolid.start({ assetsEnabled: true/false }) configuration
- Runtime helpers: enableAssets() and disableAssets()
- Dynamic updates via gRPC reconfigure protocol
- Validation in both gRPC and ZMQ agents with proper errors
- Continuous profiler integration respects assetsEnabled state
This allows operators to disable expensive profiling operations
at runtime without process restarts, improving resource management
and production safety.
Also adds missing enableTraces()/disableTraces() helper methods
to complete the tracing API surface.
Comprehensive test coverage added for all asset types across gRPC,
ZMQ, and parallel test suites.
22ac97c to
8a2bb66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
agents/grpc/src/grpc_errors.h (1)
12-13: New EAssetsDisabled enum looks correct; consider status code semantics.Runtime code 1008 fits sequence; macro continuation is correct. Optionally consider HTTP-ish code 403 (forbidden) or 409 (conflict) instead of 500 to better signal operator-controlled disablement; align with clients/tests before changing.
Please confirm clients/tests map 1008 → “Assets collection disabled” and no other components expect 500 here.
agents/grpc/src/grpc_agent.cc (1)
1194-1199: Type‑safety when reading JSON; store atomically.Guard against non-boolean values and prepare for atomic storage (see race comment below).
Apply:
- { - auto it = config_.find("assetsEnabled"); - if (it != config_.end()) { - assets_enabled_ = *it; - } - } + { + auto it = config_.find("assetsEnabled"); + if (it != config_.end()) { + if (it->is_boolean()) { + assets_enabled_.store(*it, std::memory_order_relaxed); + } else { + Debug("Invalid assetsEnabled type; expected boolean\n"); + } + } + }Note: assets_enabled_ change assumes std::atomic (see next comment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
agents/grpc/proto/reconfigure.proto(1 hunks)agents/grpc/src/grpc_agent.cc(5 hunks)agents/grpc/src/grpc_agent.h(1 hunks)agents/grpc/src/grpc_errors.h(1 hunks)agents/grpc/src/proto/reconfigure.pb.cc(18 hunks)agents/grpc/src/proto/reconfigure.pb.h(4 hunks)agents/zmq/src/zmq_agent.cc(2 hunks)agents/zmq/src/zmq_agent.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- agents/zmq/src/zmq_agent.h
- agents/grpc/src/grpc_agent.h
- agents/zmq/src/zmq_agent.cc
- agents/grpc/proto/reconfigure.proto
🧰 Additional context used
🧬 Code graph analysis (3)
agents/grpc/src/grpc_agent.cc (1)
agents/zmq/src/zmq_agent.cc (10)
config(430-476)config(430-430)config(1263-1378)config(1263-1263)config(2760-2795)config(2760-2760)config(2888-2905)config(2888-2888)config(2983-3000)config(2983-2983)
agents/grpc/src/proto/reconfigure.pb.h (2)
agents/grpc/src/proto/command.pb.h (5)
internal(44-47)ReconfigureBody(452-582)ReconfigureBody(837-843)agents/grpc/src/proto/reconfigure.pb.cc (4)
ReconfigureBody(53-53)ReconfigureBody(267-270)_impl_(59-60)_impl_(85-86)
agents/grpc/src/proto/reconfigure.pb.cc (1)
agents/grpc/src/proto/reconfigure.pb.h (16)
ReconfigureBody(94-94)ReconfigureBody(95-96)ReconfigureBody(681-693)ReconfigureBody(694-698)ReconfigureBody(699-721)ReconfigureBody(722-726)ReconfigureBody(727-749)ReconfigureBody(750-754)ReconfigureBody(755-777)ReconfigureBody(778-782)ReconfigureBody(783-805)ReconfigureBody(806-810)ReconfigureBody(811-834)ReconfigureBody(836-841)ReconfigureBody(875-887)ReconfigureBody(890-903)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: lint-js-and-md
- GitHub Check: test-linux (ubuntu-24.04-arm)
- GitHub Check: test-linux (ubuntu-24.04)
- GitHub Check: build-tarball
- GitHub Check: coverage-windows
- GitHub Check: test-macOS
🔇 Additional comments (5)
agents/grpc/src/grpc_agent.cc (3)
395-398: Good: propagate assetsEnabled in ReconfigureEvent.Only sets when present; matches ZMQ agent behavior.
Ensure reconfigure.proto field 13 is optional and absent when not set to avoid default false flips.
447-448: Defaulting assets to enabled is fine; confirm env/config precedence.assets_enabled_ defaults true. Please confirm any NSOLID_ASSETS_ENABLED env or nsolid.start config is applied before first asset command to avoid a short window with the wrong default.
1755-1757: Good: include assetsEnabled in reconfigure output.Keeps parity with ZMQ agent.
agents/grpc/src/proto/reconfigure.pb.h (1)
233-234: Generated field assetsEnabled = 13 is wired correctly.Enum, accessors, has-bits (0x00000800U), and parse table size updates look consistent with protoc 6.32.0.
Please confirm these files were re-generated from the updated reconfigure.proto (no manual edits) and that build scripts pin protoc/runtime (6.32.0) to avoid cross-version CI drift.
Also applies to: 392-402, 408-446, 1175-1202
agents/grpc/src/proto/reconfigure.pb.cc (1)
49-51: Proto gencode changes align with assetsEnabled field; no manual edits suggested.Init/clear/serialize/merge paths and descriptor/table deltas reflect field 13; has-bits masks updated (incl. 0x00000de0 popcount).
Ensure descriptor size/hash changes are committed wherever cached (if any), and that both pb.h/pb.cc are produced by the same protoc (6.32.0).
Also applies to: 107-121, 152-173, 338-429, 465-469, 581-587, 624-625, 702-715, 738-744
|
|
||
| if (!assets_enabled_) { | ||
| return ErrorType::EAssetsDisabled; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
*Critical: data race on assets_enabled_ (and on config_ for contCpuProfile) when called from _from_js.
start_*_from_js invokes do_start_prof_init on potentially non‑agent threads. Reading a non‑atomic bool and nlohmann::json without synchronization is undefined behavior.
Fix options (pick one):
- Make flags atomic and stop reading config_ off-thread.
- Or route the entire do_start_prof_init to the agent loop (enqueue request), returning immediately to JS.
Minimal atomic fix (recommended now), plus JSON type safety:
- if (!assets_enabled_) {
+ if (!assets_enabled_.load(std::memory_order_acquire)) {
return ErrorType::EAssetsDisabled;
}
- if (type == ProfileType::kCpu) {
- auto it = config_.find("contCpuProfile");
- if (it != config_.end() && it->get<bool>() == true) {
- return ErrorType::EInProgressError;
- }
- }
+ if (type == ProfileType::kCpu &&
+ cont_cpu_profile_enabled_.load(std::memory_order_acquire)) {
+ return ErrorType::EInProgressError;
+ }Header changes (agents/grpc/src/grpc_agent.h):
+#include <atomic>
...
- bool assets_enabled_;
+ std::atomic<bool> assets_enabled_{true};
+ std::atomic<bool> cont_cpu_profile_enabled_{false};Update config() to snapshot contCpuProfile to atomic:
+ {
+ auto it = config_.find("contCpuProfile");
+ if (it != config_.end()) {
+ if (it->is_boolean()) {
+ cont_cpu_profile_enabled_.store(*it, std::memory_order_relaxed);
+ } else {
+ Debug("Invalid contCpuProfile type; expected boolean\n");
+ cont_cpu_profile_enabled_.store(false, std::memory_order_relaxed);
+ }
+ } else {
+ cont_cpu_profile_enabled_.store(false, std::memory_order_relaxed);
+ }
+ }Optional: if assetsEnabled should also suppress continuous profiles, add a fast check at the start of got_continuous_profile to drop or close streams when disabled.
🏁 Script executed:
#!/bin/bash
# Locate *_from_js callsites and assess threading context.
rg -nP 'start_(cpu|heap)_(profile|sampling|snapshot)_from_js\s*\(' -C3 --type=cpp
# Grep for direct calls to do_start_prof_init outside agent loop callbacks.
rg -nP '\bdo_start_prof_init\s*\(' -C3 --type=cppLength of output: 17670
I need to inspect the actual implementation of do_start_prof_init and check the current state of the code:
Now let me inspect the actual implementation in grpc_agent.cc to see the current state:
Let me now examine the actual implementation in the grpc_agent.cc file around the line numbers indicated:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Inspect the actual do_start_prof_init implementation in grpc_agent.cc
sed -n '2092,2130p' agents/grpc/src/grpc_agent.ccLength of output: 1197
Now let me check the header file to see the current declarations:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Check the declaration of assets_enabled_ and related fields in grpc_agent.h
grep -n "assets_enabled_\|cont_cpu_profile\|config_" agents/grpc/src/grpc_agent.h | head -30Length of output: 279
Let me check the config() method and how config_ is synchronized:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Find the config() method and config_ update patterns
rg -n "config_\s*=" agents/grpc/src/grpc_agent.cc -B2 -A2 | head -60
# Also search for config() method
rg -n "void.*config\s*\(" agents/grpc/src/grpc_agent.cc -A5 | head -40Length of output: 182
Let me find what method contains the config assignment and check for synchronization:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Find the method containing config_ assignment at line 1019
sed -n '990,1030p' agents/grpc/src/grpc_agent.cc
# Check for any mutexes in the header
grep -n "mutex\|lock\|Lock" agents/grpc/src/grpc_agent.hLength of output: 1735
Let me verify one more detail about whether config() can be called concurrently with do_start_prof_init by checking the call sites:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Check where config() method is called from
rg -n "\.config\(" agents/grpc/src/grpc_agent.cc -B1 | head -20
# Check if config_msg_cb_ processes config updates asynchronously or on agent loop
sed -n '1250,1350p' agents/grpc/src/grpc_agent.cc | grep -A 30 "config_msg_cb_"Length of output: 0
Let me try a broader search for how config_msg_cb_ is used:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Search for config_msg_cb_ definition and usage
rg -n "config_msg_cb_" agents/grpc/src/grpc_agent.cc -A 15 | head -50Length of output: 1111
Critical data race confirmed: unprotected config_ access from concurrent threads.
The code modification at line 2097 (if (!assets_enabled_)) exposes a classic data race:
config_msg_cb_(agent event loop thread) writesconfig_unsafely at line 1019 viaconfig()methoddo_start_prof_init(arbitrary thread via *_from_js V8 callbacks) readsconfig_unsafely at lines 2110–2115- No synchronization between these threads
Additionally, assets_enabled_ is declared std::atomic<bool> but uses implicit operator! conversion instead of explicit .load(std::memory_order_acquire), bypassing memory ordering guarantees.
Fix as suggested in the review comment:
- Replace
if (!assets_enabled_)withif (!assets_enabled_.load(std::memory_order_acquire)) - Add
std::atomic<bool> cont_cpu_profile_enabled_{false}to header - Snapshot
contCpuProfilefromconfig_into the atomic in theconfig()method (with proper null/type checks) - Use
cont_cpu_profile_enabled_.load(std::memory_order_acquire)in the check at line 2110–2115
These changes eliminate the data race by preventing unprotected JSON reads from arbitrary threads.
🤖 Prompt for AI Agents
In agents/grpc/src/grpc_agent.cc around lines 2096–2100 (and the read at
2110–2115), the code performs unsynchronized reads of config_ and uses implicit
atomic conversion; replace the implicit check if (!assets_enabled_) with if
(!assets_enabled_.load(std::memory_order_acquire)), add a new std::atomic<bool>
cont_cpu_profile_enabled_{false} to the header, update the config() writer
(called at line ~1019) to safely snapshot contCpuProfile from config_ into
cont_cpu_profile_enabled_ after doing null/type checks, and change the
arbitrary-thread check at lines 2110–2115 to use
cont_cpu_profile_enabled_.load(std::memory_order_acquire) so there are no
unprotected JSON reads from non-event-loop threads.
The curl change fix a linking error on Windows while the change in grpc fixes a linking error on macOS.
Add comprehensive runtime control for asset collection (CPU/heap profiling, snapshots) with multiple configuration methods.
Implements assetsEnabled flag with full control surface:
This allows operators to disable expensive profiling operations at runtime without process restarts, improving resource management and production safety.
Also adds missing enableTraces()/disableTraces() helper methods to complete the tracing API surface.
Comprehensive test coverage added for all asset types across gRPC, ZMQ, and parallel test suites.
Summary by CodeRabbit
New Features
Behavior Changes
Tests