Description
Describe the bug
Our bridge-to-rust layer makes extensive use of a pattern where native Rust code is called with a callback function, which gets called asynchronously on completion of some Rust-side operation; on the TS side, we use the `promisify` function from the `node:util` built-in module to create a Promise that will resolve when the native code eventually invokes the provided callback function.
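For readers unfamiliar with the pattern, here is a minimal sketch of it. `nativeAdd` is a hypothetical stand-in for a bridge function (it is not part of the actual bridge API) and it completes via `setImmediate` rather than from a Rust thread:

```typescript
import { promisify } from 'node:util';

// Hypothetical stand-in for a native bridge function: it takes a Node-style
// callback and invokes it asynchronously once the "native" work is done.
function nativeAdd(
  a: number,
  b: number,
  callback: (err: Error | null, result: number) => void,
): void {
  setImmediate(() => callback(null, a + b));
}

// promisify() wraps the callback-style function so that callers get a Promise
// that settles when the callback is eventually invoked.
const addAsync = promisify(nativeAdd);

addAsync(2, 3).then((sum) => console.log('sum:', sum));
```

In this sketch the pending `setImmediate` is a referenced handle, so Node waits for it. The problem described in this issue arises when the completion comes from a thread that Node knows nothing about.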
It appears that in some cases (possibly related to ESM and/or the use of worker threads), Node may fail to recognize that such a promise may still eventually resolve, and may therefore discard the promise as unresolved. When that happens, it may cause sudden termination of the Node process (with exit code 13) or of some worker thread (for example when running tests).
I believe this is because callback functions are not marked as "referenced" by the native code when they get passed to non-Node threads (e.g. when passing them to the Tokio runtime thread). Unfortunately, Neon doesn't currently expose the Thread Safe Functions API, which would have been the proper way to fix this issue.
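The effect of a "referenced" versus "unreferenced" handle can be illustrated in plain Node, without any native code. The sketch below uses a timer as an analogy for the native callback; `startOperation` is a hypothetical helper, not something from the SDK:

```typescript
import { setTimeout as startTimer } from 'node:timers';

// Hypothetical helper: starts an "operation" that resolves after `ms`.
// When `referenced` is false, the timer is unref()'d, so it no longer keeps
// the event loop alive on its own, much like a callback that Node does not
// know is still going to be invoked.
function startOperation(ms: number, referenced: boolean) {
  let timer!: NodeJS.Timeout;
  const promise = new Promise<string>((resolve) => {
    timer = startTimer(() => resolve('done'), ms);
    if (!referenced) timer.unref();
  });
  return { promise, timer };
}

const tracked = startOperation(50, true);
const untracked = startOperation(50, false);
console.log(tracked.timer.hasRef(), untracked.timer.hasRef());
```

A pending promise does not keep the process alive by itself; only referenced handles do. In an ES module, awaiting a promise like `untracked.promise` at the top level while nothing else is referenced lets the event loop drain, and Node documents exit code 13 for exactly that case (an unsettled top-level await).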
Minimal Reproduction
⚠️ The reproduction code presented below explicitly calls the `Runtime.shutdown()` function. That function is not meant for public use. Calling it explicitly should never be required, and the specific sequence demonstrated below is expected to fail by throwing an exception. This particular sequence was kept only because it is the simplest currently known sequence that predictably triggers the issue described above, i.e. execution of the code gets dropped entirely in some situations, with the Node process exiting with code 13 rather than waiting for the promise to complete (which, in this case, should result in an error being thrown). ⚠️
- Make sure `package.json` contains the `"type": "module"` directive.
- Save the following code to a file named `example.ts`:

  ```ts
  import { Runtime } from '@temporalio/worker';
  import { TestWorkflowEnvironment } from '@temporalio/testing';
  import { setTimeout } from 'node:timers/promises';

  const main = async () => {
    const runtime = Runtime.install({});

    console.log('Start test env start');
    const testEnvironment = await TestWorkflowEnvironment.createLocal();

    console.log('shutdown test env start');
    await testEnvironment.teardown();
    console.log('shutdown completed');

    console.log('shutdown runtime - start');
    try {
      await runtime.shutdown();
    } catch (e) {
      console.error(e);
    }
    console.log('shutdown runtime - stop');
  };

  await main();
  console.log('after main');
  ```
- Execute that code with `tsx example.ts ; echo $?`, and observe the following output:

  ```
  Start test env start
  shutdown test env start
  shutdown completed
  shutdown runtime - start
  13
  ```
- Modify the code by replacing the `await runtime.shutdown();` line with `await Promise.all([runtime.shutdown(), setTimeout(2000)]);`.
- Run again with the same command, and observe the following result:

  ```
  Start test env start
  shutdown test env start
  shutdown completed
  shutdown runtime - start
  [UnexpectedError: channel closed]
  shutdown runtime - stop
  after main
  0
  ```
Note that the `[UnexpectedError: channel closed]` error is expected, as the runtime had already been shut down after the test environment was shut down (because there were no more native resources being tracked by the runtime). This is not a problem by itself; it was simply kept as an easy way to demonstrate the present issue.
Running the same code in a non-ESM context completes as expected.
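The modified reproduction presumably succeeds because the `setTimeout(2000)` promise holds a referenced timer, which keeps the event loop alive until the native callback arrives. The same idea can be sketched without a fixed delay. `keepAliveWhile` below is a hypothetical helper shown for illustration only; it is not part of the SDK and is not proposed as the actual fix:

```typescript
import { setInterval as startInterval, clearInterval as stopInterval } from 'node:timers';

// Hypothetical helper: holds a referenced interval while `pending` is
// unsettled, so Node cannot conclude that nothing is left to wait for.
// The interval is cleared as soon as the promise settles, either way.
async function keepAliveWhile<T>(pending: Promise<T>): Promise<T> {
  const handle = startInterval(() => {}, 1_000);
  try {
    return await pending;
  } finally {
    stopInterval(handle);
  }
}
```

Under that assumption, `await keepAliveWhile(runtime.shutdown())` would play the same role as the `Promise.all([...])` line in the reproduction, while releasing the handle as soon as the operation completes instead of always waiting two seconds.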