worker_threads: race and crash on worker exit #51129

@bnoordhuis


Reported with v20.9.0 but also present in ToT (tip of tree). The backtrace below is from a debug build.

  * frame #0: 0x000055b5ae1539a2 node-ci`v8::base::OS::Abort() at platform-posix.cc:691:5
    frame #1: 0x000055b5ae13f7dc node-ci`V8_Fatal(file="", line=38, format="") at logging.cc:167:22
    frame #2: 0x000055b5ae13f7fb node-ci`v8::base::(anonymous namespace)::DefaultDcheckHandler(file=<unavailable>, line=<unavailable>, message=<unavailable>) at logging.cc:57:11
    frame #3: 0x000055b5ac38785d node-ci`v8::Isolate::GetCurrent() at isolate-inl.h:38:3
    frame #4: 0x00007f0aec58f2e4
    frame #5: 0x000055b5ac725d5a node-ci`v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks() at global-handles.cc:865:11
    frame #6: 0x000055b5ac725d27 node-ci`v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks(this=<unavailable>) at global-handles.cc:839:23
    frame #7: 0x000055b5ac840e40 node-ci`v8::internal::Heap::PerformGarbageCollection(this=0x00007f05c42d2048, collector=MARK_COMPACTOR, gc_reason=<unavailable>, collector_reason=<unavailable>) at heap.cc:2312:59
    frame #8: 0x000055b5ac841cb7 node-ci`v8::internal::Heap::CollectGarbage(this=0x00007f05c42d2048, space=<unavailable>, gc_reason=kExternalFinalize, gc_callback_flags=kGCCallbackScheduleIdleGarbageCollection) at heap.cc:1781:33
    frame #9: 0x000055b5ac842fa4 node-ci`v8::internal::Heap::FinalizeIncrementalMarkingAtomically(v8::internal::GarbageCollectionReason) [inlined] v8::internal::Heap::CollectAllGarbage(gc_callback_flags=<unavailable>, gc_reason=kExternalFinalize, flags=<unavailable>, this=0x00007f05c42d2048) at heap.cc:1477:17
    frame #10: 0x000055b5ac842f8e node-ci`v8::internal::Heap::FinalizeIncrementalMarkingAtomically(this=0x00007f05c42d2048, gc_reason=kExternalFinalize) at heap.cc:3802:20
    frame #11: 0x000055b5ac7652f9 node-ci`v8::internal::CppHeap::DetachIsolate(this=0x00007f05c40fd7b0) at cpp-heap.cc:560:59
    frame #12: 0x000055b5ac81c162 node-ci`v8::internal::Heap::DetachCppHeap(this=0x00007f05c42d2048) at heap.cc:5769:42
    frame #13: 0x000055b5abf29d0a node-ci`node::IsolateData::~IsolateData(this=0x00007f05c40ea5c0) at env.cc:583:28
    frame #14: 0x000055b5abf29d84 node-ci`node::IsolateData::~IsolateData(this=0x00007f05c40ea5c0) at env.cc:586:1
    frame #15: 0x000055b5abe6d144 node-ci`node::FreeIsolateData(isolate_data=0x00007f05c40ea5c0) at environment.cc:408:10
    frame #16: 0x000055b5abe67e10 node-ci`node::FunctionDeleter<node::IsolateData, &node::FreeIsolateData(node::IsolateData*)>::operator()(this=0x00007f0ae5bf9c50, pointer=0x00007f05c40ea5c0) const at util.h:675:47
    frame #17: 0x000055b5abe685c6 node-ci`std::__uniq_ptr_impl<node::IsolateData, node::FunctionDeleter<node::IsolateData, &node::FreeIsolateData(node::IsolateData*)>>::reset(this=0x00007f0ae5bf9c50, __p=0x0000000000000000) at unique_ptr.h:182:16
    frame #18: 0x000055b5abe66fe7 node-ci`std::unique_ptr<node::IsolateData, node::FunctionDeleter<node::IsolateData, &node::FreeIsolateData(node::IsolateData*)>>::reset(this=nullptr, __p=0x0000000000000000) at unique_ptr.h:456:12
    frame #19: 0x000055b5ac1c0d04 node-ci`node::worker::WorkerThreadData::~WorkerThreadData(this=0x00007f0ae5bf98f0) at node_worker.cc:220:26
    frame #20: 0x000055b5ac1b99aa node-ci`node::worker::Worker::Run(this=0x000055b5b2a7b480) at node_worker.cc:406:1
    frame #21: 0x000055b5ac1bba0a node-ci`operator(__closure=0x0000000000000000, arg=0x000055b5b2a7b480) at node_worker.cc:683:11
    frame #22: 0x000055b5ac1bbaae node-ci`_FUN((null)=0x000055b5b2a7b480) at node_worker.cc:693:3

The race condition is difficult to trigger but the problem is obvious once you see it. The order of events in src/node_worker.cc is as follows:

  1. Worker::Run() creates WorkerThreadData data(this)

  2. WorkerThreadData constructor creates the isolate and IsolateData

  3. Worker::Run() enters and exits said isolate (constructs and destructs a Locker and Isolate::Scope)

  4. Worker::Run() destructs the WorkerThreadData instance

  5. WorkerThreadData::~WorkerThreadData() calls IsolateData::~IsolateData()

  6. IsolateData::~IsolateData() calls Isolate::DetachCppHeap()

  7. Isolate::DetachCppHeap() runs GC

  8. GlobalHandles::InvokeFirstPassWeakCallbacks() calls Isolate::GetCurrent(), but the isolate is no longer entered (the Locker and Isolate::Scope from step 3 have already been destructed) and so it blows up violently

The solution IMO is to refactor so that the Locker stays around until the WorkerThreadData destructor finishes.

I believe the isolate can (and should) stay scoped to Worker::Run() and not leak out. Currently, WorkerThreadData "loans" the isolate to the Worker for no good reason I can see.

I can submit a patch but I'd like consensus on what approach to take before I start writing/refactoring/deleting code.

Labels: c++ (Issues and PRs that require attention from people who are familiar with C++), confirmed-bug (Issues with confirmed bugs), worker (Issues and PRs related to Worker support)