Status: Closed
Labels: P0 (issues that should be fixed in short order), bug (something that is supposed to be working, but isn't)
What is the problem?
While transitioning Modin to Ray 1.3.0, several of our tests crash in GitHub Actions CI. The crashes could not be reproduced in a development environment until I created a VM with the same specs as the GitHub Actions hosted runners: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners
Running such a VM with 2 CPU cores and 7 GB of RAM reliably reproduces these crashes.
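To confirm that a guest actually matches the runner specs (2 CPU cores, 7 GB of RAM), a quick check inside the VM can be used (a sketch; assumes a Linux guest with `/proc/meminfo` available):

```shell
# Verify the guest matches the GitHub Actions runner specs (2 vCPUs, ~7 GB RAM).
nproc                                                               # CPU core count
awk '/^MemTotal:/ {printf "%.1f GiB\n", $2 / 1048576}' /proc/meminfo  # total RAM
```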
Ray version and other system information (Python version, TensorFlow version, OS):
Ray 1.3.0, Python 3.8 (Miniconda), Ubuntu 20.04.2 LTS.
Reproduction (REQUIRED)
I created a Vagrantfile to easily create and provision a reproducer VM:
Vagrantfile.gz
To use it, follow these steps:
1. You need to have virtualization enabled in your BIOS: https://bce.berkeley.edu/enabling-virtualization-in-your-pc-bios.html
2. Install Vagrant from https://www.vagrantup.com/downloads
3. Install a VM provider if one is not already installed. By default Vagrant uses VirtualBox.
   3.1. Install VirtualBox from https://www.virtualbox.org/wiki/Linux_Downloads
   3.2. Alternatively, you can use KVM; I checked that both VMs produce the same result. To use KVM you need to install the `vagrant-libvirt` plugin via `vagrant plugin install vagrant-libvirt`. It requires a number of dependencies, which can be found at https://github.com/vagrant-libvirt/vagrant-libvirt. Also, since my /var/lib filesystem is not large enough, I set up the VMs to use an `images` pool, which can be created and activated like this:

   ```
   virsh pool-define-as images dir --target /localdisk/libvirt
   virsh pool-start images
   ```

4. Add your user to the `vboxusers` or `libvirt` group and make sure the new group membership is effective.
5. Run `vagrant up` in the same directory as the Vagrantfile.
6. To get into the VM, run `vagrant ssh ubuntu2004-7gb`.
7. On the VM, activate the conda environment and run the tests from the command line.
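The VirtualBox path of the steps above can be condensed into a single shell session (a sketch; it assumes the attached Vagrantfile.gz has been downloaded into the current directory, and the exact test command inside the VM is left as a placeholder):

```shell
# Sketch of the reproduction workflow above (VirtualBox provider).
gunzip -k Vagrantfile.gz            # unpack the attached Vagrantfile
sudo usermod -aG vboxusers "$USER"  # join the provider group (re-login to apply)
vagrant up                          # create and provision the VM
vagrant ssh ubuntu2004-7gb          # enter the VM, then run the tests there
```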
This is the stack trace I am getting from the crash:

```
Thread 37 "worker.io" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffd55fa700 (LWP 16109)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7c6d859 in __GI_abort () at abort.c:79
#2 0x00007fffdd32bb05 in ray::SpdLogMessage::Flush() ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#3 0x00007fffdd32bb3d in ray::RayLog::~RayLog() () from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#4 0x00007fffdcf48b6c in ray::CoreWorkerDirectTaskSubmitter::RequestNewWorkerIfNeeded(std::tuple<int, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> >, ray::ActorID> const&, ray::rpc::Address const*)::{lambda(ray::Status const&, ray::rpc::RequestWorkerLeaseReply const&)#1}::operator()(ray::Status const&, ray::rpc::RequestWorkerLeaseReply const&) const ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#5 0x00007fffdcf93dd5 in ray::rpc::ClientCallImpl<ray::rpc::RequestWorkerLeaseReply>::OnReplyReceived() ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#6 0x00007fffdce9dacb in std::_Function_handler<void (), ray::rpc::ClientCallManager::PollEventsFromCompletionQueue(int)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#7 0x00007fffdd2daa08 in boost::asio::detail::completion_handler<std::function<void ()> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#8 0x00007fffdd3e09a1 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#9 0x00007fffdd3e0ad1 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#10 0x00007fffdd3e25d0 in boost::asio::io_context::run() ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#11 0x00007fffdce9b895 in ray::CoreWorker::RunIOService() ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#12 0x00007fffdd685d10 in execute_native_thread_routine ()
from /home/vagrant/miniconda3/envs/modin/lib/python3.8/site-packages/ray/_raylet.so
#13 0x00007ffff7fa8609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007ffff7d6a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
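For reference, a backtrace like this can be captured by attaching gdb to the worker process before it aborts (a sketch; `<worker-pid>` is a placeholder for the PID of the crashing worker.io process, and gdb must be installed in the VM):

```shell
# Attach gdb to the worker, wait for the SIGABRT, and dump the backtrace.
gdb -p <worker-pid> \
    -ex 'handle SIGABRT stop' \
    -ex continue \
    -ex bt \
    -batch
```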
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.