What happened + What you expected to happen
Currently, the datasets_shuffle_* nightly tests run on large 32-vCPU machines and limit the object store memory to only a fraction of the available RAM. This means we are probably underutilizing the machines, and the setup is not very representative of a real deployment either. Previous attempts to use smaller instance types or a larger object store memory caused worker raylet OOMs, for example in datasets_shuffle_random_shuffle_1tb.
Here's some example output from dmesg on a worker raylet.
Versions / Dependencies
2.0dev
Reproduction script
Run the datasets_shuffle_* nightly tests after changing the instance type in the cluster config.
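When experimenting with a larger object store on a given instance type, it may help to express the object store size as an explicit fraction of the node's RAM. The following is a minimal sketch, not the configuration used by the nightly tests: the helper names `object_store_bytes` and `total_ram_bytes` and the 0.5 fraction are hypothetical and chosen only for illustration. The one Ray call referenced, `ray.init(object_store_memory=...)`, is left in a comment so the snippet runs without Ray installed.

```python
import os


def object_store_bytes(total_ram: int, fraction: float) -> int:
    """Return the object store size in bytes for a given fraction of RAM."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram * fraction)


def total_ram_bytes() -> int:
    """Return total physical RAM in bytes (POSIX systems such as Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    # Illustrative fraction only; the tests' actual limit is not stated here.
    size = object_store_bytes(total_ram_bytes(), fraction=0.5)
    print(f"object_store_memory={size}")
    # With Ray installed, the value could be passed when starting Ray locally:
    #   import ray
    #   ray.init(object_store_memory=size)
```

Varying the fraction together with the instance type makes it easier to see at which ratio the worker raylet OOMs start to appear.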
Issue Severity
No response