[placement groups/autoscaler] unfulfillable requests should raise an error #18018

@krfricke


What is the problem?

Latest master.

This is a follow-up to item 3 of #18003. cc @AmeerHajAli

When a placement group requests resources that the autoscaler can never fulfill, no error is raised. In addition, nodes are still started to fulfill the feasible part of the request.

Reproduction (REQUIRED)

If a bundle requests a custom resource that no node type can provide, nodes are still started for the resources requested by the other bundles in the placement group.

Using this script:

import ray

ray.init(address="auto")


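# The first bundle requests a "custom" resource that no node type provides.
# If only that bundle is passed, no nodes are started up.
# With the eight extra 1-CPU bundles, nodes are started up to fulfill the
# 2nd-9th bundles, although the placement group can never become ready.
pgs = [
    ray.util.placement_group([{"CPU": 4., "custom": 1.}] + [{"CPU": 1.}] * 8)
    for _ in range(4)
]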

ray.get([pg.ready() for pg in pgs])

and using this cluster config:

cluster_name: ray-tune-custom-resource-test

max_workers: 20
upscaling_speed: 20

idle_timeout_minutes: 0

docker:
    image: rayproject/ray:nightly
    container_name: ray_container
    pull_before_run: true

provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a
    cache_stopped_nodes: false

available_node_types:
    cpu_2_ondemand:
        node_config:
            InstanceType: m5.large
        resources: {"CPU": 2}
        min_workers: 0
        max_workers: 10
    cpu_8_ondemand:
        node_config:
            InstanceType: m5.2xlarge
        resources: {"CPU": 8}
        min_workers: 0
        max_workers: 10

auth:
    ssh_user: ubuntu

head_node_type: cpu_2_ondemand
worker_default_node_type: cpu_2_spot

file_mounts: {
  "/test": "./"
}

Observed behavior:

  1. No error is raised to indicate that the placement groups will never be ready
  2. Nodes are started to fulfill the resource requests of the 2nd-9th bundles (1 CPU each)

Expected behavior:

  1. An error should be raised stating that this request can never be satisfied
  2. No nodes should be started for the remaining bundles. The placement group can never be ready, so we shouldn't request resources for any of its bundles at all.
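
To make the expected behavior concrete, here is a minimal sketch of the kind of feasibility check that could run before any nodes are requested. It is not Ray's actual autoscaler code: `InfeasiblePlacementGroupError`, `bundle_fits`, `infeasible_bundles`, and `check_placement_group` are hypothetical names used for illustration only, and the node resources are copied by hand from the cluster config above. The check is only a necessary condition. It ignores `max_workers`, placement strategies, and resources that are already in use.

```python
from typing import Dict, List

Resources = Dict[str, float]

# Node type resources copied from the cluster config in this issue.
NODE_TYPES: Dict[str, Resources] = {
    "cpu_2_ondemand": {"CPU": 2},
    "cpu_8_ondemand": {"CPU": 8},
}


class InfeasiblePlacementGroupError(Exception):
    """Hypothetical error type for this sketch; not part of Ray's API."""


def bundle_fits(bundle: Resources, node: Resources) -> bool:
    """Return True if a single node offers every resource the bundle needs."""
    return all(node.get(name, 0) >= amount for name, amount in bundle.items())


def infeasible_bundles(
    bundles: List[Resources], node_types: Dict[str, Resources]
) -> List[int]:
    """Return the indices of bundles that fit on no configured node type."""
    return [
        index
        for index, bundle in enumerate(bundles)
        if not any(bundle_fits(bundle, node) for node in node_types.values())
    ]


def check_placement_group(
    bundles: List[Resources], node_types: Dict[str, Resources]
) -> None:
    """Raise before requesting any resources if some bundle can never fit."""
    bad = infeasible_bundles(bundles, node_types)
    if bad:
        raise InfeasiblePlacementGroupError(
            f"Bundles at indices {bad} cannot be placed on any node type; "
            "no resources should be requested for this placement group."
        )
```

With the bundles from the reproduction script, `infeasible_bundles` flags only the first bundle (the one requesting `custom`), and `check_placement_group` raises before resources are requested for any of the 1-CPU bundles.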

cc @DmitriGekhtman @sasha-s

Labels: bug, triage
