Closed
Labels: bug, triage
Description
What is the problem?
Latest master.
This is a follow-up to item 3) from #18003. cc @AmeerHajAli
When a placement group requests resources that the autoscaler can never fulfill, no error is raised. In addition, nodes are still started to fulfill the part of the request that can be satisfied.
Reproduction (REQUIRED)
If a bundle requests a custom resource that no node type can provide, nodes are still started for the other bundles in the placement group.
Using this script:
import ray

ray.init(address="auto")

# With only the first bundle, no nodes are started.
# With all nine bundles, nodes are started to fulfill bundles 2 through 9.
pgs = [
    ray.util.placement_group([{"CPU": 4., "custom": 1.}] + [{"CPU": 1.}] * 8)
    for _ in range(4)
]
ray.get([pg.ready() for pg in pgs])
and using this cluster config:
cluster_name: ray-tune-custom-resource-test
max_workers: 20
upscaling_speed: 20
idle_timeout_minutes: 0
docker:
    image: rayproject/ray:nightly
    container_name: ray_container
    pull_before_run: true
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a
    cache_stopped_nodes: false
available_node_types:
    cpu_2_ondemand:
        node_config:
            InstanceType: m5.large
        resources: {"CPU": 2}
        min_workers: 0
        max_workers: 10
    cpu_8_ondemand:
        node_config:
            InstanceType: m5.2xlarge
        resources: {"CPU": 8}
        min_workers: 0
        max_workers: 10
auth:
    ssh_user: ubuntu
head_node_type: cpu_2_ondemand
worker_default_node_type: cpu_2_spot
file_mounts: {
    "/test": "./"
}
Observed behavior:
- No error is raised to indicate that the placement group can never become ready
- Nodes are started to fulfill the resource requests of the 2nd through 9th bundles (1 CPU each)
Expected behavior:
- An error should be raised stating that this request can never be satisfied
- No nodes should be started for the remaining bundles. The placement group can never become ready, so no resources should be requested for any of its bundles.
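To make the expected behavior concrete, here is a minimal sketch of the kind of feasibility check described above. It is not Ray's autoscaler code: `bundle_fits`, `infeasible_bundles`, and `check_placement_group` are hypothetical helpers, and the sketch only asks whether each bundle fits on a single node of some configured type. It ignores `max_workers` limits and the placement strategy.

```python
from typing import Dict, List

Resources = Dict[str, float]


def bundle_fits(bundle: Resources, node: Resources) -> bool:
    """Return True if one node with `node` resources can host `bundle`."""
    # A resource the node type does not declare counts as 0.
    return all(node.get(name, 0.0) >= amount for name, amount in bundle.items())


def infeasible_bundles(
    bundles: List[Resources], node_types: Dict[str, Resources]
) -> List[int]:
    """Return the indices of bundles that no configured node type can host."""
    return [
        index
        for index, bundle in enumerate(bundles)
        if not any(bundle_fits(bundle, node) for node in node_types.values())
    ]


def check_placement_group(
    bundles: List[Resources], node_types: Dict[str, Resources]
) -> None:
    """Hypothetical pre-check: reject the whole group if any bundle is infeasible."""
    bad = infeasible_bundles(bundles, node_types)
    if bad:
        raise ValueError(
            f"Placement group can never be ready: bundles {bad} "
            "do not fit on any available node type."
        )


# Node types and bundles taken from the cluster config and script in this report.
node_types = {"cpu_2_ondemand": {"CPU": 2}, "cpu_8_ondemand": {"CPU": 8}}
bundles = [{"CPU": 4.0, "custom": 1.0}] + [{"CPU": 1.0}] * 8

print(infeasible_bundles(bundles, node_types))  # [0]
```

With a check like this, calling `check_placement_group(bundles, node_types)` would raise before any node is requested, because the first bundle needs the `custom` resource and neither node type declares it.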