[Serve] Calculate autoscaling decisions over whole scale_delay_s period

### Description

Currently, as per https://github.com/ray-project/ray/blob/f4e85381d22b40b45929dde4a5566960ca3f298d/python/ray/serve/autoscaling_policy.py#L85, a scaling decision is only made if it is consistent over the `upscale_delay_s` or `downscale_delay_s` period. However, the final scaling decision is based only on the `desired_num_replicas` from that single call.

It would be nice to calculate the `desired_num_replicas` over the entire period. Even better would be selecting between min/max/average `desired_num_replicas` calculated over the delay period.

We can store the `desired_num_replicas` values in the `policy_state` and calculate the min/max/avg once the scaling decision is made.
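A minimal sketch of that idea, to make the proposal concrete. This is not Ray Serve's actual implementation: the function name `record_and_aggregate`, the `desired_history` key, and the `mode` argument are all hypothetical, and `policy_state` is assumed to be a plain dict that persists between autoscaler calls.

```python
import math
from typing import Any, Dict, List, Tuple


def record_and_aggregate(
    policy_state: Dict[str, Any],
    desired_num_replicas: int,
    now: float,
    delay_s: float,
    mode: str = "max",
) -> int:
    """Store the latest sample and aggregate all samples within delay_s.

    Hypothetical helper: `desired_history` is an assumed key, not an
    existing field of Ray Serve's policy state.
    """
    history: List[Tuple[float, int]] = policy_state.setdefault("desired_history", [])
    history.append((now, desired_num_replicas))

    # Keep only the samples that fall inside the delay window.
    cutoff = now - delay_s
    history[:] = [(t, d) for t, d in history if t >= cutoff]

    values = [d for _, d in history]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "avg":
        # Round up so the average never under-provisions by a fraction.
        return math.ceil(sum(values) / len(values))
    raise ValueError(f"unknown mode: {mode!r}")
```

The caller would pass `upscale_delay_s` or `downscale_delay_s` as `delay_s`, depending on the direction of the pending decision, and use the returned value in place of the single-call `desired_num_replicas`.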

### Use case

Currently, deployments with high variability in request counts have no clear mechanism to scale appropriately. For example, if `downscale_delay_s=60` and the autoscaler runs 6 checks in that window with `desired_num_replicas=>[10,10,15,10,10,5]`, the cluster will scale to 5. Alternatively, if `desired_num_replicas=>[2,2,2,15,2,2]`, there is a case to be made that the cluster should use 15 in order to accommodate the maximum traffic.
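To see how the choice of aggregate changes the outcome for these two sequences, here is a small illustrative calculation. The `summarize` helper is hypothetical and only does arithmetic on the sample lists; rounding the average up is an assumption, not something the autoscaler is known to do.

```python
import math
from typing import Dict, List


def summarize(samples: List[int]) -> Dict[str, int]:
    """Compare the value from the final check with window-wide aggregates."""
    return {
        "last": samples[-1],  # value from the final check only
        "min": min(samples),
        "max": max(samples),
        "avg": math.ceil(sum(samples) / len(samples)),  # rounded up (assumption)
    }


steady_then_drop = [10, 10, 15, 10, 10, 5]
single_spike = [2, 2, 2, 15, 2, 2]

print(summarize(steady_then_drop))
print(summarize(single_spike))
```

For the first sequence the final check gives 5 while the rounded average is 10; for the second, only the max captures the spike to 15.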


Note: 
Increasing `look_back_period_s` slows down all scaling decisions and, through its smoothing effect, changes the entire balance of `ongoing_requests`.
`upscaling_factor` and `downscaling_factor` can help, but in the example cases above they still completely miss the correct autoscaling decision.

[Serve] Calculate autoscaling decisions over whole scale_delay_s period #46497
