Description
Currently, as per replica_queue_length_autoscaling_policy, the policy waits for upscale_delay_s or downscale_delay_s before acting, but the final scaling decision is based only on the desired_num_replicas value computed in that single call.
It would be nice to calculate the desired_num_replicas over the entire period. Even better would be selecting between min/max/average desired_num_replicas calculated over the delay period.
We can store the desired_num_replicas values in the policy_state and calculate the min/max/avg when the scaling decision is made (see the sketch below).
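
A minimal sketch of the idea, assuming a simplified interface rather than the actual Ray Serve policy signature: each per-call desired_num_replicas is appended to a history kept in policy_state, samples older than the delay window are dropped, and the min/max/avg over the window is returned. The function name aggregate_desired_replicas and the policy_state layout are hypothetical.

```python
import time
from typing import Any, Dict, List, Optional, Tuple


def aggregate_desired_replicas(
    desired_num_replicas: int,
    policy_state: Dict[str, Any],
    delay_s: float,
    mode: str = "avg",  # "min", "max", or "avg"
    now: Optional[float] = None,
) -> int:
    """Return an aggregate of desired_num_replicas over the last delay_s
    seconds instead of only the latest per-call value."""
    now = time.time() if now is None else now

    # Keep a (timestamp, desired_num_replicas) history inside policy_state.
    history: List[Tuple[float, int]] = policy_state.setdefault("desired_history", [])
    history.append((now, desired_num_replicas))

    # Drop samples that fall outside the delay window.
    history[:] = [(t, n) for t, n in history if now - t <= delay_s]

    values = [n for _, n in history]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    return round(sum(values) / len(values))
```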
Use case
Currently, deployments with high variability in request counts have no clear mechanism to scale appropriately. For example, if downscale_delay_s=60 and the autoscaler performs 6 checks with desired_num_replicas => [10, 10, 15, 10, 10, 5], the cluster will scale to 5. Alternatively, if desired_num_replicas => [2, 2, 2, 15, 2, 2], there is a case to be made that the cluster should use 15 in order to accommodate the peak traffic (worked through in the snippet below).
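
For illustration, running the hypothetical aggregate_desired_replicas sketch from above over both example sequences (one check every 10 s, a 60 s delay window, mode="max"):

```python
for seq in ([10, 10, 15, 10, 10, 5], [2, 2, 2, 15, 2, 2]):
    policy_state = {}
    for i, desired in enumerate(seq):
        decision = aggregate_desired_replicas(
            desired, policy_state, delay_s=60, mode="max", now=i * 10
        )
    # Final decision over the window: 15 for both sequences,
    # versus 5 and 2 when only the last per-call value is used.
    print(decision)
```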
Note:
Increasing look_back_period_s slows down all scaling decisions and also changes the overall balance of ongoing_requests because of its smoothing effect.
upscaling_factor and downscaling_factor can help, but in the example cases above they still miss the correct autoscaling decision.
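
For reference, a sketch of where these knobs sit in a deployment's autoscaling_config; exact key names and availability depend on the Ray Serve version, so treat this as illustrative rather than definitive:

```python
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 20,
        "target_ongoing_requests": 5,   # key name may differ on older Ray versions
        "upscale_delay_s": 30,
        "downscale_delay_s": 60,
        "look_back_period_s": 30,       # larger values smooth but also slow every decision
        "upscaling_factor": 1.0,
        "downscaling_factor": 0.5,
    }
)
class VariableTrafficDeployment:
    async def __call__(self, request) -> str:
        return "ok"
```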