
Conversation

@Stack-Attack
Contributor

@Stack-Attack Stack-Attack commented Apr 2, 2025

Why are these changes needed?

Currently, the Serve autoscaler makes scaling decisions based only on the most recent Serve controller computation, even if the controller has made many scaling calculations over the scaling-delay period. This results in poor autoscaling when clusters use long upscale/downscale delays. This PR allows more responsive scaling by implementing min and max aggregation functions for metric collection, in addition to the current average.

An alternate fix to this issue was proposed here.
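
To make this concrete, a rough sketch of how the proposed option could look on a deployment (illustrative only: aggregation_function is the parameter this PR proposes, not part of the released API, and the other values are arbitrary):

    from ray import serve

    @serve.deployment(
        autoscaling_config={
            "min_replicas": 1,
            "max_replicas": 10,
            "target_ongoing_requests": 2,
            # Long delays are where the current average-only behavior hurts most.
            "upscale_delay_s": 300.0,
            "downscale_delay_s": 600.0,
            # Proposed in this PR: aggregate metrics with "max" or "min" instead of the average.
            "aggregation_function": "max",
        }
    )
    class MyModel:
        async def __call__(self, request) -> str:
            return "ok"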

Related issue number

#46497

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [x] I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Unit tests
    • [x] Release tests
    • [ ] This PR is not tested :(

@Stack-Attack
Contributor Author

Stack-Attack commented Apr 2, 2025

Some notes for @zcin.

I originally worked from your prototype; however, there were a few concerns that led to this simpler implementation.

  1. AFAIK there is only one serve controller. In large clusters with large deployments, aggregating all raw metrics in the controller seems inefficient.
  2. In any case, the timestamps sent from the handles are not in sync with each other, so we would need to do some more complex windowing on the raw replica metrics if we wanted to align them. Otherwise, the controller sum over the look_back_period will be identical to the sum of each individual max/min/avg that is used currently.
  3. All this said, if this is the chosen method, I think this is a close enough approximation to the real total_max_requests given power-of-two routing should smooth things out.

However, given all this, I'm actually more confident in my previous implementation now than before. I'll add an approximate sketch below. It represents one theoretical scaling decision over some time.

Currently we only have the blue option. This PR gives us the red option. My previous PR gives us the target. I think the target is achievable with this PR and very careful smoothing parameters, but it doesn't make the average scaling case any better. The average aggregation result will still discard all previous data over the calculation window, with no way to use that information for better scaling.
[Attached image 20250402_143244: sketch of one theoretical scaling decision comparing the blue, red, and target options]

Either way, I hope we can squeeze this in an upcoming release :).

@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch from 6d5d722 to 4ff155f Compare April 2, 2025 07:42
Contributor Author

We can abstract these methods into one with a function parameter, but I don't expect there will be more added.

Contributor Author

For some reason (legacy?) the replica count was based on replica-level metrics in the latter half of this block. Since this PR splits the aggregation function for replica and handle metrics, we need to modify this so that we always use handle metrics for autoscaling decisions.

Contributor

The legacy autoscaling implementation was based on request metrics collected from replicas, which are stored in _replica_requests. I think we want to keep that because if RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE is turned off, there won't be any data in _handle_requests.

Contributor Author

Aha. Missed that case. Based on this, the feature would only be possible when RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=True. I'll re-add the fallback logic, add a warning message, and update the docs?
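
A rough sketch of the fallback described here (the environment variable name is real; the function, its assumed default, and the message wording are illustrative):

    import logging
    import os

    logger = logging.getLogger("ray.serve")

    def resolve_aggregation_function(aggregation_function: str) -> str:
        """Fall back to the average when handle metrics collection is disabled."""
        on_handle = (
            os.environ.get("RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE", "1") == "1"
        )
        if not on_handle and aggregation_function != "mean":
            logger.warning(
                f"aggregation_function='{aggregation_function}' requires "
                "RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=1; "
                "falling back to 'mean' over replica-reported metrics."
            )
            return "mean"
        return aggregation_function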

@jcotant1 jcotant1 added the serve Ray Serve Related Issue label Apr 2, 2025
@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch from 4ff155f to b198139 Compare April 3, 2025 07:01
@hainesmichaelc hainesmichaelc added the community-contribution Contributed by the community label Apr 4, 2025
@Stack-Attack
Contributor Author

@zcin Any timeline on moving forward here? Either through this or the previous method?

@zcin
Contributor

zcin commented Apr 22, 2025

@Stack-Attack I will take a look at this today!

Contributor

@zcin zcin left a comment

Hi @Stack-Attack, I understand your concern, but I think the current implementation may not be calculating exactly what you want it to because the aggregation is happening locally. Let me know what you think.

raise ValueError(
f"Unsupported aggregation function: "
f"{self.autoscaling_config.aggregation_function}"
)
Contributor

If there are 10 handles, since we are aggregating "locally" then this means the autoscaling decision would be summing up:

  • max number of requests for handle 1 at T1
  • max number of requests for handle 2 at T2
    ...
  • max number of requests for handle 10 at T10

Is my understanding correct? So this is not calculating the maximum total number of requests over the look-back period; instead, it's summing up a bunch of local maxes that occurred at different times?
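
A toy numeric illustration of the distinction (not code from this PR): two handles that peak at different times.

    # Requests reported by each handle at timestamps t = 0, 1, 2.
    handle_1 = {0.0: 3, 1.0: 1, 2.0: 0}  # handle 1 peaks at t=0
    handle_2 = {0.0: 0, 1.0: 1, 2.0: 3}  # handle 2 peaks at t=2

    sum_of_local_maxes = max(handle_1.values()) + max(handle_2.values())  # 3 + 3 = 6
    max_of_total = max(handle_1[t] + handle_2[t] for t in handle_1)       # max(3, 2, 3) = 3

    print(sum_of_local_maxes, max_of_total)  # 6 3 -> summing local maxes over-estimates the true peak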

Contributor Author

@zcin That's correct.

  1. AFAIK there is only one serve controller. In large clusters with large deployments, aggregating all raw metrics in the controller seems inefficient.
  2. In any case, the timestamps sent from the handles are not in sync with each other, so we would need to do some more complex windowing on the raw replica metrics if we wanted to align them. Otherwise, the controller sum over the look_back_period will be identical to the sum of each individual max/min/avg which is used [in this PR].
  3. All this said, if this is the chosen method, I think this is a close enough approximation to the real total_max_requests given power-of-two routing should smooth things out.

However, given all this, I'm actually more confident in my previous implementation now than before. I'll add an approximate sketch below. It represents one theoretical scaling decision over some time.

instead, it's summing up a bunch of local maxes that occurred at different times?

The different timestamps are the key issue. I don't see an easy way to align timestamps in the controller (2), and even if we did, the processing would not scale out horizontally (1), so the approximation using the latest calculation from each handle (this PR) seemed like the best balance (3).

If I missed something important let me know. I dug through the code, but you've got much deeper insight I'm sure.

Contributor Author

@zcin Gentle bump. If we choose the approximation, then this should be good to go with little change. I want to confirm before finishing up.


@github-actions

github-actions bot commented Jun 8, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 8, 2025
@Stack-Attack
Contributor Author

@zcin Bumping to remove stale label

@github-actions github-actions bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 19, 2025
@github-actions

github-actions bot commented Jul 3, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jul 3, 2025
@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Jul 4, 2025
@abrarsheikh
Contributor

Hey @Stack-Attack, can we connect over Ray Slack (Abrar Sheikh) and come to a conclusion here? I'm motivated to get this feature in, but there are a few considerations that are better discussed in person for faster iteration. Let me know!

@Stack-Attack
Contributor Author

@abrarsheikh Sure! Sorry for the delay; I was away for a few weeks. I'll get set up on the Ray Slack and reach out this week. Looking forward to finally fixing this issue.

@Stack-Attack
Contributor Author

Spoke with @abrarsheikh today. He suggested we aggregate the raw metrics in the controller, then align/window/reduce there. Since this does not scale out horizontally, we will need to add tests ensuring the controller can handle arbitrarily large clusters.

I plan to complete this next week!
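
A toy sketch (not the PR's code) of what align/window/reduce in the controller could look like, assuming raw per-replica timeseries arrive as lists of (timestamp, value) pairs:

    from collections import defaultdict

    def windowed_peak_total(reports, window_s, agg=max):
        """Bucket each replica's raw timeseries into fixed windows, reduce within
        each window, sum across replicas per window, and return the peak window."""
        # reports: {replica_id: [(timestamp, running_requests), ...]}
        per_window_totals = defaultdict(float)
        for replica_id, points in reports.items():
            buckets = defaultdict(list)
            for ts, value in points:
                buckets[int(ts // window_s)].append(value)
            for window, values in buckets.items():
                per_window_totals[window] += agg(values)
        return max(per_window_totals.values()) if per_window_totals else 0.0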

@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch 2 times, most recently from b496dc8 to 968a8c5 Compare August 12, 2025 11:53
@Stack-Attack Stack-Attack requested a review from arcyleung August 12, 2025 11:55
@Stack-Attack
Contributor Author

Let me know if I need any additional tests/benchmarks.

Contributor

@zcin zcin left a comment

Thanks @Stack-Attack for the contribution!

One request that I think will help us land your contribution faster: would you be able to split off your changes to metrics_utils and test_metrics_utils into a separate PR? That PR would be (1), and this PR would be (2) (and it would depend on PR #1). Since this change is quite big and touches a lot of components, this will help your reviewers focus on the main change per PR and speed up the review.

self, replica_id: str, window_avg: Optional[float], send_timestamp: float
self,
replica_id: str,
metrics_store: InMemoryMetricsStore,
Contributor

It's a bit strange that we're passing an InMemoryMetricsStore object between separate processes. Can we send the underlying dict of data instead of an InMemoryMetricsStore object?

Contributor Author

This is the biggest change I will need clarity on. Of course, we can pass the data dict, but then we would have to move all aggregation and processing code into their own utility functions. I much prefer managing autoscaling metrics within their own class, where we can enforce certain constraints, rather than in raw dictionaries, especially since losing the constraints on these metrics (i.e., pruning/sorting) could introduce nasty bugs in the future.

Please advise.

Contributor

You can reconstruct the InMemoryMetricsStore object on the controller side. It seems like we don't allow that right now, you can probably add a class method that initializes InMemoryMetricsStore from a dictionary of data, something like def from_data(cls, data).
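
A minimal sketch of the suggested pattern (the real InMemoryMetricsStore lives in Serve's metrics_utils and its internals differ; this is simplified for illustration):

    from collections import defaultdict
    from typing import Dict, List, Tuple

    class InMemoryMetricsStore:
        def __init__(self):
            # key -> list of (timestamp, value) samples
            self.data: Dict[str, List[Tuple[float, float]]] = defaultdict(list)

        @classmethod
        def from_data(cls, data: Dict[str, List[Tuple[float, float]]]) -> "InMemoryMetricsStore":
            """Rebuild a store on the controller side from a plain dict sent by a handle."""
            store = cls()
            store.data.update(data)
            return store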

Contributor Author

I'll do the refactor for that, but can I ask the reasoning? Is pickling the class significantly slower than sending the dict and re-initializing?

Contributor

@zcin zcin Aug 13, 2025

I am actually not sure if there's overhead for pickling the class. I just think it's strange to send an "in memory store" object between distinct processes. In this case, the InMemoryMetricsStore object is just a wrapper that keeps track of a bunch of utility functions anyway; it's not actually storing anything, so I think it's much cleaner to just send the data and reconstruct the wrapper.

Contributor Author

Makes sense to me. I'll modify as suggested.

Contributor Author

Modified as suggested. We could also make the metrics_utils aggregation functions @staticmethods and use them on the raw data, since the object initialization now seems a little redundant.

Comment on lines 241 to 238
if handle_metric.total_requests > 0:
logger.debug(
f"Dropping metrics for handle '{handle_id}' because the Serve "
f"actor it was on ({handle_metric.actor_id}) is no longer "
f"alive. It had {handle_metric.total_requests} ongoing requests"
)
logger.debug(
f"Dropping metrics for handle '{handle_id}' because the Serve "
f"actor it was on ({handle_metric.actor_id}) is no longer "
f"alive."
)
Contributor

Why is this not gated by the if statement anymore? This can print a lot of logs that could confuse the user.

Contributor Author

Previously, each handle_metric stored its total_requests in memory, since it was aggregated at the handle. The new method lazily computes total_requests when needed, so this if statement would trigger a lot of computation for a simple debug log.

Contributor

I would still like to keep it gated because the UX here is not very ideal -- previously this was not gated and it would make users think something is wrong, because it can happen pretty often for handles that already zeroed out their metrics. Is there a way we can keep it gated without adding too much overhead?

Contributor Author

I'll think of an efficient way and re-implement it along with the other suggestions!

Contributor Author

Modified to approximate the totals using only the most recent values. Since this is only used for debug logging, it should be more than accurate enough.
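
For illustration, the approximation could look something like this (names assumed; only the latest sample per key is used, which is fine for a debug log):

    def approx_total_requests(metrics_dict) -> float:
        """Estimate in-flight requests from each key's most recent sample."""
        # metrics_dict: {replica_id: [(timestamp, running_requests), ...]}
        return sum(points[-1][1] for points in metrics_dict.values() if points)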

logger.info(
f"Dropping stale metrics for handle '{handle_id}' {actor_info}"
f"because no update was received for {timeout_s:.1f}s. "
)
Contributor

same here

for id in self._running_replicas:
if id in self._replica_requests:
total_requests += self._replica_requests[id].running_requests
def get_current_requests_per_replica(self) -> float:
Contributor

What is this method used for?

Contributor Author

The plan was to modify autoscaling to operate on a per_replica_aggregate value; however, the current policy and testing structure require a total_running_replicas value. I decided to maintain this endpoint for visibility. As we move towards allowing custom autoscaling metrics and policies, I think it is worth maintaining.

Contributor

Let's add it when we need it, it's easy for dead code to be introduced this way.

Contributor Author

Removed

Comment on lines 311 to 322
if aggregate_function == "mean":
total_requests, report_count = merged_metrics.aggregate_avg(
self._running_replicas
) or (0.0, 0)
elif aggregate_function == "max":
total_requests, report_count = merged_metrics.aggregate_max(
self._running_replicas
) or (0.0, 0)
elif aggregate_function == "min":
total_requests, report_count = merged_metrics.aggregate_min(
self._running_replicas
) or (0.0, 0)
Contributor

nit: Should we just have a single aggregate function that takes in aggregate_function, so we push this logic down to be the metrics store's responsibility?

Contributor Author

Thought about this. Either way is fine with me. I personally just prefer not passing functions or arbitrary function names as strings.

Contributor

@zcin zcin Aug 13, 2025

But we are already passing aggregate_function as a string to this function? (And to clarify: I didn't mean to imply we should pass a function; I meant to pass the aggregate_function variable, which is a string, down to the metrics store.)
If you prefer not to work with aggregate_function == "mean" etc., how about changing it to an enum?

Contributor Author

@Stack-Attack Stack-Attack Aug 13, 2025

The aggregation function is pulled from the autoscaling_config directly since it's in scope here. So it's a string, but it's strictly defined by the config class. If you prefer an enum or moving it into the metrics store, I can adjust either way if you feel one is better.

Contributor

Gotcha. How about introducing an enum and having the config take Union[str, Enum], similar to here, with a similar validator? Then, when you're passing it down to the metrics store, you can pass the enum.
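
A rough sketch of the enum-plus-validator pattern being suggested (field and helper names are illustrative, not the PR's exact code):

    from enum import Enum
    from typing import Union

    class AggregationFunction(str, Enum):
        MEAN = "mean"
        MAX = "max"
        MIN = "min"

    def validate_aggregation_function(value: Union[str, AggregationFunction]) -> AggregationFunction:
        """Accept either the enum or its string value and normalize to the enum."""
        if isinstance(value, AggregationFunction):
            return value
        return AggregationFunction(value)  # raises ValueError for unsupported strings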

Contributor Author

Modified as suggested. Aggregate now lives in metrics_utils, as a simple wrapper with an enum.
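
An illustrative shape of such a single-entry-point aggregate helper (not the PR's exact code; keyed on plain strings here to keep the snippet self-contained):

    import statistics

    _REDUCERS = {"mean": statistics.fmean, "max": max, "min": min}

    def aggregate(values, aggregation_function: str) -> float:
        """Reduce a list of samples with the configured aggregation function."""
        if not values:
            return 0.0
        try:
            return _REDUCERS[aggregation_function](values)
        except KeyError:
            raise ValueError(f"Unsupported aggregation function: {aggregation_function}")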

Comment on lines 629 to 630
print("TOTAL: ", deployment_state._replicas.count())
print("STATES", replicas)
Contributor

remove?

to ensure inclusivity of the metrics.
"""
aggregated_requests, report_count = self._get_request_count()
return aggregated_requests * max(1, report_count)
Contributor

What exactly is report_count? Is this the number of replicas for the deployment, the number of handles that hold request metrics for this deployment, the number of data points per handle, the number of data points per replica?

Without knowing this, it's unclear whether the calculation aggregated_requests * max(1, report_count) makes sense.

Contributor Author

report_count is the number of replicas that have reported valid data to a valid handle_metric. This is not the same as running_replicas, since there are cases where running replicas have not yet reported metrics to the handle. So report_count ensures that the total value is calculated based only on the number of reports collected.

I'll make this more clear.

Contributor

Is this true? In _aggregate_reduce, you define:

        values = (
            timeseries.value for key in keys for timeseries in self.data.get(key, ())
        )

And then return the report_count as len(values). Doesn't that mean report_count is something along the lines of (#replicas) * (#data pts per replica)? Let me know if I'm understanding correctly.

Contributor Author

@zcin Correct, it was implemented incorrectly here, but the tests didn't catch it. I fixed this earlier today in the metrics PR, so it will be resolved by that. As soon as that's approved, I'll rebase this PR and resolve the metrics-related issues here.

Contributor Author

@Stack-Attack Stack-Attack left a comment

I've split the PR as suggested, with the first changes here. I'll hold here until that's merged to avoid an even messier history. @zcin


@zcin
Contributor

zcin commented Aug 13, 2025

I've split the PR as suggested, with the first changes here. I'll hold here until that's merged to avoid an even messier history. @zcin

That sounds good!

zcin added a commit that referenced this pull request Sep 4, 2025
…ler. (#55568)

## Why are these changes needed?

These changes modify the autoscaler metrics collection and aggregation
functions in preparation for global aggregation in the controller.

## Related issue number
Partial for #46497 

Required for #41135 #51905 

<!-- For example: "Closes #1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Kyle Robinson <[email protected]>
Signed-off-by: Kyle Robinson <[email protected]>
Signed-off-by: abrar <[email protected]>
Co-authored-by: Abrar Sheikh <[email protected]>
Co-authored-by: Cindy Zhang <[email protected]>
Co-authored-by: abrar <[email protected]>
@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch from b8faf5f to 8da09f0 Compare September 4, 2025 09:11
@Stack-Attack
Contributor Author

Reminder of the core changes:

  1. Metrics are now sent to the controller as raw timeseries.
  2. Timeseries are aggregated across all replicas in the controller.
  3. Aggregation can make use of the new max and min functions.

What does not change here:

  1. Autoscaling policy is unchanged, though the global aggregation is not mathematically identical.
  2. Metrics are still collected in the Handle or replica, with queued requests coming from the handles regardless.
  3. Metrics are still reported as key:[runningRequests], where key always represents either a replica_id or QUEUED_REQUESTS_KEY (an illustrative payload shape is sketched below). In the future, we may want to record arbitrary metrics here per-replica.
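
An illustrative payload shape for a single handle's report (values are made up; QUEUED_REQUESTS_KEY and the replica IDs are the keys described above):

    report = {
        "QUEUED_REQUESTS_KEY": [(1723458000.0, 2), (1723458001.0, 0)],
        "replica-abc123": [(1723458000.0, 5), (1723458001.0, 7)],
        "replica-def456": [(1723458000.0, 4), (1723458001.0, 3)],
    }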

@zcin @abrarsheikh I've rebased on master, and adapted the code from previous suggestions. Unit tests I've run have passed, and I'll watch the results of the full test suite.

Tomorrow I need to add more tests for the new functionality, as well as any changes you suggest.

replica_report.metrics_dict
for replica_report in self._replica_requests.values()
],
window_s=1,
Contributor Author

This needs to be set to a variable. I would think it should be set to the same rate at which metrics are collected, not simply pushed. I didn't have time today to find it.

Signed-off-by: Kyle Robinson <[email protected]>
@Stack-Attack
Contributor Author

Looking into failing tests and running them locally.

@Stack-Attack
Contributor Author

If tests fail again, I'll split this PR one more time: one PR that just moves the aggregation to the controller, and then add aggregation_function here on top of that PR.

sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
…ler. (ray-project#55568)

@abrarsheikh
Contributor

@Stack-Attack after looking at the details more closely, I think we need a gradual approach for this rollout. For that reason, I’ll take ownership of this task instead of moving forward with the current PR. I appreciate your effort here and hope you understand my reasoning.

@abrarsheikh abrarsheikh closed this Sep 8, 2025
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
…ler. (ray-project#55568)

wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
…ler. (ray-project#55568)

dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
…ler. (#55568)


Labels

community-contribution: Contributed by the community
serve: Ray Serve Related Issue
unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.
