
Conversation

@Stack-Attack
Contributor

@Stack-Attack Stack-Attack commented Apr 2, 2025

Why are these changes needed?

Currently, the Serve autoscaler makes scaling decisions based only on the most recent Serve controller computation, even if the controller has made many scaling calculations over the scaling-delay period. This results in poor autoscaling when clusters use long upscale/downscale delays. This PR allows more responsive scaling by implementing min and max aggregation functions for metric collection, in addition to the current average.

An alternate fix to this issue was proposed here.
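
To make this concrete, a rough sketch of how the proposed option could look on a deployment (illustrative only: aggregation_function is the parameter this PR proposes, not part of the released API, and the other values are arbitrary):

    from ray import serve

    @serve.deployment(
        autoscaling_config={
            "min_replicas": 1,
            "max_replicas": 10,
            "target_ongoing_requests": 2,
            # Long delays are where the current average-only behavior hurts most.
            "upscale_delay_s": 300.0,
            "downscale_delay_s": 600.0,
            # Proposed in this PR: aggregate metrics with "max" or "min" instead of the average.
            "aggregation_function": "max",
        }
    )
    class MyModel:
        async def __call__(self, request) -> str:
            return "ok"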

Related issue number

#46497

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [x] I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Unit tests
    • [x] Release tests
    • [ ] This PR is not tested :(

@Stack-Attack
Contributor Author

Stack-Attack commented Apr 2, 2025

Some notes for @zcin.

I originally worked from your prototype; however, there were a few concerns that led to this simpler implementation.

  1. AFAIK there is only one serve controller. In large clusters with large deployments, aggregating all raw metrics in the controller seems inefficient.
  2. In any case, the timestamps sent from the handles are not in sync with each other, so we would need to do some more complex windowing on the raw replica metrics if we wanted to align them. Otherwise, the controller sum over the look_back_period will be identical to the sum of each individual max/min/avg that is used currently.
  3. All this said, if this is the chosen method, I think this is a close enough approximation to the real total_max_requests given power-of-two routing should smooth things out.

However, given all this, I'm actually more confident in my previous implementation now than before. I'll add an approximate sketch below. It represents one theoretical scaling decision over some time.

Currently we only have the blue option. This PR gives us the red option. My previous PR gives us the target. I think the target is achievable with this PR and very careful smoothing parameters, but it doesn't make the average scaling case any better. The average aggregation result will still discard all previous data over the calculation window, with no way to use that information for better scaling.
[Attached image 20250402_143244: sketch of one theoretical scaling decision comparing the blue, red, and target options]

Either way, I hope we can squeeze this in an upcoming release :).

@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch from 6d5d722 to 4ff155f Compare April 2, 2025 07:42
Contributor Author

We can abstract these methods into one with a function parameter, but I don't expect there will be more added.

Contributor Author

For some reason (legacy?) the replica count was based on replica-level metrics in the latter half of this block. Since this PR splits the aggregation function for replica and handle metrics, we need to modify this so that we always use handle metrics for autoscaling decisions.

Contributor

The legacy autoscaling implementation was based on request metrics collected from replicas, which are stored in _replica_requests. I think we want to keep that because if RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE is turned off, there won't be any data in _handle_requests.

Contributor Author

Aha. Missed that case. Based on this, the feature would only be possible when RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=True. I'll re-add the fallback logic, add a warning message, and update the docs?
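
A rough sketch of the fallback described here (the environment variable name is real; the function, its assumed default, and the message wording are illustrative):

    import logging
    import os

    logger = logging.getLogger("ray.serve")

    def resolve_aggregation_function(aggregation_function: str) -> str:
        """Fall back to the average when handle metrics collection is disabled."""
        on_handle = (
            os.environ.get("RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE", "1") == "1"
        )
        if not on_handle and aggregation_function != "mean":
            logger.warning(
                f"aggregation_function='{aggregation_function}' requires "
                "RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=1; "
                "falling back to 'mean' over replica-reported metrics."
            )
            return "mean"
        return aggregation_function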

@jcotant1 jcotant1 added the serve Ray Serve Related Issue label Apr 2, 2025
@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch from 4ff155f to b198139 Compare April 3, 2025 07:01
@hainesmichaelc hainesmichaelc added the community-contribution Contributed by the community label Apr 4, 2025
@Stack-Attack
Contributor Author

@zcin Any timeline on moving forward here? Either through this or the previous method?

@zcin
Contributor

zcin commented Apr 22, 2025

@Stack-Attack I will take a look at this today!

Contributor

@zcin zcin left a comment

Hi @Stack-Attack, I understand your concern, but I think the current implementation may not be calculating exactly what you want it to because the aggregation is happening locally. Let me know what you think.

raise ValueError(
f"Unsupported aggregation function: "
f"{self.autoscaling_config.aggregation_function}"
)
Contributor

If there are 10 handles, since we are aggregating "locally" then this means the autoscaling decision would be summing up:

  • max number of requests for handle 1 at T1
  • max number of requests for handle 2 at T2
    ...
  • max number of requests for handle 10 at T10

Is my understanding correct? So this is not calculating the maximum total number of requests over the look-back period; instead, it's summing up a bunch of local maxes that occurred at different times?
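
A toy numeric illustration of the distinction (not code from this PR): two handles that peak at different times.

    # Requests reported by each handle at timestamps t = 0, 1, 2.
    handle_1 = {0.0: 3, 1.0: 1, 2.0: 0}  # handle 1 peaks at t=0
    handle_2 = {0.0: 0, 1.0: 1, 2.0: 3}  # handle 2 peaks at t=2

    sum_of_local_maxes = max(handle_1.values()) + max(handle_2.values())  # 3 + 3 = 6
    max_of_total = max(handle_1[t] + handle_2[t] for t in handle_1)       # max(3, 2, 3) = 3

    print(sum_of_local_maxes, max_of_total)  # 6 3 -> summing local maxes over-estimates the true peak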

Contributor Author

@zcin That's correct.

  1. AFAIK there is only one serve controller. In large clusters with large deployments, aggregating all raw metrics in the controller seems inefficient.
  2. In any case, the timestamps sent from the handles are not in sync with each other, so we would need to do some more complex windowing on the raw replica metrics if we wanted to align them. Otherwise, the controller sum over the look_back_period will be identical to the sum of each individual max/min/avg which is used [in this PR].
  3. All this said, if this is the chosen method, I think this is a close enough approximation to the real total_max_requests given power-of-two routing should smooth things out.

However, given all this, I'm actually more confident in my previous implementation now than before. I'll add an approximate sketch below. It represents one theoretical scaling decision over some time.

instead, it's summing up a bunch of local maxes that occurred at different times?

The different timestamps are the key issue. I don't see an easy way to align timestamps in the controller (2), and even if we did, the processing would not scale out horizontally (1), so the approximation using the latest calculation from each handle (this PR) seemed like the best balance (3).

If I missed something important let me know. I dug through the code, but you've got much deeper insight I'm sure.

Contributor Author

@zcin Gentle bump. If we choose the approximation, then this should be good to go with little change. I want to confirm before finishing up.


@github-actions

github-actions bot commented Jun 8, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 8, 2025
@Stack-Attack
Contributor Author

@zcin Bumping to remove stale label

@github-actions github-actions bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 19, 2025
@github-actions

github-actions bot commented Jul 3, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jul 3, 2025
@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Jul 4, 2025
@abrarsheikh
Contributor

Hey @Stack-Attack, can we connect over Ray Slack (Abrar Sheikh) and come to a conclusion here? I'm motivated to get this feature in, but there are a few considerations that are better discussed in person for faster iteration. Let me know!

@Stack-Attack
Contributor Author

@abrarsheikh Sure! Sorry for the delay; I was away for a few weeks. I'll get set up on the Ray Slack and reach out this week. Looking forward to finally fixing this issue.

@Stack-Attack
Contributor Author

Spoke with @abrarsheikh today. He suggested we aggregate the raw metrics in the controller, then align/window/reduce there. Since this does not scale out horizontally, we will need to add tests ensuring the controller can handle arbitrarily large clusters.

I plan to complete this next week!
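
A toy sketch (not the PR's code) of what align/window/reduce in the controller could look like, assuming raw per-replica timeseries arrive as lists of (timestamp, value) pairs:

    from collections import defaultdict

    def windowed_peak_total(reports, window_s, agg=max):
        """Bucket each replica's raw timeseries into fixed windows, reduce within
        each window, sum across replicas per window, and return the peak window."""
        # reports: {replica_id: [(timestamp, running_requests), ...]}
        per_window_totals = defaultdict(float)
        for replica_id, points in reports.items():
            buckets = defaultdict(list)
            for ts, value in points:
                buckets[int(ts // window_s)].append(value)
            for window, values in buckets.items():
                per_window_totals[window] += agg(values)
        return max(per_window_totals.values()) if per_window_totals else 0.0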

@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch 2 times, most recently from b496dc8 to 968a8c5 Compare August 12, 2025 11:53
@Stack-Attack Stack-Attack requested a review from arcyleung August 12, 2025 11:55
@Stack-Attack
Contributor Author

Let me know if I need any additional tests/benchmarks.

Contributor

@zcin zcin left a comment

Thanks @Stack-Attack for the contribution!

One request that I think will help us land your contribution faster: would you be able to split off your changes to metrics_utils and test_metrics_utils into a separate PR? That PR would be (1), and this PR would be (2) (and it would depend on PR #1). Since this change is quite big and touches a lot of components, this will help your reviewers focus on the main change per PR and speed up the review.

self, replica_id: str, window_avg: Optional[float], send_timestamp: float
self,
replica_id: str,
metrics_store: InMemoryMetricsStore,
Contributor

It's a bit strange that we're passing an InMemoryMetricsStore object between separate processes. Can we send the underlying dict of data instead of an InMemoryMetricsStore object?

Contributor Author

This is the biggest change I will need clarity on. Of course, we can pass the data dict, but then we would have to move all aggregation and processing code into their own utility functions. I much prefer managing autoscaling metrics within their own class, where we can enforce certain constraints, rather than in raw dictionaries, especially since losing the constraints on these metrics (i.e., pruning/sorting) could introduce nasty bugs in the future.

Please advise.

Contributor

You can reconstruct the InMemoryMetricsStore object on the controller side. It seems like we don't allow that right now, you can probably add a class method that initializes InMemoryMetricsStore from a dictionary of data, something like def from_data(cls, data).
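
A minimal sketch of the suggested pattern (the real InMemoryMetricsStore lives in Serve's metrics_utils and its internals differ; this is simplified for illustration):

    from collections import defaultdict
    from typing import Dict, List, Tuple

    class InMemoryMetricsStore:
        def __init__(self):
            # key -> list of (timestamp, value) samples
            self.data: Dict[str, List[Tuple[float, float]]] = defaultdict(list)

        @classmethod
        def from_data(cls, data: Dict[str, List[Tuple[float, float]]]) -> "InMemoryMetricsStore":
            """Rebuild a store on the controller side from a plain dict sent by a handle."""
            store = cls()
            store.data.update(data)
            return store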

Contributor Author

I'll do the refactor for that, but can I ask the reasoning? Is pickling the class significantly slower than sending the dict and re-initializing?

Contributor

@zcin zcin Aug 13, 2025

I am actually not sure if there's overhead for pickling the class. I just think it's strange to send an "in memory store" object between distinct processes. In this case, the InMemoryMetricsStore object is just a wrapper that keeps track of a bunch of utility functions anyway; it's not actually storing anything, so I think it's much cleaner to just send the data and reconstruct the wrapper.

Contributor Author

Makes sense to me. I'll modify as suggested.

Contributor Author

Modified as suggested. We could also make the metrics_utils aggregation functions @staticmethods and use them on the raw data, since the object initialization now seems a little redundant.

Comment on lines 241 to 238
if handle_metric.total_requests > 0:
logger.debug(
f"Dropping metrics for handle '{handle_id}' because the Serve "
f"actor it was on ({handle_metric.actor_id}) is no longer "
f"alive. It had {handle_metric.total_requests} ongoing requests"
)
logger.debug(
f"Dropping metrics for handle '{handle_id}' because the Serve "
f"actor it was on ({handle_metric.actor_id}) is no longer "
f"alive."
)
Contributor

Why is this not gated by the if statement anymore? This can print a lot of logs that could confuse the user.

Contributor Author

Previously, each handle_metric stored its total_requests in memory, since it was aggregated at the handle. The new method lazily computes total_requests when needed, so this if statement would trigger a lot of computation for a simple debug log.

Contributor

I would still like to keep it gated because the UX here is not very ideal -- previously this was not gated and it would make users think something is wrong, because it can happen pretty often for handles that already zeroed out their metrics. Is there a way we can keep it gated without adding too much overhead?

Contributor Author

I'll think of an efficient way and re-implement it along with the other suggestions!

Contributor Author

Modified to approximate the totals using only the most recent values. Since this is only used for debug logging, it should be more than accurate enough.
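
For illustration, the approximation could look something like this (names assumed; only the latest sample per key is used, which is fine for a debug log):

    def approx_total_requests(metrics_dict) -> float:
        """Estimate in-flight requests from each key's most recent sample."""
        # metrics_dict: {replica_id: [(timestamp, running_requests), ...]}
        return sum(points[-1][1] for points in metrics_dict.values() if points)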

logger.info(
f"Dropping stale metrics for handle '{handle_id}' {actor_info}"
f"because no update was received for {timeout_s:.1f}s. "
)
Contributor

same here

for id in self._running_replicas:
if id in self._replica_requests:
total_requests += self._replica_requests[id].running_requests
def get_current_requests_per_replica(self) -> float:
Contributor

What is this method used for?

Contributor Author

The plan was to modify autoscaling to operate on a per_replica_aggregate value; however, the current policy and testing structure require a total_running_replicas value. I decided to maintain this endpoint for visibility. As we move towards allowing custom autoscaling metrics and policies, I think it is worth maintaining.

Contributor

Let's add it when we need it, it's easy for dead code to be introduced this way.

Contributor Author

Removed

Comment on lines 311 to 322
if aggregate_function == "mean":
total_requests, report_count = merged_metrics.aggregate_avg(
self._running_replicas
) or (0.0, 0)
elif aggregate_function == "max":
total_requests, report_count = merged_metrics.aggregate_max(
self._running_replicas
) or (0.0, 0)
elif aggregate_function == "min":
total_requests, report_count = merged_metrics.aggregate_min(
self._running_replicas
) or (0.0, 0)
Contributor

nit: Should we just have a single aggregate function that takes in aggregate_function, so we push this logic down to be the metrics store's responsibility?

Contributor Author

Thought about this. Either way is fine with me. I personally just prefer not passing functions or arbitrary function names as strings.

Contributor

@zcin zcin Aug 13, 2025

But we are already passing aggregate_function as a string to this function? (And to clarify: I didn't mean to imply we should pass a function; I meant to pass the aggregate_function variable, which is a string, down to the metrics store.)
If you prefer not to work with aggregate_function == "mean" etc., how about changing it to an enum?

Contributor Author

@Stack-Attack Stack-Attack Aug 13, 2025

The aggregation function is pulled from the autoscaling_config directly since it's in scope here. So it's a string, but it's strictly defined by the config class. If you prefer an enum or moving it into the metrics store, I can adjust either way if you feel one is better.

Contributor

Gotcha. How about introducing an enum and having the config take Union[str, Enum], similar to here, with a similar validator? Then, when you're passing it down to the metrics store, you can pass the enum.
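
A rough sketch of the enum-plus-validator pattern being suggested (field and helper names are illustrative, not the PR's exact code):

    from enum import Enum
    from typing import Union

    class AggregationFunction(str, Enum):
        MEAN = "mean"
        MAX = "max"
        MIN = "min"

    def validate_aggregation_function(value: Union[str, AggregationFunction]) -> AggregationFunction:
        """Accept either the enum or its string value and normalize to the enum."""
        if isinstance(value, AggregationFunction):
            return value
        return AggregationFunction(value)  # raises ValueError for unsupported strings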

Contributor Author

Modified as suggested. Aggregate now lives in metrics_utils, as a simple wrapper with an enum.
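
An illustrative shape of such a single-entry-point aggregate helper (not the PR's exact code; keyed on plain strings here to keep the snippet self-contained):

    import statistics

    _REDUCERS = {"mean": statistics.fmean, "max": max, "min": min}

    def aggregate(values, aggregation_function: str) -> float:
        """Reduce a list of samples with the configured aggregation function."""
        if not values:
            return 0.0
        try:
            return _REDUCERS[aggregation_function](values)
        except KeyError:
            raise ValueError(f"Unsupported aggregation function: {aggregation_function}")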

Comment on lines 629 to 630
print("TOTAL: ", deployment_state._replicas.count())
print("STATES", replicas)
Contributor

remove?

to ensure inclusivity of the metrics.
"""
aggregated_requests, report_count = self._get_request_count()
return aggregated_requests * max(1, report_count)
Contributor

What exactly is report_count? Is this the number of replicas for the deployment, the number of handles that hold request metrics for this deployment, the number of data points per handle, the number of data points per replica?

Without knowing this, it's unclear whether the calculation aggregated_requests * max(1, report_count) makes sense.

Contributor Author

report_count is the number of replicas that have reported valid data to a valid handle_metric. This is not the same as running_replicas, since there are cases where running replicas have not yet reported metrics to the handle. So report_count ensures that the total value is calculated based only on the number of reports collected.

I'll make this more clear.

Contributor

Is this true? In _aggregate_reduce, you define:

        values = (
            timeseries.value for key in keys for timeseries in self.data.get(key, ())
        )

And then return the report_count as len(values). Doesn't that mean report_count is something along the lines of (#replicas) * (#data pts per replica)? Let me know if I'm understanding correctly.

Contributor Author

@zcin Correct, it was implemented incorrectly here, but the tests didn't catch it. I fixed this earlier today in the metrics PR, so it will be resolved by that. As soon as that's approved, I'll rebase this PR and resolve the metrics-related issues here.

Contributor Author

@Stack-Attack Stack-Attack left a comment

I've split the PR as suggested, with the first changes here. I'll hold here until that's merged to avoid an even messier history. @zcin


@zcin
Contributor

zcin commented Aug 13, 2025

I've split the PR as suggested, with the first changes here. I'll hold here until that's merged to avoid an even messier history. @zcin

That sounds good!

zcin added a commit that referenced this pull request Sep 4, 2025
…ler. (#55568)

## Why are these changes needed?

These changes modify the autoscaler metrics collection and aggregation
functions in preparation for global aggregation in the controller.

## Related issue number
Partial for #46497 

Required for #41135 #51905 

<!-- For example: "Closes #1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Kyle Robinson <[email protected]>
Signed-off-by: Kyle Robinson <[email protected]>
Signed-off-by: abrar <[email protected]>
Co-authored-by: Abrar Sheikh <[email protected]>
Co-authored-by: Cindy Zhang <[email protected]>
Co-authored-by: abrar <[email protected]>
@Stack-Attack Stack-Attack force-pushed the serve-aggregation-function branch from b8faf5f to 8da09f0 Compare September 4, 2025 09:11
@Stack-Attack
Contributor Author

Reminder of the core changes:

  1. Metrics are now sent to the controller as raw timeseries.
  2. Timeseries are aggregated across all replicas in the controller.
  3. Aggregation can make use of the new max and min functions.

What does not change here:

  1. Autoscaling policy is unchanged, though the global aggregation is not mathematically identical.
  2. Metrics are still collected in the Handle or replica, with queued requests coming from the handles regardless.
  3. Metrics are still reported as key:[runningRequests], where key always represents either a replica_id or QUEUED_REQUESTS_KEY (an illustrative payload shape is sketched below). In the future, we may want to record arbitrary metrics here per-replica.
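
An illustrative payload shape for a single handle's report (values are made up; QUEUED_REQUESTS_KEY and the replica IDs are the keys described above):

    report = {
        "QUEUED_REQUESTS_KEY": [(1723458000.0, 2), (1723458001.0, 0)],
        "replica-abc123": [(1723458000.0, 5), (1723458001.0, 7)],
        "replica-def456": [(1723458000.0, 4), (1723458001.0, 3)],
    }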

@zcin @abrarsheikh I've rebased on master, and adapted the code from previous suggestions. Unit tests I've run have passed, and I'll watch the results of the full test suite.

Tomorrow I need to add more tests for the new functionality, as well as any changes you suggest.

replica_report.metrics_dict
for replica_report in self._replica_requests.values()
],
window_s=1,
Contributor Author

This needs to be set to a variable. I would think it should be set to the same rate at which metrics are collected, not simply pushed. I didn't have time today to find it.

Signed-off-by: Kyle Robinson <[email protected]>
@Stack-Attack
Contributor Author

Looking into failing tests and running them locally.

@Stack-Attack
Contributor Author

If tests fail again, I'll split this PR one more time: one PR that just moves the aggregation to the controller, and then add aggregation_function here on top of that PR.

sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
…ler. (ray-project#55568)

@abrarsheikh
Contributor

@Stack-Attack after looking at the details more closely, I think we need a gradual approach for this rollout. For that reason, I’ll take ownership of this task instead of moving forward with the current PR. I appreciate your effort here and hope you understand my reasoning.

@abrarsheikh abrarsheikh closed this Sep 8, 2025
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
…ler. (ray-project#55568)

wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
…ler. (ray-project#55568)

dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
…ler. (#55568)


Labels

community-contribution: Contributed by the community
serve: Ray Serve Related Issue
unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.
