Skip to content

Conversation

@davinci26
Copy link
Contributor

@davinci26 davinci26 commented Nov 1, 2025

Implements slow start functionality as described in gRFC A100

I still need to do some work on tests and do an e2e experiment with xDS to ensure that things are working as expected, but the core implementation is in a state that can be reviewed.

At this phase I'm mainly looking for feedback on the overall approach and design and in the test code because I am worried that I might be missing some test utility that would make testing with time easier.

Pending Items before this is ready to be merged:

  • Add more unit tests for slow start config cases
  • Manually verify with an xDS experiment that slow start is working as expected

RELEASE NOTES:

  • weightedroundrobin: Implements slow start functionality as described in gRFC A100

Implements slow start functionality as described in
[gRFC A100](grpc/proposal#498)

I still need to do some work on tests and do an e2e experiment with xDS to
ensure that things are working as expected, but the core implementation is
in a state that can be reviewed.

At this phase I'm mainly looking for feedback on the overall approach and design and
in the test code because I am worried that I might be missing some test utility that would
make testing with `time` easier.

Pending Items before this is ready to be merged:
- [ ] Add more unit tests for slow start config cases
- [ ] Manually verify with an xDS experiment that slow start is working as expected

Release Notes
* weightedroundrobin: Implements slow start functionality as described in
  [gRFC A100](grpc/proposal#498)

Signed-off-by: sotiris <[email protected]>
@codecov
Copy link

codecov bot commented Nov 1, 2025

Codecov Report

❌ Patch coverage is 64.28571% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.32%. Comparing base (363018c) to head (e8884a6).
⚠️ Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
balancer/weightedroundrobin/balancer.go 72.97% 5 Missing and 5 partials ⚠️
...xds/xdsclient/xdslbregistry/converter/converter.go 0.00% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8689      +/-   ##
==========================================
- Coverage   83.43%   83.32%   -0.12%     
==========================================
  Files         415      416       +1     
  Lines       32195    32342     +147     
==========================================
+ Hits        26863    26949      +86     
- Misses       3980     4016      +36     
- Partials     1352     1377      +25     
Files with missing lines Coverage Δ
...xds/xdsclient/xdslbregistry/converter/converter.go 70.00% <0.00%> (-3.34%) ⬇️
balancer/weightedroundrobin/balancer.go 83.94% <72.97%> (-1.61%) ⬇️

... and 34 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

go.mod Outdated
github.com/envoyproxy/go-control-plane/envoy v1.35.0
github.com/cncf/xds/go v0.0.0-20251014123835-2ee22ca58382
github.com/envoyproxy/go-control-plane v0.13.4
github.com/envoyproxy/go-control-plane/envoy v1.35.1-0.20251015221300-4138018a492b
Copy link
Contributor Author

@davinci26 davinci26 Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports a commit because the proto is not in a release yet, validate this is the correct version for all deps

Signed-off-by: sotiris <[email protected]>
Signed-off-by: sotiris <[email protected]>
@davinci26
Copy link
Contributor Author

Don't want to commit this code because it introduces a new dependencies but wanted to make sure the aggression math is reasonable so I plotted them over time (https://gist.github.com/davinci26/d29bf9633fe9afd3b4164656a8013eb4)

slow_start_aggression_plot

(Note the flatline at start is due to math.Max(timeSinceReady.Seconds(), 1.0))

Signed-off-by: sotiris <[email protected]>
Signed-off-by: sotiris <[email protected]>
Signed-off-by: sotiris <[email protected]>
@davinci26 davinci26 changed the title weightedroundrobin: Implements slow start [WiP] weightedroundrobin: Implements Slow Start (gRFC A100) [WiP] Nov 3, 2025
if slowStartConfigCfg := cswrrProto.GetSlowStartConfig(); slowStartConfigCfg != nil {
wrrLBCfg.SlowStartConfig = &weightedroundrobin.SlowStartConfig{
SlowStartWindow: internalserviceconfig.Duration(slowStartConfigCfg.GetSlowStartWindow().AsDuration()),
// SlowStartConfig uses a runtime value in the proto definition so we need to get either the default value or the runtime value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to call some attention here since this is a bit of subtle point, where I made a decision but maybe its worth clarifying in the gRFC

Signed-off-by: sotiris <[email protected]>
ew.stopORCAListener()
}
ew.mu.Lock()
ew.readySince = time.Time{} // Reset readySince when going non-READY.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation should be adjusted based on the convo grpc/proposal#498 (comment)

@easwars easwars self-assigned this Nov 13, 2025
@easwars easwars added the Type: Feature New features or improvements in behavior label Nov 13, 2025
@easwars
Copy link
Contributor

easwars commented Nov 13, 2025

Apologies for the delay in reviewing this PR. I'm starting on this today and hopefully will have comments soon. Thanks.

@easwars easwars added this to the 1.78 Release milestone Nov 17, 2025
@easwars
Copy link
Contributor

easwars commented Nov 18, 2025

@davinci26 : I finally got around to reading the gRFC and I'm ready to start reviewing this. But could you please split this up into smaller PRs. The minimum I would like to see is the following split

  • WRR changes alone
  • Add metrics to WRR
  • xDS integration

I'm going to start reviewing the WRR changes alone here. So, if you could move the other pieces out, we can keep the changes small and have a more meaningful review. Thanks again for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants