-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-1040: periodically adjusted borrowing between priority levels in APF #3391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-1040: periodically adjusted borrowing between priority levels in APF #3391
Conversation
a92ba9d to
41cf392
Compare
41cf392 to
fbedda1
Compare
| the inputs to the adjustment logic in order to dampen control | ||
| gyrations, in a way that lets a priority level reclaim lent seats at | ||
| the nearest adjustment time. The adjustments take into account the | ||
| high watermark `HighSD(i)`, time-weighted average `AvgSD(i)`, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
letters are cheap. SD is SeatDepth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"SeatDemand"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded all the "SD" in the latest revision.
| EnvelopeSD when it exceeds SmoothSD. The rule for updating priority | ||
| level `i`'s SmoothSD at the end of an adjustment period is | ||
| `SmoothSD(i) := max( EnvelopeSD(i), A*SmoothSD(i) + (1-A)*Envelope(i) | ||
| )`. The command line flag `--seat-demand-history-fraction` with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a cluster-admin doesn't seem likely to choose a better A than we will
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what default values are for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I am not sure what value to use. I made a rough guess, but it is only that.
| ``` | ||
| CurrentCL(i) = FairFrac * Target(i) if FairFrac * Target(i) >= MinCurrentCL(i) | ||
| CurrentCL(i) = MinCurrentCL(i) if FairFrac * Target(i) <= MinCurrentCL(i) | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, are guaranteed that CurrentCLSum (sum of all CurrentCL(i)) is equal to ServerCL? I'm not seeing how that is guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that FairFrac is a variable here that has to be solved for. Higher FairFrac makes for higher sum of CurrentCL; the problem is to find the value of FairFrac that makes the CurrentCL sum to ServerCL. This is analogous to the max-min fairness problem, which is to pick a fair allocation that applies to all the requesters that want at least that much (the other requesters get only what they ask for, and the sum of allocations is limited by the available capacity).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised the write-up to go through this a little more slowly, hopefully it is clearer now (if a bit longer).
|
This design is more complicated, but with the metrics listed at the bottom I think we have enough to explain why certain requests got 429s. I'm out next week, so I'll leave lgtm to @wojtek-t /approve |
|
BTW, @deads2k , what is your position on whether to keep the new |
2c0b127 to
b2017df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this PR modulo getting rid of priority field which we're not using now.
| The following display shows the two new fields along with the updated | ||
| description for the `AssuredConcurrencyShares` field, in `v1beta2`. | ||
|
|
||
| **Note**: currently this design does not use the `Priority` field for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get rid of this field now.
If it appears needed, we can always add that later, but until needed, let's remove it to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a revision that removes the Priority field.
| administrators of the cluster have scripts/clients that maintain some | ||
| out-of-tree PriorityLevelConfiguration objects; these, naturally, do | ||
| not specify a value for the `priority` field. The default behavior | ||
| for `priority` is designed to do something more natural and convenient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't up-to-date I think, as we don't use priority field at all.
Remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I deleted it as part of removing the Priority field.
Also updated outdated text about directed borrowing.
|
/label tide/merge-method-squash /lgtm We should probably also add that to the tracking sheet: |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, MikeSpreitzer, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Uh oh!
There was an error while loading. Please reload this page.