Skip to content

Conversation

@MikeSpreitzer
Copy link
Member

@MikeSpreitzer MikeSpreitzer commented Jun 13, 2022

  • One-line PR description: describe how to add borrowing to API Priority and Fairness via dynamically adjusted concurrency limits
  • Issue link:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 13, 2022
@k8s-ci-robot k8s-ci-robot requested a review from deads2k June 13, 2022 20:16
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jun 13, 2022
@k8s-ci-robot k8s-ci-robot requested a review from fedebongio June 13, 2022 20:16
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jun 13, 2022
@MikeSpreitzer
Copy link
Member Author

/cc @wojtek-t
/cc @lavalamp
@tkashem

@MikeSpreitzer MikeSpreitzer force-pushed the apf-add-borrowing-adjustment branch from a92ba9d to 41cf392 Compare June 16, 2022 04:29
@MikeSpreitzer MikeSpreitzer force-pushed the apf-add-borrowing-adjustment branch from 41cf392 to fbedda1 Compare June 16, 2022 04:34
the inputs to the adjustment logic in order to dampen control
gyrations, in a way that lets a priority level reclaim lent seats at
the nearest adjustment time. The adjustments take into account the
high watermark `HighSD(i)`, time-weighted average `AvgSD(i)`, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

letters are cheap. SD is SeatDepth?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SeatDemand"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded all the "SD" in the latest revision.

EnvelopeSD when it exceeds SmoothSD. The rule for updating priority
level `i`'s SmoothSD at the end of an adjustment period is
`SmoothSD(i) := max( EnvelopeSD(i), A*SmoothSD(i) + (1-A)*Envelope(i)
)`. The command line flag `--seat-demand-history-fraction` with a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a cluster-admin doesn't seem likely to choose a better A than we will

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what default values are for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I am not sure what value to use. I made a rough guess, but it is only that.

```
CurrentCL(i) = FairFrac * Target(i) if FairFrac * Target(i) >= MinCurrentCL(i)
CurrentCL(i) = MinCurrentCL(i) if FairFrac * Target(i) <= MinCurrentCL(i)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, are guaranteed that CurrentCLSum (sum of all CurrentCL(i)) is equal to ServerCL? I'm not seeing how that is guaranteed.

Copy link
Member Author

@MikeSpreitzer MikeSpreitzer Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that FairFrac is a variable here that has to be solved for. Higher FairFrac makes for higher sum of CurrentCL; the problem is to find the value of FairFrac that makes the CurrentCL sum to ServerCL. This is analogous to the max-min fairness problem, which is to pick a fair allocation that applies to all the requesters that want at least that much (the other requesters get only what they ask for, and the sum of allocations is limited by the available capacity).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revised the write-up to go through this a little more slowly, hopefully it is clearer now (if a bit longer).

@deads2k
Copy link
Contributor

deads2k commented Jun 16, 2022

This design is more complicated, but with the metrics listed at the bottom I think we have enough to explain why certain requests got 429s. I'm out next week, so I'll leave lgtm to @wojtek-t

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 16, 2022
@MikeSpreitzer
Copy link
Member Author

BTW, @deads2k , what is your position on whether to keep the new Priority field?

@MikeSpreitzer MikeSpreitzer force-pushed the apf-add-borrowing-adjustment branch from 2c0b127 to b2017df Compare June 17, 2022 04:31
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this PR modulo getting rid of priority field which we're not using now.

The following display shows the two new fields along with the updated
description for the `AssuredConcurrencyShares` field, in `v1beta2`.

**Note**: currently this design does not use the `Priority` field for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of this field now.

If it appears needed, we can always add that later, but until needed, let's remove it to avoid confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a revision that removes the Priority field.

administrators of the cluster have scripts/clients that maintain some
out-of-tree PriorityLevelConfiguration objects; these, naturally, do
not specify a value for the `priority` field. The default behavior
for `priority` is designed to do something more natural and convenient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't up-to-date I think, as we don't use priority field at all.

Remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I deleted it as part of removing the Priority field.

Also updated outdated text about directed borrowing.
@wojtek-t
Copy link
Member

/label tide/merge-method-squash

/lgtm
/approve

We should probably also add that to the tracking sheet:
https://docs.google.com/spreadsheets/d/1Lnft8598eIQsqBp8W6X_LwaqBNZViYssQoFgGS8aJ3g/edit?pli=1#gid=0

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 20, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 20, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ae30fc8 into kubernetes:master Jun 20, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jun 20, 2022
@MikeSpreitzer MikeSpreitzer deleted the apf-add-borrowing-adjustment branch August 26, 2022 19:22
@MikeSpreitzer MikeSpreitzer changed the title [Draft] KEP-1040: periodically adjusted borrowing between priority levels in APF KEP-1040: periodically adjusted borrowing between priority levels in APF Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants