Commit 33183fc

Draft borrowing between priority levels in APF
1 parent 8c1a7ac commit 33183fc

1 file changed (+104 -19 lines)
  • keps/sig-api-machinery/1040-priority-and-fairness

keps/sig-api-machinery/1040-priority-and-fairness/README.md

Lines changed: 104 additions & 19 deletions
```diff
@@ -283,8 +283,10 @@ In short, this proposal is about generalizing the existing
 max-in-flight request handler in apiservers to add more discriminating
 handling of requests. The overall approach is that each request is
 categorized to a priority level and a queue within that priority
-level; each priority level dispatches to its own isolated concurrency
-pool; within each priority level queues compete with even fairness.
+level; each priority level dispatches to its own concurrency pool and,
+according to a configured limit, unused concurrency borrowed from
+lower priority levels; within each priority level queues compete with
+even fairness.
 
 ### Request Categorization
 
```
```diff
@@ -638,24 +640,107 @@ always dispatched immediately. Following is how the other requests
 are dispatched at a given apiserver.
 
 The concurrency limit of an apiserver is divided among the non-exempt
-priority levels in proportion to their assured concurrency shares.
-This produces the assured concurrency value (ACV) for each non-exempt
-priority level:
+priority levels, and higher ones can do a limited amount of borrowing
+from lower ones.
+
+Non-exempt priority levels are ordered in a total order for the
+purpose of borrowing currently-unused concurrency. The ordering is
+based (first) on increasing value of a spec field whose name is
+`priority` and whose value is constrained to lie in the range 0
+through 10000 inclusive and (second) on increasing name of the
+PriorityLevelConfiguration object. Priority levels that appear later
+in the order are considered to have lower priority.
```
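The ordering rule in the added text above amounts to sorting on the pair (`priority`, name), with later entries treated as lower priority. The following is a minimal Python sketch of that rule, not the apiserver's code. The `Level` record and the helper names are hypothetical. Comparing names as plain Python strings is an assumption, since the text says only "increasing name".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Hypothetical stand-in for a non-exempt PriorityLevelConfiguration."""
    name: str
    priority: int  # spec field `priority`, allowed range 0..10000 inclusive


def borrowing_order(levels):
    """Return the levels from highest to lowest priority.

    Sorted by increasing `priority`, ties broken by increasing name;
    levels later in the result are considered to have lower priority.
    """
    for level in levels:
        if not 0 <= level.priority <= 10000:
            raise ValueError(f"{level.name}: priority {level.priority} out of range")
    return sorted(levels, key=lambda level: (level.priority, level.name))


def lower_levels(levels, name):
    """Names of the levels after `name` in the order: the ones it may borrow from."""
    ordered = [level.name for level in borrowing_order(levels)]
    return ordered[ordered.index(name) + 1:]


# Three of the proposed defaults plus a hypothetical level "alpha" tied with "system".
levels = [Level("catch-all", 10000), Level("system", 600),
          Level("alpha", 600), Level("leader-election", 200)]
print([level.name for level in borrowing_order(levels)])
# ['leader-election', 'alpha', 'system', 'catch-all']
print(lower_levels(levels, "alpha"))
# ['system', 'catch-all']
```

The name tie-break is what makes the order total. Two levels may share a `priority` value, but two PriorityLevelConfiguration objects cannot share a name.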
```diff
+
+Borrowing is done on the basis of the current situation, with no
+consideration of opportunity cost and no pre-emption when the
+situation changes.
+
+An apiserver assigns three concurrency limits to each non-exempt
+priority level, with a constraint that means there are only two
+degrees of freedom.
+
+- The ***LendableConcurrencyLimit*** is the number of seats that are
+statically (i.e., before any borrowing takes place) assigned to this
+level and can be dynamically borrowed by higher levels.
+- The ***NonLendableConcurrencyLimit*** is the number of seats that
+are statically assigned to this level and can _not_ be borrowed by
+higher levels.
+- The ***NominalConcurrencyLimit*** is the number of seats statically
+assigned to this level and is the sum of the
+LendableConcurrencyLimit and the NonLendableConcurrencyLimit.
+
+Each non-exempt PriorityLevelConfiguration's spec has an
+`assuredConcurrencyShares` field, which has existed since APF was introduced
+and may not be zero, and a `lendableConcurrencyShares` field, which is
+being added in the midst of the lifetime of the `v1beta2` version of
+the API and may be any value between zero and
+`assuredConcurrencyShares` inclusive (default is zero). Each
+apiserver allocates NominalConcurrencyLimits in proportion to
+`assuredConcurrencyShares` and LendableConcurrencyLimits in
+corresponding proportion:
+
+```
+NominalConcurrencyLimit(i) = ceil( SCL * assuredConcurrencyShares(i) / sum_assured )
+LendableConcurrencyLimit(i) = ceil( SCL * lendableConcurrencyShares(i) / sum_assured )
+NonLendableConcurrencyLimit(i) = NominalConcurrencyLimit(i) - LendableConcurrencyLimit(i)
+sum_assured = sum[priority levels k] assuredConcurrencyShares(k)
+```
+
+where SCL is the apiserver's concurrency limit.
```
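The allocation formulas above can be checked numerically. The Python sketch below applies them to the assured shares and proposed lendable shares from the table in this change. It uses an illustrative SCL of 600, chosen for the example and not taken from the text. Integer ceiling division avoids floating-point rounding inside `ceil`. The function names are hypothetical.

```python
def ceil_div(numerator, denominator):
    """Exact ceil(numerator / denominator) for non-negative integers."""
    return -(-numerator // denominator)


def concurrency_limits(scl, shares):
    """Apply the formulas: return {name: (nominal, lendable, non_lendable)}.

    scl: the apiserver's concurrency limit (SCL).
    shares: {name: (assuredConcurrencyShares, lendableConcurrencyShares)}.
    """
    sum_assured = sum(assured for assured, _ in shares.values())
    limits = {}
    for name, (assured, lendable) in shares.items():
        # Constraints stated in the text: assured shares are non-zero and
        # lendable shares lie between zero and assured shares inclusive.
        if assured <= 0 or not 0 <= lendable <= assured:
            raise ValueError(f"{name}: invalid shares ({assured}, {lendable})")
        nominal = ceil_div(scl * assured, sum_assured)
        lendable_limit = ceil_div(scl * lendable, sum_assured)
        limits[name] = (nominal, lendable_limit, nominal - lendable_limit)
    return limits


proposed = {  # (assured shares, proposed lendable shares) from the table
    "leader-election": (10, 0),
    "node-high": (40, 10),
    "system": (30, 10),
    "workload-high": (40, 20),
    "workload-low": (100, 90),
    "global-default": (20, 10),
    "catch-all": (5, 0),
}
for name, (nominal, lendable, non_lendable) in concurrency_limits(600, proposed).items():
    print(f"{name:16} nominal={nominal:3} lendable={lendable:3} non-lendable={non_lendable:3}")
```

With these inputs, sum_assured is 245. The nominal limits come out as 25, 98, 74, 98, 245, 49 and 13 seats, and workload-low can lend 221 of its 245. Every term is rounded up, so the nominal limits sum to 602, slightly more than SCL.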
```diff
+
+Borrowing is further limited by a practical consideration: we do not
+want a global mutex covering all dispatching. Aside from borrowing,
+dispatching from one priority level is done independently from
+dispatching at another. Borrowing is allowed in just one direction
+(higher may borrow from lower, and lower does not actively lend to
+higher) and in a very limited quantity: an attempt to dispatch for one
+priority level will consider borrowing from just one other priority
+level (if there is any lower priority level at all).
+
+A request can be dispatched exactly at a non-exempt priority level
+when either there are no requests executing at that priority level or
+the number of seats needed by that request is no greater than the
+number of unused seats at that priority level. The number of unused
+seats at a given priority level is that level's
+NominalConcurrencyLimit minus the number of seats used by requests
+executing at that priority level (dispatched from that priority level
+and higher ones).
```
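The exact-dispatch rule above reduces to a small predicate. In this Python sketch the function and parameter names are hypothetical. Following the text, `seats_in_use` counts the seats held by every request executing at the level, including requests borrowed by higher levels.

```python
def can_dispatch_exactly(seats_needed, nominal_limit, seats_in_use, executing_requests):
    """Return True if a request may be dispatched exactly at this level.

    seats_needed: seats the candidate request would occupy.
    nominal_limit: the level's NominalConcurrencyLimit.
    seats_in_use: seats held by requests executing at this level,
        whether dispatched from this level or borrowed by higher ones.
    executing_requests: number of requests currently executing here.
    """
    if executing_requests == 0:
        # An idle level always admits the request, even one that needs
        # more seats than the level's nominal limit.
        return True
    unused_seats = nominal_limit - seats_in_use
    return seats_needed <= unused_seats


print(can_dispatch_exactly(3, 10, 7, 2))   # True: 3 <= 10 - 7
print(can_dispatch_exactly(4, 10, 7, 2))   # False: only 3 seats unused
print(can_dispatch_exactly(50, 10, 0, 0))  # True: nothing is executing
```

Because of the idle-level clause, a request that needs more seats than the level's whole NominalConcurrencyLimit can still run, but only alone.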
```diff
+
+There are two sorts of times when dispatching to a non-empty priority
+level is considered: when a request arrives, and when a request
+releases the seats it was occupying (which is not the same as when the
+request finishes from the client's point of view; see below about
+WATCH requests).
+
+At each of these sorts of moments, as many requests are dispatched
+exactly at the same priority level as possible. The next request to
+consider dispatching is chosen by using the Fair Queuing for Server
+Requests algorithm below to choose a queue at that priority level, and
+the request at the head of that queue is considered. If (a) no
+requests can be dispatched exactly at that priority level at that
+moment, (b) there are non-empty queues at that level, and (c) there
+are lower non-exempt priority levels, then the request at the head of
+the chosen queue is considered for dispatch at one of the lower
+priority levels. The particular lower priority level considered is
+drawn at random from the lower ones, in proportion to their
+LendableConcurrencyLimit (we use a static value so that the drawing
+can be done without acquiring mutexes). The request is executed at
+the chosen lower level (occupying some of its seats) if the request
+can be dispatched exactly at that level according to the rule above.
```
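The sketch below follows the procedure just described for one priority level at one of those moments. It first dispatches exactly at the level while the head request fits. It then makes at most one borrowing attempt, at a lower level drawn with probability proportional to that level's LendableConcurrencyLimit. This is a simplification under stated assumptions:

- A single FIFO queue per level stands in for the Fair Queuing for Server Requests algorithm.
- Locking is ignored.
- `LevelState` and all function names are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LevelState:
    """Hypothetical per-level state at one apiserver."""
    name: str
    nominal: int   # NominalConcurrencyLimit
    lendable: int  # LendableConcurrencyLimit, used here only as a draw weight
    seats_in_use: int = 0
    executing: int = 0
    # Seats needed by each waiting request; one FIFO queue stands in for
    # the Fair Queuing for Server Requests choice of queue.
    queue: deque = field(default_factory=deque)


def fits(level, seats):
    """The 'dispatched exactly' rule: idle level, or enough unused seats."""
    return level.executing == 0 or seats <= level.nominal - level.seats_in_use


def occupy(level, seats):
    """Record that one more request now holds `seats` seats at `level`."""
    level.seats_in_use += seats
    level.executing += 1


def dispatch(level, lower, rng):
    """Dispatch from `level`; `lower` lists the lower non-exempt levels.

    Returns (seats, name of the level whose seats were used) per request.
    """
    dispatched = []
    # Dispatch exactly at this level for as long as the head request fits.
    while level.queue and fits(level, level.queue[0]):
        seats = level.queue.popleft()
        occupy(level, seats)
        dispatched.append((seats, level.name))
    # (a) nothing more fits here, (b) requests remain queued, (c) some lower
    # level can lend: consider one lower level, drawn by its static lendable limit.
    weights = [other.lendable for other in lower]
    if level.queue and sum(weights) > 0:
        lender = rng.choices(lower, weights=weights, k=1)[0]
        seats = level.queue[0]
        if fits(lender, seats):
            level.queue.popleft()
            occupy(lender, seats)
            dispatched.append((seats, lender.name))
    return dispatched


rng = random.Random(1)
high = LevelState("high", nominal=4, lendable=0)
low = LevelState("low", nominal=10, lendable=5)
no_lend = LevelState("no-lend", nominal=3, lendable=0)
high.queue.extend([2, 2, 3])
print(dispatch(high, [low, no_lend], rng))
# [(2, 'high'), (2, 'high'), (3, 'low')]
```

Two details follow the text literally:

- The lender is checked with the same exact-dispatch rule, which compares against the lender's NominalConcurrencyLimit. In this sketch the LendableConcurrencyLimit only weights the draw.
- A failed draw is not retried at another lower level.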
```diff
+
+The following table shows the current default non-exempt priority
+levels and a proposal for their new configuration.
+
+| Name | Assured Shares | Proposed Lendable Shares | Proposed Priority |
+| ---- | -------------: | -----------------------: | ----------------: |
+| leader-election | 10 | 0 | 200 |
+| node-high | 40 | 10 | 400 |
+| system | 30 | 10 | 600 |
+| workload-high | 40 | 20 | 1000 |
+| workload-low | 100 | 90 | 8000 |
+| global-default | 20 | 10 | 9000 |
+| catch-all | 5 | 0 | 10000 |
 
-```
-ACV(l) = ceil( SCL * ACS(l) / ( sum[priority levels k] ACS(k) ) )
-```
-
-where SCL is the apiserver's concurrency limit and ACS(l) is the
-AssuredConcurrencyShares for priority level l.
-
-Dispatching is done independently for each priority level. Whenever
-(1) a non-exempt priority level's number of running requests is zero
-or below the level's assured concurrency value and (2) that priority
-level has a non-empty queue, it is time to dispatch another request
-for service. The Fair Queuing for Server Requests algorithm below is
-used to pick a non-empty queue at that priority level. Then the
-request at the head of that queue is dispatched.
 
 ### Fair Queuing for Server Requests
 
```