@@ -283,8 +283,10 @@ In short, this proposal is about generalizing the existing
283283max-in-flight request handler in apiservers to add more discriminating
284284handling of requests. The overall approach is that each request is
285285categorized to a priority level and a queue within that priority
286- level; each priority level dispatches to its own isolated concurrency
287- pool; within each priority level queues compete with even fairness.
286+ level; each priority level dispatches to its own concurrency pool and,
287+ according to a configured limit, unused concurrency borrowed from
288+ lower priority levels; within each priority level queues compete with
289+ even fairness.
288290
289291### Request Categorization
290292
@@ -638,24 +640,107 @@ always dispatched immediately. Following is how the other requests
638640are dispatched at a given apiserver.
639641
640642The concurrency limit of an apiserver is divided among the non-exempt
641- priority levels in proportion to their assured concurrency shares.
642- This produces the assured concurrency value (ACV) for each non-exempt
643- priority level:
643+ priority levels, and higher ones can do a limited amount of borrowing
644+ from lower ones.
645+
646+ Non-exempt priority levels are ordered in a total order for the
647+ purpose of borrowing currently-unused concurrency. The ordering is
648+ based (first) on increasing value of a spec field whose name is
649+ `priority` and whose value is constrained to lie in the range 0
650+ through 10000 inclusive and (second) on increasing name of the
651+ PriorityLevelConfiguration object. Priority levels that appear later
652+ in the order are considered to have lower priority.
653+
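The ordering described above can be sketched in Go as follows. This is only an illustration: the `level` type and the `borrowingOrder` function are hypothetical names invented for this example, not part of the APF implementation, and the `b-custom` level is made up to show the tie-break on name.

```go
package main

import (
	"fmt"
	"sort"
)

// level is a hypothetical stand-in for a non-exempt PriorityLevelConfiguration.
type level struct {
	Name     string
	Priority int // the spec field `priority`, 0 through 10000 inclusive
}

// borrowingOrder returns a sorted copy of levels, highest priority first:
// increasing Priority, with ties broken by increasing Name.
func borrowingOrder(levels []level) []level {
	out := append([]level(nil), levels...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Priority != out[j].Priority {
			return out[i].Priority < out[j].Priority
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	ordered := borrowingOrder([]level{
		{"workload-low", 8000},
		{"system", 600},
		{"node-high", 400},
		{"b-custom", 600}, // hypothetical level tied with "system" on priority
	})
	// Levels printed later are considered to have lower priority.
	for _, l := range ordered {
		fmt.Println(l.Priority, l.Name)
	}
}
```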
654+ Borrowing is done on the basis of the current situation, with no
655+ consideration of opportunity cost and no pre-emption when the
656+ situation changes.
657+
658+ An apiserver assigns three concurrency limits to each non-exempt
659+ priority level, with a constraint that means there are only two
660+ degrees of freedom.
661+
662+ - The ***LendableConcurrencyLimit*** is the number of seats that are
663+ statically (i.e., before any borrowing takes place) assigned to this
664+ level and can be dynamically borrowed by higher levels.
665+ - The ***NonLendableConcurrencyLimit*** is the number of seats that
666+ are statically assigned to this level and can _not_ be borrowed by
667+ higher levels.
668+ - The ***NominalConcurrencyLimit*** is the number of seats statically
669+ assigned to this level and is the sum of the
670+ LendableConcurrencyLimit and the NonLendableConcurrencyLimit.
671+
672+ Each non-exempt PriorityLevelConfiguration's spec has an
673+ `assuredConcurrencyShares` field, which has existed since APF was introduced
674+ and may not be zero, and a `lendableConcurrencyShares` field, which is
675+ being added in the midst of the lifetime of the `v1beta2` version of
676+ the API and may be any value between zero and
677+ `assuredConcurrencyShares` inclusive (default is zero). Each
678+ apiserver allocates NominalConcurrencyLimits in proportion to
679+ `assuredConcurrencyShares` and LendableConcurrencyLimits in
680+ corresponding proportion:
681+
682+ ```
683+ NominalConcurrencyLimit(i) = ceil( SCL * assuredConcurrencyShares(i) / sum_assured )
684+ LendableConcurrencyLimit(i) = ceil( SCL * lendableConcurrencyShares(i) / sum_assured )
685+ NonLendableConcurrencyLimit(i) = NominalConcurrencyLimit(i) - LendableConcurrencyLimit(i)
686+ sum_assured = sum[priority levels k] assuredConcurrencyShares(k)
687+ ```
688+
689+ where SCL is the apiserver's concurrency limit.
690+
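To make the arithmetic concrete, here is a small Go sketch of the three formulas. It assumes an SCL of 600 purely for illustration, and uses the assured and proposed lendable shares from the table of default priority levels later in this section. The names `shares`, `limits`, `computeLimits`, `ceilDiv` and `proposedDefaults` are hypothetical and exist only in this example.

```go
package main

import "fmt"

// ceilDiv returns ceil(a / b) for a >= 0 and b > 0 using integer arithmetic.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// shares holds the two share fields of one non-exempt priority level.
type shares struct {
	Name     string
	Assured  int // assuredConcurrencyShares
	Lendable int // lendableConcurrencyShares, between 0 and Assured
}

// limits holds the three concurrency limits derived for one level.
type limits struct {
	Nominal, Lendable, NonLendable int
}

// computeLimits applies the formulas above for apiserver concurrency limit scl.
func computeLimits(scl int, levels []shares) map[string]limits {
	sumAssured := 0
	for _, l := range levels {
		sumAssured += l.Assured
	}
	out := make(map[string]limits, len(levels))
	for _, l := range levels {
		nominal := ceilDiv(scl*l.Assured, sumAssured)
		lendable := ceilDiv(scl*l.Lendable, sumAssured)
		out[l.Name] = limits{nominal, lendable, nominal - lendable}
	}
	return out
}

// proposedDefaults mirrors the assured and proposed lendable shares
// from the table of default priority levels.
var proposedDefaults = []shares{
	{"leader-election", 10, 0},
	{"node-high", 40, 10},
	{"system", 30, 10},
	{"workload-high", 40, 20},
	{"workload-low", 100, 90},
	{"global-default", 20, 10},
	{"catch-all", 5, 0},
}

func main() {
	got := computeLimits(600, proposedDefaults) // SCL of 600 is an assumption
	for _, l := range proposedDefaults {
		fmt.Printf("%-16s %+v\n", l.Name, got[l.Name])
	}
}
```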
691+ Borrowing is further limited by a practical consideration: we do not
692+ want a global mutex covering all dispatching. Aside from borrowing,
693+ dispatching from one priority level is done independently from
694+ dispatching at another. Borrowing is allowed in just one direction
695+ (higher may borrow from lower, and lower does not actively lend to
696+ higher) and in a very limited quantity: an attempt to dispatch for one
697+ priority level will consider borrowing from just one other priority
698+ level (if there is any lower priority level at all).
699+
700+ A request can be dispatched exactly at a non-exempt priority level
701+ when either there are no requests executing at that priority level or
702+ the number of seats needed by that request is no greater than the
703+ number of unused seats at that priority level. The number of unused
704+ seats at a given priority level is that level's
705+ NominalConcurrencyLimit minus the number of seats used by requests
706+ executing at that priority level (dispatched from that priority level
707+ and higher ones).
708+
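The rule for dispatching "exactly" at a level can be sketched as a small predicate. The `levelState` type and its fields are hypothetical bookkeeping invented for this example; the real implementation may track this state differently.

```go
package main

import "fmt"

// levelState is a hypothetical snapshot of one non-exempt priority level.
type levelState struct {
	NominalConcurrencyLimit int // seats statically assigned to this level
	SeatsInUse              int // seats held by requests executing here, whether dispatched from this level or a higher one
	Executing               int // number of requests executing at this level
}

// unusedSeats is the level's NominalConcurrencyLimit minus the seats in use.
func (s levelState) unusedSeats() int {
	return s.NominalConcurrencyLimit - s.SeatsInUse
}

// canDispatchExactly reports whether a request needing seatsNeeded seats
// can be dispatched exactly at this level: either nothing is executing
// here, or the request fits within the unused seats.
func (s levelState) canDispatchExactly(seatsNeeded int) bool {
	return s.Executing == 0 || seatsNeeded <= s.unusedSeats()
}

func main() {
	idle := levelState{NominalConcurrencyLimit: 5}
	busy := levelState{NominalConcurrencyLimit: 25, SeatsInUse: 23, Executing: 4}
	fmt.Println(idle.canDispatchExactly(10)) // true: nothing is executing at the level
	fmt.Println(busy.canDispatchExactly(2))  // true: 2 seats are unused
	fmt.Println(busy.canDispatchExactly(3))  // false: only 2 seats are unused
}
```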
709+ There are two sorts of times when dispatching to a non-exempt priority
710+ level is considered: when a request arrives, and when a request
711+ releases the seats it was occupying (which is not the same as when the
712+ request finishes from the client's point of view, see below about
713+ WATCH requests).
714+
715+ At each of these sorts of moments, as many requests as possible are
716+ dispatched exactly at the same priority level. The next request to
717+ consider dispatching is chosen by using the Fair Queuing for Server
718+ Requests algorithm below to choose a queue at that priority level, and
719+ the request at the head of that queue is considered. If (a) no
720+ requests can be dispatched exactly at that priority level at that
721+ moment, (b) there are non-empty queues at that level, and (c) there
722+ are lower non-exempt priority levels, then the request at the head of
723+ the chosen queue is considered for dispatch at one of the lower
724+ priority levels. The particular lower priority level considered is
725+ drawn at random from the lower ones, in proportion to their
726+ LendableConcurrencyLimit (we use a static value so that the drawing
727+ can be done without acquiring mutexes). The request is executed at
728+ the chosen lower level (occupying some of its seats) if the request
729+ can be dispatched exactly at that level according to the rule above.
730+
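The random choice of a lower level to borrow from can be sketched as a weighted draw over the static LendableConcurrencyLimits, which needs no lock because the weights do not change during dispatching. The function names `pickByWeight` and `chooseLender` are hypothetical, and returning -1 when no lower level has any lendable seats is an assumption of this sketch; the text above does not say what happens in that case.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickByWeight maps a draw r in [0, sum(weights)) to the index whose
// cumulative weight range contains r. It returns -1 if r is out of range.
func pickByWeight(weights []int, r int) int {
	if r < 0 {
		return -1
	}
	for i, w := range weights {
		if r < w {
			return i
		}
		r -= w
	}
	return -1
}

// chooseLender picks the index of one lower priority level at random,
// in proportion to the given LendableConcurrencyLimits. It returns -1
// when every limit is zero (an assumption made for this sketch).
func chooseLender(lendableLimits []int, rng *rand.Rand) int {
	total := 0
	for _, w := range lendableLimits {
		total += w
	}
	if total == 0 {
		return -1
	}
	return pickByWeight(lendableLimits, rng.Intn(total))
}

func main() {
	// Hypothetical LendableConcurrencyLimits of three lower levels.
	lendable := []int{221, 25, 0}
	rng := rand.New(rand.NewSource(1))
	counts := make([]int, len(lendable))
	for i := 0; i < 10000; i++ {
		counts[chooseLender(lendable, rng)]++
	}
	fmt.Println(counts) // the level with a zero limit is never chosen
}
```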
731+ The following table shows the current default non-exempt priority
732+ levels and a proposal for their new configuration.
733+
734+ | Name | Assured Shares | Proposed Lendable Shares | Proposed Priority |
735+ | ---- | -------------: | -----------------------: | ----------------: |
736+ | leader-election | 10 | 0 | 200 |
737+ | node-high | 40 | 10 | 400 |
738+ | system | 30 | 10 | 600 |
739+ | workload-high | 40 | 20 | 1000 |
740+ | workload-low | 100 | 90 | 8000 |
741+ | global-default | 20 | 10 | 9000 |
742+ | catch-all | 5 | 0 | 10000 |
644743
645- ```
646- ACV(l) = ceil( SCL * ACS(l) / ( sum[priority levels k] ACS(k) ) )
647- ```
648-
649- where SCL is the apiserver's concurrency limit and ACS(l) is the
650- AssuredConcurrencyShares for priority level l.
651-
652- Dispatching is done independently for each priority level. Whenever
653- (1) a non-exempt priority level's number of running requests is zero
654- or below the level's assured concurrency value and (2) that priority
655- level has a non-empty queue, it is time to dispatch another request
656- for service. The Fair Queuing for Server Requests algorithm below is
657- used to pick a non-empty queue at that priority level. Then the
658- request at the head of that queue is dispatched.
659744
660745### Fair Queuing for Server Requests
661746