@@ -643,59 +643,97 @@ The concurrency limit of an apiserver is divided among the non-exempt
643643priority levels, and higher ones can do a limited amount of borrowing
644644from lower ones.
645645
646- Non-exempt priority levels are ordered in a total order for the
647- purpose of borrowing currently-unused concurrency. The ordering is
648- based (first) on increasing value of a spec field whose name is
649- `priority` and whose value is constrained to lie in the range 0
650- through 10000 inclusive and (second) on increasing name of the
651- PriorityLevelConfiguration object. Priority levels that appear later
652- in the order are considered to have lower priority.
646+ Two fields of `LimitedPriorityLevelConfiguration`, introduced in the
647+ midst of the `v1beta2` lifetime, configure the borrowing. The following
648+ display shows the two new fields along with the updated description for
649+ the `AssuredConcurrencyShares` field.
650+
651+ ``` go
652+ type LimitedPriorityLevelConfiguration struct {
653+ ...
654+ // `assuredConcurrencyShares` (ACS) contributes to the computation of the
655+ // NominalConcurrencyLimit (NCL) of this level.
656+ // This is the number of execution seats available at this priority level.
657+ // This is used both for requests dispatched from
658+ // this priority level as well as requests dispatched from higher priority
659+ // levels borrowing seats from this level. This does not limit dispatching from
660+ // this priority level that borrows seats from lower priority levels (those lower
661+ // levels do that). The server's concurrency limit (SCL) is divided among the
662+ // Limited priority levels in proportion to their ACS values:
663+ //
664+ // NCL(i) = ceil( SCL * ACS(i) / sum_acs )
665+ // sum_acs = sum[limited priority level k] ACS(k)
666+ //
667+ // Bigger numbers mean a larger nominal concurrency limit, at the expense
668+ // of every other Limited priority level.
669+ // This field has a default value of 30.
670+ // +optional
671+ AssuredConcurrencyShares int32
672+
673+ // `lendableConcurrencyShares` (LCS) contributes to the computation of the
674+ // LendableConcurrencyLimit (LCL) for this level. This is the number of
675+ // execution seats of this level that can be borrowed by higher priority
676+ // Limited levels.
677+ // This may not be negative, and may not be greater than
678+ // `assuredConcurrencyShares`.
679+ //
680+ // LCL(i) = ceil( SCL * LCS(i) / sum_acs )
681+ //
682+ // This field has a default value of zero.
683+ // +optional
684+ LendableConcurrencyShares int32
685+
686+ // `priority` determines where this priority level appears in the total order
687+ // of Limited priority levels used to configure borrowing between those levels.
688+ // A numerically higher value means a logically lower priority.
689+ // Do not create ties; they will be broken arbitrarily.
690+ // `priority` SHOULD be a positive number no greater than 10000.
691+ // If it is zero then, for the sake of a smooth transition from the time
692+ // before this field existed, this level will be treated as if its `priority`
693+ // is the average of the `matchingPrecedence` of the FlowSchema objects
694+ // that reference this level.
695+ // +optional
696+ Priority int32
697+ }
698+ ```
699+
700+ This is a somewhat tortured meaning for "assured", but it is the
701+ meaning we need for introduction of the new field to the existing type
702+ while having a smooth transition in behavior. In the next version we
703+ should rename `AssuredConcurrencyShares` to
704+ `NominalConcurrencyShares`.
653705
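The two formulas in the field comments can be checked with a short calculation. The following is a minimal sketch, not the apiserver's implementation: the `level` type, the `limits` helper, and the server concurrency limit of 600 are assumptions made for illustration, and the share values are the assured and proposed lendable shares of the default priority levels.

```go
package main

import "fmt"

// level holds the two share values of one Limited priority level.
// This type is hypothetical; it only mirrors the fields discussed above.
type level struct {
	name string
	acs  int // assuredConcurrencyShares
	lcs  int // lendableConcurrencyShares
}

// ceilDiv returns ceil(a / b) for a >= 0 and b > 0 using integer arithmetic.
func ceilDiv(a, b int) int {
	return (a + b - 1) / b
}

// limits computes NCL(i) = ceil(SCL * ACS(i) / sum_acs) and
// LCL(i) = ceil(SCL * LCS(i) / sum_acs) for every level.
func limits(scl int, levels []level) (ncl, lcl map[string]int) {
	sumACS := 0
	for _, l := range levels {
		sumACS += l.acs
	}
	ncl = make(map[string]int, len(levels))
	lcl = make(map[string]int, len(levels))
	for _, l := range levels {
		ncl[l.name] = ceilDiv(scl*l.acs, sumACS)
		lcl[l.name] = ceilDiv(scl*l.lcs, sumACS)
	}
	return ncl, lcl
}

func main() {
	levels := []level{
		{"leader-election", 10, 0},
		{"node-high", 40, 10},
		{"system", 30, 10},
		{"workload-high", 40, 20},
		{"workload-low", 100, 90},
		{"global-default", 20, 10},
		{"catch-all", 5, 0},
	}
	// 600 is an assumed server concurrency limit, chosen only for illustration.
	ncl, lcl := limits(600, levels)
	for _, l := range levels {
		fmt.Printf("%-16s NCL=%3d LCL=%3d\n", l.name, ncl[l.name], lcl[l.name])
	}
}
```

Because every quotient is rounded up, the nominal limits can add up to slightly more than the server's limit: with the assumed value of 600 they sum to 602.
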
654706Borrowing is done on the basis of the current situation, with no
655- consideration of opportunity cost and no pre-emption when the
656- situation changes.
657-
658- An apiserver assigns three concurrency limits to each non-exempt
659- priority level, with a constraint that means there are only two
660- degrees of freedom.
661-
662- - The ***LendableConcurrencyLimit*** is the number of seats that are
663- statically (i.e., before any borrowing takes place) assigned to this
664- level and can be dynamically borrowed by higher levels.
665- - The ***NonLendableConcurrencyLimit*** is the number of seats that
666- are statically assigned to this level and can _not_ be borrowed by
667- higher levels.
668- - The ***NominalConcurrencyLimit*** is the number of seats statically
669- assigned to this level and is the sum of the
670- LendableConcurrencyLimit and the NonLendableConcurrencyLimit.
671-
672- Each non-exempt PriorityLevelConfiguration's spec has an
673- `assuredConcurrencyShares`, which has existed since APF was introduced
674- and may not be zero, and a `lendableConcurrencyShares` field, which is
675- being added in the midst of the lifetime of the `v1beta2` version of
676- the API and may be any value between zero and
677- `assuredConcurrencyShares` inclusive (default is zero). Each
678- apiserver allocates NominalConcurrencyLimits in proportion to
679- `assuredConcurrencyShares` and LendableConcurrencyLimit in
680- corresponding proportion:
681-
682- ```
683- NominalConcurrencyLimit(i) = ceil( SCL * assuredConcurrencyShares(i) / sum_assured )
684- LendableConcurrencyLimit(i) = ceil( SCL * lendableConcurrencyShares(i) / sum_assured )
685- NonLendableConcurrencyLimit(i) = NominalConcurrencyLimit(i) - LendableConcurrencyLimit(i)
686- sum_assured = sum[priority levels k] assuredConcurrencyShares(k)
687- ```
688-
689- where SCL is the apiserver's concurrency limit.
707+ consideration of opportunity cost, no further rationing according to
708+ shares (just obeying the concurrency limits as outlined above), and no
709+ pre-emption when the situation changes.
710+
711+ Whenever a request is dispatched, it takes all its seats from one
712+ priority level --- either the one referenced by the request's
713+ FlowSchema or a lower priority level.
690714
691715Borrowing is further limited by a practical consideration: we do not
692716want a global mutex covering all dispatching. Aside from borrowing,
693717dispatching from one priority level is done independently from
694- dispatching at another. Borrowing is allowed in just one direction
695- (higher may borrow from lower, and lower does not actively lend to
696- higher) and in a very limited quantity: an attempt to dispatch for one
697- priority level will consider borrowing from just one other priority
698- level (if there is any lower priority level at all).
718+ dispatching at another. There _is_ a global mutex held by the logic
719+ that digests configuration objects, but it produces an immutable
720+ object that is passed through `sync/atomic.Store` and `.Load` to the
721+ logic that does queuing and dispatching. Borrowing is allowed in just
722+ one direction: higher may borrow from lower. Furthermore, a higher
723+ priority level may actively borrow from a lower one but a lower level
724+ does not actively lend to a higher one (see below). In this design,
725+ working on one request requires holding at most two priority level
726+ mutexes at any given moment. To avoid deadlock, there is a strict
727+ ordering on acquisition of those mutexes, namely decreasing logical
728+ priority.
729+
730+ We could take a complementary approach, in which lower actively lends
731+ to higher but higher does not actively borrow from lower. The current
732+ direction was chosen based on metrics from scalability
733+ tests showing that the `workload-low` priority level has a
734+ significantly larger NominalConcurrencyLimit than the other levels and
735+ is usually very under-utilized (so it would consider lending less
736+ often than higher levels would consider borrowing).
699737
700738A request can be dispatched exactly at a non-exempt priority level
701739when either there are no requests executing at that priority level or
@@ -706,39 +744,47 @@ NominalConcurrencyLimit minus the number of seats used by requests
706744executing at that priority level (dispatched from that priority level
707745and higher ones).
708746
709- There are two sorts of times when dispatching to a non-empty priority
747+ There are two sorts of times when dispatching to a non-exempt priority
710748level is considered: when a request arrives, and when a request
711749releases the seats it was occupying (which is not the same as when the
712- request finishes from the client's point of view, see below about
713- WATCH requests).
750+ request finishes from the client's point of view, due to the special
751+ considerations for WATCH requests).
714752
715753At each of these sorts of moments, as many requests are dispatched
716754exactly at the same priority level as possible. The next request to
717- consider dispatching is chosen by using the Fair Queuing for Server
718- Requests algorithm below to choose a queue at that priority level, and
719- the request at the head of that queue is considered. If (a) no
720- requests can be dispatched exactly at that priority level at that
721- moment, (b) there are non-empty queues at that level, and (c) there
722- are lower non-exempt priority levels, then the request at the head of
723- the chosen queue is considered for dispatch at one of the lower
724- priority levels. The particular lower priority level considered is
725- drawn at random from the lower ones, in proportion to their
726- LendableConcurrencyLimit (we use a static value so that the drawing
727- can be done without acquiring mutexes). The request is executed at
728- the chosen lower level (occupying some of its seats) if the request
729- can be dispatched exactly at that level according to the rule above.
755+ consider in this process is chosen by using the Fair Queuing for
756+ Server Requests algorithm below to choose one of the non-empty queues
757+ at that priority level, and if indeed there is a non-empty queue then
758+ the request at the head of the chosen queue is considered for
759+ dispatching. If the level has non-empty queues but the chosen request
760+ cannot be dispatched exactly at this level at the moment, then the
761+ logically lower non-exempt priority levels are considered, one at a
762+ time, in decreasing logical priority order. As soon as one is found
763+ at which the request can be dispatched at the moment according to the
764+ rule above, the search stops and the request is dispatched to
765+ execute using some of the lower priority level's seats. If no
766+ suitable priority level is found, the request is not dispatched at
767+ the moment.
768+
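The search over lower priority levels can be sketched as follows. This is a single-threaded illustration with hypothetical names (`level`, `canDispatchExactly`, `pickLevel`) and no locking. The "dispatched exactly" rule is only partly visible in this excerpt, so `canDispatchExactly` is a simplified stand-in that assumes a request fits when the level is idle or when its seat count does not exceed the nominal limit minus the seats in use.

```go
package main

import (
	"fmt"
	"sort"
)

// level is a hypothetical, lock-free stand-in for a non-exempt priority level.
type level struct {
	name         string
	priority     int32 // numerically higher means logically lower
	nominalLimit int
	seatsInUse   int
}

// canDispatchExactly is an assumed simplification of the dispatch rule.
func (l *level) canDispatchExactly(width int) bool {
	return l.seatsInUse == 0 || width <= l.nominalLimit-l.seatsInUse
}

// pickLevel returns the level whose seats a request of the given width would
// occupy: the home level if possible, otherwise the first logically lower
// level, in decreasing logical priority order, that can take the request.
// It returns nil when the request cannot be dispatched at the moment.
func pickLevel(levels []*level, home *level, width int) *level {
	if home.canDispatchExactly(width) {
		return home
	}
	sorted := append([]*level(nil), levels...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].priority < sorted[j].priority })
	for _, l := range sorted {
		if l.priority > home.priority && l.canDispatchExactly(width) {
			return l
		}
	}
	return nil
}

func main() {
	high := &level{name: "workload-high", priority: 833, nominalLimit: 2, seatsInUse: 2}
	low := &level{name: "workload-low", priority: 9000, nominalLimit: 10, seatsInUse: 1}
	def := &level{name: "global-default", priority: 9900, nominalLimit: 4, seatsInUse: 0}
	levels := []*level{def, low, high}

	if chosen := pickLevel(levels, high, 1); chosen != nil {
		chosen.seatsInUse++ // the request takes all its seats from one level
		fmt.Println("dispatched using seats of", chosen.name)
	} else {
		fmt.Println("not dispatched at the moment")
	}
}
```
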
769+ As can be seen from this logic, when seats are freed up at a given
770+ priority level they are _not_ actively lent to logically higher
771+ priority levels. We avoid that in order to have a total order in
772+ which priority level mutexes are acquired.
730773
731774The following table shows the current default non-exempt priority
732- levels and a proposal for their new configuration.
775+ levels and a proposal for their new configuration. For the sake of
776+ continuity with out-of-tree configuration objects, the proposed
777+ priority values follow the rule given above for the effective value
778+ when the priority field holds zero.
733779
734780| Name | Assured Shares | Proposed Lendable Shares | Proposed Priority |
735781| ---- | -------------: | -----------------------: | ----------------: |
736- | leader-election | 10 | 0 | 200 |
782+ | leader-election | 10 | 0 | 150 |
737783| node-high | 40 | 10 | 400 |
738- | system | 30 | 10 | 600 |
739- | workload-high | 40 | 20 | 1000 |
740- | workload-low | 100 | 90 | 8000 |
741- | global-default | 20 | 10 | 9000 |
784+ | system | 30 | 10 | 500 |
785+ | workload-high | 40 | 20 | 833 |
786+ | workload-low | 100 | 90 | 9000 |
787+ | global-default | 20 | 10 | 9900 |
742788| catch-all | 5 | 0 | 10000 |
743789
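The rule for a zero `priority` field can be sketched as a small helper. Everything here is an assumption made for illustration: the function name `effectivePriority`, the `matchingPrecedence` values in the example, the use of truncating integer division for the average, and the choice to leave the value unchanged when no FlowSchema references the level. The source specifies none of these details.

```go
package main

import "fmt"

// effectivePriority returns the configured priority unless it is zero, in
// which case it returns the average of the matchingPrecedence values of the
// FlowSchema objects that reference the level. Truncating integer division
// is an assumption; the rounding mode is not stated in the proposal.
func effectivePriority(priority int32, matchingPrecedences []int32) int32 {
	if priority != 0 || len(matchingPrecedences) == 0 {
		return priority
	}
	var sum int64
	for _, p := range matchingPrecedences {
		sum += int64(p)
	}
	return int32(sum / int64(len(matchingPrecedences)))
}

func main() {
	// Hypothetical precedences of three FlowSchemas referencing one level.
	fmt.Println(effectivePriority(0, []int32{800, 800, 900}))
	// An explicitly configured priority is used as-is.
	fmt.Println(effectivePriority(400, []int32{800, 800, 900}))
}
```
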
744790