Description
getNodeConditionPredicate() in plugin/pkg/scheduler/factory/factory.go makes our code hard to understand because it hides the node-condition-based filtering in the node lister, which is totally non-obvious.
We should get rid of this function and have NodeController add NoSchedule taints for these situations instead. (Alas, we might not be able to remove getNodeConditionPredicate() completely until we get rid of the Unschedulable field of NodeSpec, but at least we can remove all the other code in this function.) If there are pods that should still be able to schedule in any of these situations (e.g. DaemonSet pods?), we should add tolerations for them in admission control (e.g. see pkg/controller/daemon/daemoncontroller.go).
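To make the proposal concrete, here is a minimal sketch of how a controller could turn node conditions into NoSchedule taints. It is not the NodeController implementation: the `NodeCondition` and `Taint` types are simplified stand-ins for the real API types, `taintsForConditions` is a hypothetical helper, and the taint key strings are illustrative assumptions.

```go
package main

import "fmt"

// NodeCondition and Taint are simplified stand-ins for the real API types,
// defined locally so the sketch is self-contained.
type NodeCondition struct {
	Type   string // e.g. "Ready", "MemoryPressure"
	Status string // "True", "False" or "Unknown"
}

type Taint struct {
	Key    string
	Effect string
}

// taintsForConditions (hypothetical helper) returns one NoSchedule taint per
// problematic condition. The key strings are assumptions for illustration.
func taintsForConditions(conds []NodeCondition) []Taint {
	var taints []Taint
	add := func(key string) {
		taints = append(taints, Taint{Key: key, Effect: "NoSchedule"})
	}
	for _, c := range conds {
		switch {
		case c.Type == "Ready" && c.Status != "True":
			// Ready is the only condition that is bad when it is NOT true.
			add("node.kubernetes.io/not-ready")
		case c.Type == "MemoryPressure" && c.Status == "True":
			add("node.kubernetes.io/memory-pressure")
		case c.Type == "DiskPressure" && c.Status == "True":
			add("node.kubernetes.io/disk-pressure")
		case c.Type == "NetworkUnavailable" && c.Status == "True":
			add("node.kubernetes.io/network-unavailable")
		}
	}
	return taints
}

func main() {
	conds := []NodeCondition{
		{Type: "Ready", Status: "True"},
		{Type: "MemoryPressure", Status: "True"},
	}
	for _, t := range taintsForConditions(conds) {
		fmt.Printf("%s:%s\n", t.Key, t.Effect)
	}
}
```

The point of this shape is that the reason a node is excluded becomes visible on the Node object itself, rather than being buried in a predicate inside the node lister.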
cc/ @kubernetes/sig-scheduling-misc @kubernetes/sig-cluster-lifecycle-misc
cc/ @gmarek @kevin-wangzefeng
Sub-tasks according to the design doc:
- Add node taints label and feature flag (Task 0: Added node taints labels and feature flags #49547)
- In the node controller, taint Nodes according to the Node Condition (Task 1: Tainted node by condition. #49257)
- In DaemonSet, update DaemonSetController to tolerate the new taints (Task 2: Added toleration to DaemonSet pods for node condition taints #50186)
- In the admission controller, add MemoryPressure/DiskPressure tolerations for non-BestEffort pods (Task 3: Add MemoryPressure toleration for no BestEffort pod. #50180)
- In the scheduler, disable the current behaviour of filtering out Nodes. Instead, a pod will not be scheduled to a tainted node unless its PodSpec contains a matching toleration (Task 4: Ignored node condition predicates if TaintsByCondition enabled. #50185)
- Add e2e test for this feature (Apply algorithm in scheduler by feature gates. #52723)
- Update doc for this feature (Add documentation for TaintNodesByCondition website#5352)
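
The admission and scheduling sub-tasks above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked PRs: the types are local stand-ins, `withDefaultTolerations`, `tolerates` and `fits` are hypothetical helpers, the taint key is an assumed string, and toleration matching is reduced to key and effect (the real API also has operators and values). Only the MemoryPressure toleration is shown.

```go
package main

import "fmt"

// Local stand-ins for the real API types.
type Taint struct{ Key, Effect string }

// An empty Effect is treated as tolerating any effect for the key.
type Toleration struct{ Key, Effect string }

type Pod struct {
	Name        string
	BestEffort  bool // true if the pod's QoS class is BestEffort
	Tolerations []Toleration
}

// Assumed taint key, for illustration only.
const memoryPressureKey = "node.kubernetes.io/memory-pressure"

// withDefaultTolerations mimics the admission step: pods that are not
// BestEffort get a MemoryPressure toleration; BestEffort pods are unchanged.
func withDefaultTolerations(p Pod) Pod {
	if p.BestEffort {
		return p
	}
	tols := append([]Toleration{}, p.Tolerations...) // copy, do not alias input
	p.Tolerations = append(tols, Toleration{Key: memoryPressureKey, Effect: "NoSchedule"})
	return p
}

// tolerates reports whether any toleration matches the taint (simplified).
func tolerates(tols []Toleration, t Taint) bool {
	for _, tol := range tols {
		if tol.Key == t.Key && (tol.Effect == "" || tol.Effect == t.Effect) {
			return true
		}
	}
	return false
}

// fits mimics the scheduler check: every NoSchedule taint must be tolerated.
func fits(p Pod, nodeTaints []Taint) bool {
	for _, t := range nodeTaints {
		if t.Effect == "NoSchedule" && !tolerates(p.Tolerations, t) {
			return false
		}
	}
	return true
}

func main() {
	nodeTaints := []Taint{{Key: memoryPressureKey, Effect: "NoSchedule"}}
	burstable := withDefaultTolerations(Pod{Name: "burstable"})
	bestEffort := withDefaultTolerations(Pod{Name: "best-effort", BestEffort: true})
	fmt.Println(burstable.Name, fits(burstable, nodeTaints))
	fmt.Println(bestEffort.Name, fits(bestEffort, nodeTaints))
}
```

With this split, the scheduler needs only the generic taint/toleration check, and the policy about which pods may land on a pressured node lives in the tolerations that admission control adds.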