Description
Similar to how cluster-autoscaler uses taints when marking nodes for scale down, aws-node-termination-handler should taint nodes that are about to be terminated, in addition to everything it does currently. This issue is somewhat similar to #123, but the proposal is to add to the current behavior.
The reason for this is to make it possible to detect programmatically whether a spot node became unschedulable in k8s because of termination or because of cluster issues. For example, when using prometheus-operator, the default KubeNodeUnreachable alert looks like this:
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}
which often misfires for nodes being scaled down by cluster-autoscaler. To fix the problem, we can exclude nodes that carry the cluster-autoscaler taint:
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"} unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ToBeDeletedByClusterAutoscaler"}
If aws-node-termination-handler tainted nodes, that taint could be incorporated into the alert as well.
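As a rough sketch of what that could look like: assuming the handler applied a NoSchedule taint with a key such as example.com/termination-pending (a made-up placeholder, not an existing taint), the alert expression could exclude both kinds of planned removal with a single regex matcher on the key:

```promql
# Sketch only: "example.com/termination-pending" is a hypothetical taint key,
# and the NoSchedule effect on it is an assumption as well.
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}
  unless on(node)
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key=~"ToBeDeletedByClusterAutoscaler|example.com/termination-pending"}
```

Chaining a second `unless on(node)` clause would work too; the regex form just keeps the expression shorter if more taints are added later.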
I'm happy to PR this if there are no objections.