Taint nodes when termination notification is detected #160

@diversario

Similar to how cluster-autoscaler uses taints when marking nodes for scale-down, aws-node-termination-handler should taint nodes that are about to be terminated, in addition to everything it does currently. This issue is somewhat similar to #123, but the proposal here is to add to the current behavior.

The reason for this is to make it possible to detect programmatically whether a spot node became unschedulable in k8s because of termination or because of cluster issues. For example, when using prometheus-operator, the default KubeNodeUnreachable alert looks like this:

kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}

which often misfires for nodes being scaled down by the cluster-autoscaler. To fix the problem, we can exclude nodes that carry the cluster-autoscaler taint:

kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"} unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ToBeDeletedByClusterAutoscaler"}

If aws-node-termination-handler tainted nodes, that taint could be incorporated into the alert as well.
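
As a rough sketch of what that could look like, the expression below chains a second `unless` clause onto the query above. The taint key `ExampleTerminationHandlerTaint` and its `NoSchedule` effect are placeholders I made up for illustration; the actual key and effect would be whatever the handler ends up applying.

```promql
# Nodes tainted as unreachable ...
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}
  # ... except nodes the cluster-autoscaler is already removing ...
  unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ToBeDeletedByClusterAutoscaler"}
  # ... and except nodes tainted by aws-node-termination-handler.
  # ASSUMPTION: "ExampleTerminationHandlerTaint" is a hypothetical key, not an existing one.
  unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ExampleTerminationHandlerTaint"}
```

Because `unless` is left-associative, each clause removes its matching nodes from the result of the previous one, so further taints can be excluded by appending more clauses.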

I'm happy to PR this if there are no objections.
