Taint nodes when termination notification is detected #160

Similar to how [`cluster-autoscaler` uses taints when marking nodes for scale down](https://github.com/kubernetes/autoscaler/blob/912d923484b826b6986046405d243f9083ceb764/cluster-autoscaler/utils/deletetaint/delete.go#L35-L36), `aws-node-termination-handler` should taint nodes that are about to be terminated, _in addition_ to everything it currently does. This is somewhat similar to #123, but the proposal here is only to add to the current behavior.

The goal is to make it possible to detect programmatically whether a spot node became unschedulable in k8s because of termination or because of cluster issues. For example, when using `prometheus-operator`, the default `KubeNodeUnreachable` alert expression looks like this:

```
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}
```

which often misfires for nodes being scaled down by the cluster-autoscaler. To fix the problem, we can exclude nodes that carry the cluster-autoscaler taint:

```
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"} unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ToBeDeletedByClusterAutoscaler"}
```

If `aws-node-termination-handler` tainted nodes as well, that taint could be incorporated into the alert in the same way.
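As a rough sketch of what the combined alert expression might look like: the taint key `aws-node-termination-handler/termination-pending` below is hypothetical (no key has been chosen yet), and the `NoSchedule` effect is likewise an assumption.

```
kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}
  unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ToBeDeletedByClusterAutoscaler"}
  unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="aws-node-termination-handler/termination-pending"}
```

Chaining a second `unless on(node)` keeps each exclusion separate, so either taint can be added or removed from the alert without touching the other.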

I'm happy to PR this if there are no objections.
