-
Notifications
You must be signed in to change notification settings - Fork 277
Taint nodes on spot and scheduled events #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taint nodes on spot and scheduled events #162
Conversation
Fixes aws#160. Signed-off-by: Ilya Shaisultanov <[email protected]>
|
@bwagner5 I see the build failed but not all of the tests – unsure if related to my changes here or not. Do you know how I could debug those? |
|
@diversario I believe it's because the cluster role does not have taint permissions. https://github.com/aws/aws-node-termination-handler/blob/master/config/helm/aws-node-termination-handler/templates/clusterrole.yaml You can run the e2e tests locally:
or all tests:
Also, looks like some small formatting issues: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add permissions to cluster role and run goimports -w ./ && gofmt -s -w ./
If all the tests except helm-sync passes, I can take the action to fix that. It's just a tedious task of syncing the helm chart to the aws/eks-charts repo.
|
Is it possible for me to see the build logs? I don't think it's a permissions issue because NTH can already add labels to nodes. I checked which looks pretty much like the NTH's one: I'm currently trying to get some view into what's failing from the |
For scheduled event cancellation test, add tolerations to the proxy to allow it to schedule to emit the cancellation event.
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
===========================================
- Coverage 94.64% 84.22% -10.43%
===========================================
Files 8 8
Lines 635 748 +113
===========================================
+ Hits 601 630 +29
- Misses 22 99 +77
- Partials 12 19 +7
Continue to review full report at Codecov.
|
| return strings.Replace(value, "/", "~1", -1) | ||
| } | ||
|
|
||
| func addTaint(node *corev1.Node, nth Node, taintKey string, taintValue string, effect corev1.TaintEffect) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, addTaintToSpec and cleanTaint are lifted from cluster-autoscaler with minor modifications.
| --set ec2MetadataTestProxy.enableSpotITN="false" \ | ||
| --set ec2MetadataTestProxy.scheduledEventStatus="canceled" \ | ||
| --set ec2MetadataTestProxy.port="$IMDS_PORT" | ||
| --set ec2MetadataTestProxy.port="$IMDS_PORT" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the worker node is now tainted, the proxy won't schedule there unless it tolerates the taint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work on this! I just had a few comments in-line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @diversario ! 🚀
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #162 +/- ##
===========================================
- Coverage 94.64% 81.37% -13.28%
===========================================
Files 8 8
Lines 635 757 +122
===========================================
+ Hits 601 616 +15
- Misses 22 126 +104
- Partials 12 15 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Issue #, if available: #160
Description of changes:
Upon receiving an event, taint the affect node with the appropriate taint. Remove the taint if event is cancelled or completed.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.