Closed
Labels
stale: Issues / PRs with no activity
Description
Describe the bug
Hi,
In the logs, right after NTH starts, we frequently see errors like the ones below:
2022/09/08 08:18:46 ERR Error when trying to list Nodes w/ label, falling back to direct Get lookup of node error="Get \"https://172.20.0.1:443/api/v1/nodes?labelSelector=kubernetes.io%2Fhostname%3D%3Dip-10-45-5-107.eu-central-1.compute.internal\": dial tcp 172.20.0.1:443: i/o timeout"
2022/09/08 08:18:46 WRN All retries failed, unable to complete the uncordon after reboot workflow error="timed out waiting for the condition"
I would like to understand whether this error affects anything.
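The error above is a TCP dial timeout against the address reported as kubernetes-service-host and kubernetes-service-port. As a rough way to see whether that endpoint becomes reachable shortly after a pod starts, a connectivity probe could look like the sketch below. This is only an illustration and not how NTH itself performs the lookup: the function name `wait_for_tcp` and its retry parameters are hypothetical, and it would have to be run from a pod on the affected node.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, attempts: int = 5,
                 timeout: float = 3.0, delay: float = 2.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds.

    Returns False if every attempt fails. All defaults are illustrative.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    return False


# Example call, using the host and port from the NTH arguments in this issue:
# wait_for_tcp("172.20.0.1", 443)
```

If the probe fails on the first attempts and then succeeds, that would suggest the API endpoint is only briefly unreachable at startup; if it never succeeds, the problem is more likely persistent.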
Steps to reproduce
Expected outcome
No errors
Application Logs
The log output when experiencing the issue.
2022/09/08 08:18:14 INF aws-node-termination-handler arguments:
dry-run: false,
node-name: ip-10-45-5-137.eu-central-1.compute.internal,
pod-name: aws-node-termination-handler-866sr,
metadata-url: http://169.254.169.254,
kubernetes-service-host: 172.20.0.1,
kubernetes-service-port: 443,
delete-local-data: true,
ignore-daemon-sets: true,
pod-termination-grace-period: -1,
node-termination-grace-period: 120,
enable-scheduled-event-draining: true,
enable-spot-interruption-draining: true,
enable-sqs-termination-draining: false,
enable-rebalance-monitoring: true,
enable-rebalance-draining: false,
metadata-tries: 3,
cordon-only: false,
taint-node: true,
taint-effect: NoSchedule,
exclude-from-load-balancers: false,
json-logging: false,
log-level: info,
webhook-proxy: ,
webhook-headers: <not-displayed>,
webhook-url: ,
webhook-template: <not-displayed>,
uptime-from-file: /proc/uptime,
enable-prometheus-server: false,
prometheus-server-port: 9092,
emit-kubernetes-events: false,
kubernetes-events-extra-annotations: ,
aws-region: eu-central-1,
queue-url: ,
check-asg-tag-before-draining: true,
managed-asg-tag: aws-node-termination-handler/managed,
assume-asg-tag-propagation: false,
aws-endpoint: ,
2022/09/08 08:18:44 ERR Error when trying to list Nodes w/ label, falling back to direct Get lookup of node error="Get \"https://172.20.0.1:443/api/v1/nodes?labelSelector=kubernetes.io%2Fhostname%3D%3Dip-10-45-5-137.eu-central-1.compute.internal\": dial tcp 172.20.0.1:443: i/o timeout"
2022/09/08 08:18:44 WRN All retries failed, unable to complete the uncordon after reboot workflow error="timed out waiting for the condition"
2022/09/08 08:18:44 INF Started watching for interruption events
2022/09/08 08:18:44 INF Kubernetes AWS Node Termination Handler has started successfully!
2022/09/08 08:18:44 INF Started watching for event cancellations
2022/09/08 08:18:44 INF Started monitoring for events event_type=SCHEDULED_EVENT
2022/09/08 08:18:44 INF Started monitoring for events event_type=SPOT_ITN
2022/09/08 08:18:44 INF Started monitoring for events event_type=REBALANCE_RECOMMENDATION
2022/09/08 08:48:44 INF event store statistics drainable-events=0 size=0
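To make the timing in the log above easier to read, the lines can be parsed for timestamp and level. The sketch below assumes the `YYYY/MM/DD HH:MM:SS LEVEL message` layout seen in this output; the helper names `parse_line` and `seconds_until_first_problem` are hypothetical, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches lines such as "2022/09/08 08:18:44 WRN All retries failed, ..."
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (INF|WRN|ERR) (.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None for non-matching lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
    return timestamp, match.group(2), match.group(3)


def seconds_until_first_problem(lines):
    """Seconds between the first parsed line and the first WRN/ERR line."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # e.g. the indented argument lines
        timestamp, level, _ = parsed
        if start is None:
            start = timestamp
        if level in ("WRN", "ERR"):
            return (timestamp - start).total_seconds()
    return None


sample = [
    "2022/09/08 08:18:14 INF aws-node-termination-handler arguments:",
    "dry-run: false,",
    '2022/09/08 08:18:44 WRN All retries failed, unable to complete the '
    'uncordon after reboot workflow error="timed out waiting for the condition"',
    "2022/09/08 08:18:44 INF Kubernetes AWS Node Termination Handler has started successfully!",
]

print(seconds_until_first_problem(sample))  # 30.0
```

For this log, the first warning appears 30 seconds after the arguments line, and the handler reports a successful start in the same second.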
Environment
- NTH App Version: 1.16.0
- NTH Mode (IMDS/Queue processor): IMDS
- OS/Arch: Linux
- Kubernetes version: 1.21
- Installation method: helm