
@Sithembiso-Mashinini Sithembiso-Mashinini commented Oct 22, 2025

Issue #, if available:

Description of changes:

Fixes node tainting failures when UseProviderId is enabled and Kubernetes node names differ from AWS PrivateDnsNames (e.g., clusters using instance IDs as node names).

PreDrainTask functions in the SQS event handlers used interruptionEvent.NodeName (the AWS PrivateDnsName) directly, so tainting failed whenever the Kubernetes node name didn't match it. Drain/cordon operations still worked because they resolved the correct node name via a provider ID lookup.

Example Failure

  {"level":"info","node-name":"ip-10-2-143-153.ec2.internal","instance-id":"i-0457d1e6522ad899c","provider-id":"aws:///us-east-1d/i-0457d1e6522ad899c","message":"Requesting instance drain"}
  {"level":"debug","target_provider_id":"aws:///us-east-1d/i-0457d1e6522ad899c","message":"Looking up node by ProviderID"}
  {"level":"debug","node_name":"i-0457d1e6522ad899c","node_provider_id":"aws:///us-east-1d/i-0457d1e6522ad899c","match":true,"message":"Checking node"}
  {"level":"debug","found_node":"i-0457d1e6522ad899c","message":"Returning node name"}
  {"level":"warn","node_name":"ip-10-2-143-153.ec2.internal","label_selector":"kubernetes.io/hostname in (ip-10-2-143-153,ip-10-2-143-153.ec2.internal)","matching_nodes":0,"message":"No nodes found with label selector"}
  {"level":"error","error":"nodes ip-10-2-143-153.ec2.internal not found","node_name":"ip-10-2-143-153.ec2.internal","message":"Failed to get node directly"}
  {"level":"error","error":"Unable to fetch kubernetes node from API: nodes ip-10-2-143-153.ec2.internal not found","message":"unable to taint node"}
  {"level":"info","node_name":"i-0457d1e6522ad899c","message":"Draining the node"}  // Works fine
  {"level":"info","node_name":"i-0457d1e6522ad899c","message":"Node successfully cordoned and drained"}  // Works fine
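
The warn line in the log above can be reproduced with a small sketch. It assumes the selector is built from the short and full forms of the PrivateDnsName, which is what the logged `label_selector` suggests, and it assumes the node's `kubernetes.io/hostname` label equals its node name (the logs do not show the label value). `hostnameCandidates`, `hostnameSelector`, and `matchesHostnameLabel` are hypothetical helpers, not functions from this repository.

```go
package main

import (
	"fmt"
	"strings"
)

// hostnameCandidates returns the short and full forms of a private DNS name,
// e.g. "ip-10-2-143-153" and "ip-10-2-143-153.ec2.internal".
func hostnameCandidates(privateDNSName string) []string {
	short, _, found := strings.Cut(privateDNSName, ".")
	if !found {
		return []string{privateDNSName}
	}
	return []string{short, privateDNSName}
}

// hostnameSelector renders the candidates as a set-based label selector.
func hostnameSelector(candidates []string) string {
	return "kubernetes.io/hostname in (" + strings.Join(candidates, ",") + ")"
}

// matchesHostnameLabel reports whether a node's hostname label value is one
// of the candidates, which is what an "in" selector checks.
func matchesHostnameLabel(labelValue string, candidates []string) bool {
	for _, c := range candidates {
		if c == labelValue {
			return true
		}
	}
	return false
}

func main() {
	candidates := hostnameCandidates("ip-10-2-143-153.ec2.internal")
	fmt.Println(hostnameSelector(candidates))

	// Assumption: the node's hostname label is its instance-ID node name.
	fmt.Println("matches:", matchesHostnameLabel("i-0457d1e6522ad899c", candidates))
}
```

Under those assumptions neither candidate can equal an instance-ID hostname, which is consistent with `"matching_nodes":0` in the log.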

How you tested your changes:
Environment: Linux
Kubernetes Version: v1.31.5

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Sithembiso-Mashinini Sithembiso-Mashinini requested a review from a team as a code owner October 22, 2025 12:07
@Sithembiso-Mashinini
Contributor Author

@tiationg-kho nudge on this 🙏

@Sithembiso-Mashinini
Contributor Author

@LikithaVemulapalli this PR correctly addresses #857. With --use-provider-id=true, we should resolve the node via .spec.providerID and use that Kubernetes node name, rather than slicing PrivateDnsHostname from the event. This matches the docs and covers clusters where the node name is the instance ID (our setup too).

@tiationg-kho tiationg-kho self-requested a review November 10, 2025 18:54
Contributor

@tiationg-kho tiationg-kho left a comment

lgtm

@tiationg-kho tiationg-kho merged commit 8c84d84 into aws:main Nov 17, 2025
10 checks passed