Skip to content

Unable to parse metadata response: Metadata request received http status code: 401 #229

@benzch

Description

@benzch

Hello,

I recently had retirement event to one of my worker nodes. I noticed this worker isn't draining, I found this error in the logs

2020/08/07 07:32:28 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 401

This error occur every second
Result from worker :

curl 169.254.169.254/latest/meta-data/events/maintenance/scheduled [ { "NotBefore" : "21 Aug 2020 10:00:00 GMT", "Code" : "instance-stop", "Description" : "The instance is running on degraded hardware", "EventId" : "instance-event-07c59a40db2a78c8a", "State" : "active" } ]

tcpdump from host on one of the failed queries:

13:38:04.793607 IP (tos 0x0, ttl 255, id 6105, offset 0, flags [DF], proto TCP (6), length 60)
10.51.144.143.61069 > 169.254.169.254.http: Flags [S], cksum 0xeeed (incorrect -> 0xbc86), seq 4040972178, win 26883, options [mss 8961,sackOK,TS val 1824485981 ecr 0,nop,wscale 7], length 0
E..<[email protected]#
3.........P..W.......i.......#....
l.r]........
13:38:04.793672 IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 60)
169.254.169.254.http > 10.51.144.143.61069: Flags [S.], cksum 0x5176 (correct), seq 1744321961, ack 4040972179, win 27084, options [mss 9040,sackOK,TS val 868454530 ecr 1824485981,nop,wscale 7], length 0
E..<..@.........
3...P..g.=...W...i.Qv....#P...
3...l.r]....
13:38:04.793695 IP (tos 0x0, ttl 255, id 6106, offset 0, flags [DF], proto TCP (6), length 52)
10.51.144.143.61069 > 169.254.169.254.http: Flags [.], cksum 0xeee5 (incorrect -> 0x06d8), seq 4040972179, ack 1744321962, win 211, options [nop,nop,TS val 1824485981 ecr 868454530], length 0
[email protected]*
3.........P..W.g.=............
l.r]3...
13:38:04.793789 IP (tos 0x0, ttl 255, id 6107, offset 0, flags [DF], proto TCP (6), length 277)
10.51.144.143.61069 > 169.254.169.254.http: Flags [P.], cksum 0xefc6 (incorrect -> 0x24c8), seq 4040972179:4040972404, ack 1744321962, win 211, options [nop,nop,TS val 1824485981 ecr 868454530], length 225: HTTP, length: 225
GET /latest/meta-data/events/maintenance/scheduled HTTP/1.1
Host: 169.254.169.254
User-Agent: Go-http-client/1.1
X-Aws-Ec2-Metadata-Token: AQAEABGv962PbO5XIlEOOoj23Y7SimHkM2eLwbTqumqDgSeNUp56Mw==
Accept-Encoding: gzip

[email protected]
3.........P..W.g.=............
l.r]3...GET /latest/meta-data/events/maintenance/scheduled HTTP/1.1
Host: 169.254.169.254
User-Agent: Go-http-client/1.1
X-Aws-Ec2-Metadata-Token: AQAEABGv962PbO5XIlEOOoj23Y7SimHkM2eLwbTqumqDgSeNUp56Mw==
Accept-Encoding: gzip

13:38:04.793920 IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 52)
169.254.169.254.http > 10.51.144.143.61069: Flags [.], cksum 0x05ee (correct), seq 1744321962, ack 4040972404, win 220, options [nop,nop,TS val 868454530 ecr 1824485981], length 0
E..4..@.........
3...P..g.=...Xt...........
3...l.r]
13:38:04.794049 IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 197)
169.254.169.254.http > 10.51.144.143.61069: Flags [P.], cksum 0x120d (correct), seq 1744321962:1744322107, ack 4040972404, win 220, options [nop,nop,TS val 868454530 ecr 1824485981], length 145: HTTP, length: 145
HTTP/1.1 401 Unauthorized
Content-Length: 0
Date: Tue, 11 Aug 2020 13:38:04 GMT
Server: EC2ws
Connection: close
Content-Type: text/plain

[email protected]....
3...P..g.=...Xt...........
3...l.r]HTTP/1.1 401 Unauthorized
Content-Length: 0
Date: Tue, 11 Aug 2020 13:38:04 GMT
Server: EC2ws
Connection: close
Content-Type: text/plain

13:38:04.794056 IP (tos 0x0, ttl 255, id 6108, offset 0, flags [DF], proto TCP (6), length 52)
10.51.144.143.61069 > 169.254.169.254.http: Flags [.], cksum 0xeee5 (incorrect -> 0x055e), seq 4040972404, ack 1744322107, win 219, options [nop,nop,TS val 1824485981 ecr 868454530], length 0
[email protected](
3.........P..Xtg.>;...........
l.r]3...
13:38:04.794071 IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 52)
169.254.169.254.http > 10.51.144.143.61069: Flags [F.], cksum 0x055c (correct), seq 1744322107, ack 4040972404, win 220, options [nop,nop,TS val 868454530 ecr 1824485981], length 0
E..4..@.........
3...P..g.>;..Xt..........
3...l.r]
13:38:04.794112 IP (tos 0x0, ttl 255, id 6109, offset 0, flags [DF], proto TCP (6), length 52)
10.51.144.143.61069 > 169.254.169.254.http: Flags [F.], cksum 0xeee5 (incorrect -> 0x055c), seq 4040972404, ack 1744322108, win 219, options [nop,nop,TS val 1824485981 ecr 868454530], length 0
[email protected]'
3.........P..Xtg.><...........
l.r]3...
13:38:04.794220 IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 52)
169.254.169.254.http > 10.51.144.143.61069: Flags [.], cksum 0x055b (correct), seq 1744322108, ack 4040972405, win 220, options [nop,nop,TS val 868454530 ecr 1824485981], length 0
E..4..@.........
3...P..g.><..Xu.....[.....
3...l.r]

logs from new pods on the same worker (after deletion)

aws-node-termination-handler arguments:
	dry-run: false,
	node-name: ip-10-51-144-143.ec2.internal,
	metadata-url: http://169.254.169.254,
	kubernetes-service-host: 172.20.0.1,
	kubernetes-service-port: 443,
	delete-local-data: true,
	ignore-daemon-sets: true,
	pod-termination-grace-period: -1,
	node-termination-grace-period: 120,
	enable-scheduled-event-draining: true,
	enable-spot-interruption-draining: true,
	metadata-tries: 3,
	cordon-only: false,
	taint-node: false,
	json-logging: false,
	webhook-proxy: ,
	uptime-from-file: ,
	enable-prometheus-server: false,
	prometheus-server-port: 9092,
2020/08/11 13:46:05 Trying to get token from IMDSv2
2020/08/11 13:46:05 Got token from IMDSv2
2020/08/11 13:46:05 Startup Metadata Retrieved: {InstanceID:i-006b30be63e27ea2d InstanceType:m5.8xlarge PublicHostname: PublicIP: LocalHostname:ip-10-51-144-143.ec2.internal LocalIP:10.51.144.143}
ip-10-51-144-143.ec2.internal
2020/08/11 13:46:05 Started watching for interruption events
2020/08/11 13:46:05 Kubernetes AWS Node Termination Handler has started successfully!
2020/08/11 13:46:05 Started watching for event cancellations
2020/08/11 13:46:05 Started monitoring for Spot ITN events
2020/08/11 13:46:05 Started monitoring for Scheduled Maintenance events
2020/08/11 13:46:07 Unable to parse scheduled event end time, continuing: parsing time "" as "2 Jan 2006 15:04:05 GMT": cannot parse "" as "2"
2020/08/11 13:46:07 Sending interruption events to the interruption channel
2020/08/11 13:46:07 Got interruption event from channel {InstanceID:i-006b30be63e27ea2d InstanceType:m5.8xlarge PublicHostname: PublicIP: LocalHostname:ip-10-51-144-143.ec2.internal LocalIP:10.51.144.143} {EventID:instance-event-07c59a40db2a78c8a Kind:SCHEDULED_EVENT Description:instance-stop will occur between 21 Aug 2020 10:00:00 GMT and  because The instance is running on degraded hardware
 State:active StartTime:2020-08-21 10:00:00 +0000 UTC EndTime:2020-08-21 10:00:00 +0000 UTC Drained:false PreDrainTask:0x10cbac0}
2020/08/11 13:46:11 Unable to parse scheduled event end time, continuing: parsing time "" as "2 Jan 2006 15:04:05 GMT": cannot parse "" as "2"
2020/08/11 13:46:11 Sending interruption events to the interruption channel
2020/08/11 13:46:11 Got interruption event from channel {InstanceID:i-006b30be63e27ea2d InstanceType:m5.8xlarge PublicHostname: PublicIP: LocalHostname:ip-10-51-144-143.ec2.internal LocalIP:10.51.144.143} {EventID:instance-event-07c59a40db2a78c8a Kind:SCHEDULED_EVENT Description:instance-stop will occur between 21 Aug 2020 10:00:00 GMT and  because The instance is running on degraded hardware
 State:active StartTime:2020-08-21 10:00:00 +0000 UTC EndTime:2020-08-21 10:00:00 +0000 UTC Drained:false PreDrainTask:0x10cbac0}
2020/08/11 13:46:11 Unable to parse scheduled event end time, continuing: parsing time "" as "2 Jan 2006 15:04:05 GMT": cannot parse "" as "2"
2020/08/11 13:46:11 Sending interruption events to the interruption channel
2020/08/11 13:46:11 Got interruption event from channel {InstanceID:i-006b30be63e27ea2d InstanceType:m5.8xlarge PublicHostname: PublicIP: LocalHostname:ip-10-51-144-143.ec2.internal LocalIP:10.51.144.143} {EventID:instance-event-07c59a40db2a78c8a Kind:SCHEDULED_EVENT Description:instance-stop will occur between 21 Aug 2020 10:00:00 GMT and  because The instance is running on degraded hardware
 State:active StartTime:2020-08-21 10:00:00 +0000 UTC EndTime:2020-08-21 10:00:00 +0000 UTC Drained:false PreDrainTask:0x10cbac0}

Any idea what's the cause?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: BugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions