- 
                Notifications
    You must be signed in to change notification settings 
- Fork 41.6k
Enables HTTP/2 health check #95981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enables HTTP/2 health check #95981
Conversation
| I disagree with setting  update: I think the  | 
| @cheftako, this health check is very similar to the GRPC health check we want to add to the apiserver-network-proxy to solve kubernetes-sigs/apiserver-network-proxy#152. Their performance implication is also similar. | 
| @cheftako 
 It would be more convincing if we can test the behavior in a large cluster. @kubernetes/sig-scalability is there an easy way to test this PR in a large cluster? @JensErat do you mind sharing the size of the clusters where you turned on the http/2 health check feature? Thanks. | 
| /test | 
| @BenTheElder: The  
 Use  
 In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. | 
| I think https://github.com/kubernetes/test-infra/blob/b40bcac24d0ec96cc7d440ca5ece690cce7a62fe/config/jobs/kubernetes/sig-scalability/sig-scalability-presubmit-jobs.yaml#L67 is what you want, there are some optional scale presubmit. On second thought though, even if that were the right one, we should check with someone from the scale team before using these resources 🙃 | 
| Thanks, @BenTheElder ! The pull-kubernetes-e2e-gce-large-performance uses 2000 nodes, that seems to be the right test. I'll check with the scalability sig. | 
| Seems relatively harmless from scalability point of view. I'm fine with running 2k presubmit to verify, however our approval is not needed: @tosi3k - for visibility as next oncaller | 
| 
 I think we should backport to 1.18, given: 
 I opened #100376 to pick this to 1.18 | 
| Given the consequences when this fails, I'm in favor of backporting this change if we can get an ack (ideally from @caesarxuchao) that the backport is fairly safe. My knowledge of the golang stack at this depth is fairly limited. Is there a particular minimum golang level that is required or was it just the  | 
| 
 @caesarxuchao acked the backport PR (#100376) 
 It was contained to  | 
Cherrypick #95981 to 1.18, Enables HTTP/2 health check
Following up from cilium#19259, this commit implements the same logic as kubelet [1] to close all idle connections. We can rely on HTTP/2 to determine which connections are idle (HTTP/2 pings), instead of closing all connections indiscriminately (when the heartbeat fails) via the custom dialer in setDialer(). We can assume HTTP/2 support as it has been supported since K8s 1.20 [2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment variable, however this only really meant from very advanced users, hence no documentation. Kubernetes has been running with HTTP/2 as the default for a long time (~since Dec 2020) without major issues. CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit cannot be backported to any earlier version of Cilium which doesn't support K8s 1.23. [1]: kubernetes/kubernetes#104844 [2]: kubernetes/kubernetes#95981 [3]: kubernetes/kubernetes@b9d865a Suggested-by: André Martins <[email protected]> Signed-off-by: Chris Tarazi <[email protected]>
Following up from #19259, this commit implements the same logic as kubelet [1] to close all idle connections. We can rely on HTTP/2 to determine which connections are idle (HTTP/2 pings), instead of closing all connections indiscriminately (when the heartbeat fails) via the custom dialer in setDialer(). We can assume HTTP/2 support as it has been supported since K8s 1.20 [2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment variable, however this only really meant from very advanced users, hence no documentation. Kubernetes has been running with HTTP/2 as the default for a long time (~since Dec 2020) without major issues. CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit cannot be backported to any earlier version of Cilium which doesn't support K8s 1.23. [1]: kubernetes/kubernetes#104844 [2]: kubernetes/kubernetes#95981 [3]: kubernetes/kubernetes@b9d865a Suggested-by: André Martins <[email protected]> Signed-off-by: Chris Tarazi <[email protected]>
[ upstream commit 365c788 ] Following up from cilium#19259, this commit implements the same logic as kubelet [1] to close all idle connections. We can rely on HTTP/2 to determine which connections are idle (HTTP/2 pings), instead of closing all connections indiscriminately (when the heartbeat fails) via the custom dialer in setDialer(). We can assume HTTP/2 support as it has been supported since K8s 1.20 [2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment variable, however this only really meant from very advanced users, hence no documentation. Kubernetes has been running with HTTP/2 as the default for a long time (~since Dec 2020) without major issues. CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit cannot be backported to any earlier version of Cilium which doesn't support K8s 1.23. [1]: kubernetes/kubernetes#104844 [2]: kubernetes/kubernetes#95981 [3]: kubernetes/kubernetes@b9d865a Suggested-by: André Martins <[email protected]> Signed-off-by: Chris Tarazi <[email protected]> Signed-off-by: Paul Chaignon <[email protected]>
[ upstream commit 365c788 ] Following up from #19259, this commit implements the same logic as kubelet [1] to close all idle connections. We can rely on HTTP/2 to determine which connections are idle (HTTP/2 pings), instead of closing all connections indiscriminately (when the heartbeat fails) via the custom dialer in setDialer(). We can assume HTTP/2 support as it has been supported since K8s 1.20 [2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment variable, however this only really meant from very advanced users, hence no documentation. Kubernetes has been running with HTTP/2 as the default for a long time (~since Dec 2020) without major issues. CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit cannot be backported to any earlier version of Cilium which doesn't support K8s 1.23. [1]: kubernetes/kubernetes#104844 [2]: kubernetes/kubernetes#95981 [3]: kubernetes/kubernetes@b9d865a Suggested-by: André Martins <[email protected]> Signed-off-by: Chris Tarazi <[email protected]> Signed-off-by: Paul Chaignon <[email protected]>
The env vars were implemented in kubernetes#95981. Effectively no impacts if using the default values here. BUG=255296578 Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a
The env vars were implemented in kubernetes#95981. Effectively no impacts if using the default values here. BUG=255296578 Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a Remove GODEBUG=x509sha1=1 workaround Bug: 227456358a Revert changes made in I5165e6c2fe73e8e1b2a617ced591133228b6d275 Change-Id: I7dd1ca8f9796c542c89074703e44e6b6f4b6edab
The env vars were implemented in kubernetes#95981. Effectively no impacts if using the default values here. BUG=255296578 Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a Remove GODEBUG=x509sha1=1 workaround Bug: 227456358a Revert changes made in I5165e6c2fe73e8e1b2a617ced591133228b6d275 Change-Id: I7dd1ca8f9796c542c89074703e44e6b6f4b6edab
kube-apiserver: Change kube-apiserver liveness probe config setup Prior OSS attempt: kubernetes#94076 Bug: b/164955956 Change-Id: Ia2f75ac5e97f4eb6aa4dc9b00fc2bb3112db9360 kube-apiserver: Set profiling false for kube-apiserver Change-Id: I052a53d72acfc3c5aa8d03f695beb93cadc0be87 kube-apiserver: allow adding memory limit to kube-apiserver via the KUBE_APISERVER_MEMORY_LIMIT environment variable bug: 181991030 Change-Id: I1a01766e6f2415eba31d6d865f7f39d7855f969e kube-apiserver: Drop removed insecure --address and --insecure-port flags GKE config script version of kubernetes#106859 Bug: 209962139 Change-Id: I551cfc12fd94a927383fcff89013a3a78fb61592 kube-apiserver: Readd KUBE_APISERVER_EVENT_TTL_SEC env for kube-apiserver --event-ttl Bug: b/176806841 Change-Id: I5c651a555287af0e891e9592dfb7c172b6bb24a2 kube-apiserver: Re-enable the deprecated behavior of CommonName on X.509 as a diagnoser to detect the fleet level impact on the deprecation Change-Id: If4962b6ecb3294878bfa37ab0ebcd4441fae7140 kube-apiserver: Update permissions of /etc/srv/kubernetes/abac-authz-policy.jsonl if kube-apiserver is running as non-root. BUG=199669294 Change-Id: Ic9ae9add0a35b7df99781e470c917b2427bee289 kube-apiserver: Remove reference to deleted target-ram-mb flag Change-Id: I57d0c80eb2285ec9d93b075b1845f1805e0fd748 kube-apiserver: Re-enable SHA1 signatures with GODEBUG=x509sha1=1 Bug: b/227456358 Bug: b/226424430 Change-Id: I5165e6c2fe73e8e1b2a617ced591133228b6d275 kube-apiserver: use --api-audiences as --service-account-api-audiences is deprecated Copy of kubernetes#103078 Change-Id: I88c2f2eb8bde4378b115e01cbbe9700c27f03955 Expose UDS with profiling data for kube-apiserver. Added in kubernetes#114191. See go/no-gke-tcp-pprof for policy. Refs b/273485199 Change-Id: Ic772c9249468fdf7f516d62bd2a0bfbfe933c1bb Allow specifying terminationGracePeriodSeconds for kube-apiserver. Refs b/252987333 Change-Id: I20699501f429630fe74531cb086091eb9ed3611c Run kube-apiserver with cloud-provider=external Bug: 299159412 Change-Id: I8db8e72377f1c63a3874ee9f0567be264c40ba58 Allow specifying timeout second for HTTP2/2 health check The env vars were implemented in kubernetes#95981. Effectively no impacts if using the default values here. BUG=255296578 Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a Remove GODEBUG=x509sha1=1 workaround Bug: 227456358a Revert changes made in I5165e6c2fe73e8e1b2a617ced591133228b6d275 Change-Id: I7dd1ca8f9796c542c89074703e44e6b6f4b6edab Use structured authorization configuration Bug: b/333034514 Bug: b/172290849 Bug: b/175273696 Bug: b/332956373 Change-Id: If16e5dd6f9a4da93fb8623e27a9b9fda959eea83 gke/cluster: make kube-apiserver explicitly set enable-logs-handler Preserves existing implicit behavior with 1.31 default changing in kubernetes#125787 Change-Id: I6380b46cef32c7aba4ea7c0ea7e014e18c8f09d0
This PR enables the HTTP/2 connection health check for all the clients that call
SetTransportDefaults. This includes all clients created via k8s.io/client-go. Without the health check, a broken transport layer connection can stay in the HTTP/2 connection pool for a long time, causing requests to the same host to fail. For example, a broken TCP connection can linger for 15 minutes before being closed.The HTTP/2 feature exposes two parameters,
ReadIdleTimeout: if the HTTP/2 client has not received any frame from the connection for this amount of time, it starts sending periodic pings to the server. The period of the pings is alsoReadIdleTimeout.PingTimeout: if an ACK to the ping is not received by the client after this amount of time, the connection is considered broken and will be closed by the client.For Kubernetes, I default the
ReadIdleTimeoutto 30s, which I think is not going to cause performance issues even in large clusters, becauseI'll wait for results from pull-kubernetes-e2e-gce-large-performance to confirm.
I default the
PingTimeoutto 15s, which is relaxed enough in most network conditions.Although I think most users don't need to tune these two parameters, I add environment variables to allow user to tune them, or even disable the health check, by setting
HTTP2_READ_IDLE_TIMEOUT_SECONDS=0.Fixes https://github.com/kubernetes/client-go/issues/374 and
#87615.
Ref. #95898
#94844
/kind bug
/sig api-machinery