Description
What happened:
To route traffic into my Kubernetes cluster, I use an F5 load balancer, nginx ingress controllers (scaled by a horizontal pod autoscaler) and k8s ingress objects. With this setup, I regularly receive 502 HTTP status codes. Although the error rate is low (0.003%), it must not be neglected, as millions of requests are handled daily.
After spending hours narrowing down the issue, I stumbled upon the following error: 2024/04/10 09:13:00 [error] 3980#3980: *29815576 upstream prematurely closed connection while reading response header from upstream
With that information at hand, I skimmed countless webpages to identify the root cause. I checked the most obvious culprits, such as
- proxy_read/send/connect_timeout
- proxy_set_header Connection "";
- proxy_max_temp_file_size
- X-Real-IP, X-Forwarded-For
- proxy_buffer_size
but made no progress. As a next step, I set error-log-level: debug to capture the corresponding logs (relevant parts below; the ConfigMap setting used is shown after the excerpt):
2024-03-26T16:21:04.076718973+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 recv: fd:46 0 of 1048576
2024-03-26T16:21:04.076726784+01:00 stderr F 2024/03/26 15:21:04 [error] 136#136: *943568 upstream prematurely closed connection while reading response header from upstream (here, I left out some irrelevant parts)
2024-03-26T16:21:04.076750704+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 http next upstream, 2
2024-03-26T16:21:04.076756887+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 free keepalive peer
2024-03-26T16:21:04.076762948+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 lua balancer free peer, tries: 2
2024-03-26T16:21:04.076768572+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 finalize http upstream request: 502
2024-03-26T16:21:04.076774231+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 finalize http proxy request
2024-03-26T16:21:04.07677987+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 close http upstream connection: 46
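For completeness, debug logging was switched on through the controller ConfigMap. Expressed as helm values (matching the values.yaml further below), the temporary setting looks roughly like this:
controller:
  config:
    # Raise nginx's error log verbosity to capture the connection handling
    # shown above; revert after debugging, as debug logging is very noisy
    # under real traffic.
    error-log-level: debug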
Unfortunately, this did not help either, but it indicated that there may be an issue with reused network connections (HTTP/1.1 keepalive). Therefore, I added the annotation nginx.ingress.kubernetes.io/proxy-http-version: "1.0" to the relevant k8s ingress object, and behold: no more 502 HTTP status codes. I was able to reproduce this behaviour not only in my test environment but also on more production-like stages.
In my view, there seems to be an issue with reusing established HTTP/1.1 connections - probably related to my nginx.conf.
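For reference, the workaround boils down to a single annotation on the affected Ingress. A minimal sketch (metadata reduced to name/namespace; the spec is unchanged and therefore omitted, see the full object further below):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bs-simplesample-az-srv-tst-business-services
  namespace: kern-sample-bsfw-tst
  annotations:
    # Talk HTTP/1.0 to the upstream, which in practice prevents nginx from
    # keeping and reusing idle upstream connections.
    nginx.ingress.kubernetes.io/proxy-http-version: "1.0"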
NGINX Ingress version: nginx version: nginx/1.21.6
Kubernetes version:
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5+rke2r1
Environment:
- Cloud provider or hardware configuration: Rancher, rke2
- OS: NAME="SLES", VERSION="15-SP5", VERSION_ID="15.5"
- Kernel: Linux 5.14.21-150500.55.52-default #1 SMP PREEMPT_DYNAMIC Tue Mar 5 16:53:41 UTC 2024 (a62851f) x86_64
- Install tools:
  - Terraform
- Basic cluster related info:
  - 20 k8s worker nodes, 8vCores/64GB RAM or 32vCores/256GB RAM
  - 3 k8s controlplane nodes
- How was the ingress-nginx-controller installed:
  - The nginx ingress controllers are deployed as a Deployment that is scaled by a horizontal pod autoscaler.
  - They were installed via helm/helmfile with the following values.yaml (not exhaustive; a matching helmfile sketch follows the values):
controller:
  config:
    client-header-buffer-size: "1024k"
    large-client-header-buffers: "4 1024k"
    proxy-buffer-size: "1024k"
    proxy-send-timeout: "90"
    proxy-read-timeout: "200"
    use-proxy-protocol: "true"
    force-ssl-redirect: "true"
    log-format-escaping: json
    log-format-upstream: '{"time": "$time_iso8601", "remote_addr": "$remote_addr", "remote_user": "$remote_user", "request": "$request",
      "response_status": $status, "body_bytes_sent": $body_bytes_sent, "http_referer": "$http_referer", "http_user_agent": "$http_user_agent",
      "request_length": $request_length, "request_time": $request_time, "proxy_upstream_name": "[$proxy_upstream_name]",
      "proxy_alternative_upstream_name": "[$proxy_alternative_upstream_name]", "upstream_addr": "$upstream_addr", "upstream_response_length": $upstream_response_length,
      "upstream_response_time": $upstream_response_time, "upstream_status": $upstream_status, "req_id": "$req_id"}'
  extraArgs:
    enable-ssl-passthrough: ""
  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 4
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 90
  resources:
    limits:
      cpu: 800m
      memory: 800Mi
    requests:
      cpu: 100m
      memory: 300Mi
  service:
    type: NodePort
    externalTrafficPolicy: Local
    nodePorts:
      http: 32080
      https: 32443
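A helmfile entry matching this setup might look like the following sketch; the release name, namespace and chart version are inferred from the labels in the output below, and the values file path is purely illustrative:
repositories:
  # Upstream ingress-nginx chart repository (assumed; any mirror works).
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
releases:
  - name: ingress                      # matches app.kubernetes.io/instance=ingress
    namespace: infra-ingress-tst
    chart: ingress-nginx/ingress-nginx
    version: 4.9.1                     # matches helm.sh/chart=ingress-nginx-4.9.1
    values:
      - values.yaml                    # the values shown above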
- Current State of the controller:
- ingressclasses.networking.k8s.io:
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.6
helm.sh/chart=ingress-nginx-4.9.1
Annotations: argocd.argoproj.io/tracking-id: infra-ingress.infra-ingress:networking.k8s.io/IngressClass:infra-ingress-tst/nginx
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> get all -A -o wide:
NAME READY STATUS RESTARTS AGE
pod/ingress-nginx-tst-controller-5bdbb97d9-dv8kb 1/1 Running 1 (16d ago) 21d
pod/ingress-nginx-tst-defaultbackend-76d48c597b-265xt 1/1 Running 0 21d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ingress-nginx-tst-controller NodePort x.x.x.x <none> 80:32080/TCP,443:32443/TCP 176d
service/ingress-nginx-tst-controller-metrics ClusterIP x.x.x.x <none> 10254/TCP 176d
service/ingress-nginx-tst-defaultbackend ClusterIP x.x.x.x <none> 80/TCP 176d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ingress-nginx-tst-controller 1/1 1 1 176d
deployment.apps/ingress-nginx-tst-defaultbackend 1/1 1 1 176d
NAME DESIRED CURRENT READY AGE
replicaset.apps/ingress-nginx-tst-controller-5bdbb97d9 1 1 1 58d
replicaset.apps/ingress-nginx-tst-defaultbackend-76d48c597b 1 1 1 58d
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/ingress-nginx-tst-controller Deployment/ingress-nginx-tst-controller 62%/90%, 10%/50% 1 1 1 176d
--> Please note that the replica/HPA values were set to 1 to make the issue easier to isolate; it also occurs with multiple replicas.
kubectl -n <ingresscontrollernamespace> describe pod <ingresscontrollerpodname>:
Name: ingress-nginx-tst-controller-5bdbb97d9-dv8kb
Namespace: infra-ingress-tst
Priority: 0
Service Account: default
Node: x.x.x.x
Start Time: Wed, 03 Apr 2024 14:23:15 +0200
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.6
helm.sh/chart=ingress-nginx-4.9.1
pod-template-hash=5bdbb97d9
Annotations: container.apparmor.security.beta.kubernetes.io/controller: localhost/k8s-ingress-controller
kubernetes.io/limit-ranger:
LimitRanger plugin set: cpu, memory request for init container opentelemetry; cpu, memory limit for init container opentelemetry
Status: Running
IP: x.x.x.x
IPs:
IP: x.x.x.x
Controlled By: ReplicaSet/ingress-nginx-tst-controller-5bdbb97d9
Init Containers:
opentelemetry:
Container ID: containerd://ab2e25842abdd00da17f3404fcea0842961be8301430fe70a7f7e9ff4ee9e2e4
Image: x/ingress-nginx/opentelemetry:v20230721-3e2062ee5@sha256:13bee3f5223883d3ca62fee7309ad02d22ec00ff0d7033e3e9aca7a9f60fd472
Image ID: x/ingress-nginx/opentelemetry@sha256:13bee3f5223883d3ca62fee7309ad02d22ec00ff0d7033e3e9aca7a9f60fd472
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
Command:
/init_module
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 03 Apr 2024 14:23:15 +0200
Finished: Wed, 03 Apr 2024 14:23:15 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 250m
memory: 100Mi
Requests:
cpu: 100m
memory: 50Mi
Environment: <none>
Mounts:
/modules_mount from modules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-54wkm (ro)
Containers:
controller:
Container ID: containerd://1461ce776d455eab59334df36faf03b0c19080e6efe727a4c940271429706bba
Image: x/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
Image ID: x/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
Ports: 80/TCP, 443/TCP, 10254/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/ingress-nginx-tst-defaultbackend
--publish-service=$(POD_NAMESPACE)/ingress-nginx-tst-controller
--election-id=ingress-nginx-tst-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-tst-controller
--watch-ingress-without-class=true
--default-ssl-certificate=infra-ingress-tst/tls-certificates-tst-wildcard
--enable-ssl-passthrough
State: Running
Started: Tue, 09 Apr 2024 07:18:25 +0200
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 03 Apr 2024 14:23:16 +0200
Finished: Tue, 09 Apr 2024 07:18:25 +0200
Ready: True
Restart Count: 1
Limits:
cpu: 800m
memory: 800Mi
Requests:
cpu: 100m
memory: 300Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-tst-controller-5bdbb97d9-dv8kb (v1:metadata.name)
POD_NAMESPACE: infra-ingress-tst (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/modules_mount from modules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-54wkm (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
modules:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-54wkm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-10-31T13:17:04Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.6
    helm.sh/chart: ingress-nginx-4.9.1
  name: ingress-nginx-tst-controller
  namespace: infra-ingress-tst
  resourceVersion: "175091575"
  uid: 14205f77-5c71-4e4b-ad3a-f258b01888b6
spec:
  clusterIP: x.x.x.x
  clusterIPs:
  - x.x.x.x
  externalTrafficPolicy: Local
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    nodePort: 32080
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 32443
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress
    app.kubernetes.io/name: ingress-nginx
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
- Current state of ingress object, if applicable:
kubectl -n <appnamespace> describe ing <ingressname>:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  creationTimestamp: "2024-04-03T09:42:21Z"
  generation: 1
  labels:
    app: bs-simplesample-az-srv
    app.kubernetes.io/instance: bs-simplesample-az-srv
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: bs-simplesample-az-srv
    chart: generic-openliberty-2.3.4
    helm.sh/chart: generic-openliberty-2.3.4
    heritage: Helm
    release: bs-simplesample-az-srv
  name: bs-simplesample-az-srv-tst-business-services
  namespace: kern-sample-bsfw-tst
  resourceVersion: "181093092"
  uid: 0c38ad0b-d4a4-4ecc-9db5-183b63143b4e
spec:
  rules:
  - host: x
    http:
      paths:
      - backend:
          service:
            name: bs-simplesample-az-srv-tst
            port:
              name: http
        path: /sample-bsfw-simplesampleaz-ws/
        pathType: Prefix
  tls:
  - hosts:
    - x
    secretName: tls-certificates-tst-x
status:
  loadBalancer:
    ingress:
    - ip: x.x.x.x
- Others:
- nginx.conf (relevant excerpt):
  reset_timedout_connection on;
  keepalive_timeout 75s;
  keepalive_requests 1000;
Could there be an issue with timeouts on established (keepalive) connections?
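One pattern that matches these symptoms is the backend closing an idle keepalive connection at the very moment nginx reuses it. If that turns out to be the cause here, an alternative to downgrading to HTTP/1.0 would be to tune the controller's upstream keepalive behaviour via the ConfigMap. A sketch with illustrative values, assuming the backend's own idle timeout is known and larger than the timeout chosen here:
controller:
  config:
    # Close idle upstream connections well before the backend would,
    # so nginx never reuses a connection that is already being torn down.
    upstream-keepalive-timeout: "30"
    # Limit how many requests are sent over a single upstream connection.
    upstream-keepalive-requests: "100"
    # Number of idle upstream connections cached per worker (controller default).
    upstream-keepalive-connections: "320"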
How to reproduce this issue: Simply access the backend through the F5 and the nginx ingress controller with sufficient traffic; to generate that traffic, I built a load tester (see the sketch below).
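As an illustration, such a load test could be run as a Kubernetes Job. This is only a sketch: the name, image, parallelism and request count are arbitrary, and the <host> placeholder has to be replaced with the actual ingress host:
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-502-loadtest          # hypothetical name
  namespace: kern-sample-bsfw-tst
spec:
  parallelism: 5
  completions: 5
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: loadtest
          image: curlimages/curl:8.6.0   # any curl-capable image works
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Hammer the sample endpoint through F5 / ingress-nginx and
              # report every request that comes back as a 502.
              for i in $(seq 1 10000); do
                code=$(curl -sk -o /dev/null -w '%{http_code}' "https://<host>/sample-bsfw-simplesampleaz-ws/")
                if [ "$code" = "502" ]; then echo "502 on request $i"; fi
              done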