
Upstream Prematurely Closed Connection While Reading Response Header From Upstream #11244

@ChristianBieri1995

Description

What happened:

To route traffic into my Kubernetes cluster, I use an F5 load balancer, nginx ingress controllers (scaled by a horizontal pod autoscaler) and k8s ingress objects. With this setup, I receive 502 HTTP status codes on a regular basis. Although the error rate is low (0.003%), it must not be neglected, as millions of requests are handled daily.

After spending hours trying to track down the issue, I stumbled upon the following error: 2024/04/10 09:13:00 [error] 3980#3980: *29815576 upstream prematurely closed connection while reading response header from upstream

With that information at hand, I skimmed countless webpages to identify the root cause. After checking the most obvious culprits, such as

  • proxy_read/send/connect_timeout
  • proxy_set_header Connection "";
  • proxy_max_temp_file_size
  • X-Real-IP, X-Forwarded-For
  • proxy_buffer_size

I could not make any progress. As a next step, I activated error-log-level: debug to capture the corresponding logs.
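For completeness, I enabled debug logging via the controller ConfigMap; in my values.yaml (same structure as the excerpt further down) this boils down to the following snippet, with only the added key shown:

controller:
  config:
    # raise the nginx error log verbosity to capture the traces below
    error-log-level: "debug"

The relevant log excerpts: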

2024-03-26T16:21:04.076718973+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 recv: fd:46 0 of 1048576
2024-03-26T16:21:04.076726784+01:00 stderr F 2024/03/26 15:21:04 [error] 136#136: *943568 upstream prematurely closed connection while reading response header from upstream (here, I left out some irrelevant parts)
2024-03-26T16:21:04.076750704+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 http next upstream, 2
2024-03-26T16:21:04.076756887+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 free keepalive peer
2024-03-26T16:21:04.076762948+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 lua balancer free peer, tries: 2
2024-03-26T16:21:04.076768572+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 finalize http upstream request: 502
2024-03-26T16:21:04.076774231+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 finalize http proxy request
2024-03-26T16:21:04.07677987+01:00 stderr F 2024/03/26 15:21:04 [debug] 136#136: *943568 close http upstream connection: 46

Unfortunately, this did not help either, but it indicated that there may be an issue with reused network connections (HTTP/1.1 keep-alive). Therefore, I added nginx.ingress.kubernetes.io/proxy-http-version: "1.0" to the relevant k8s ingress object (see the sketch below), and behold: no more 502 HTTP status codes. I was able to reproduce this behaviour not only in my test environment but also on more important stages.
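For clarity, a minimal sketch of that workaround (the ingress name is a placeholder; the real ingress object is shown further below, and only the annotation matters here):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress  # placeholder
  annotations:
    # make nginx talk HTTP/1.0 to the upstream, which effectively disables
    # keep-alive connection reuse towards the backend pods
    nginx.ingress.kubernetes.io/proxy-http-version: "1.0"

Since this simply turns off connection reuse towards the upstream, I see it as a workaround that confirms the suspicion rather than a fix of the root cause.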

In my view, there seems to be an issue with the reuse of established HTTP/1.1 connections, probably related to my nginx.conf.

NGINX Ingress version: nginx/1.21.6

Kubernetes version:
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5+rke2r1

Environment:

  • Cloud provider or hardware configuration: Rancher, rke2

  • OS : NAME="SLES", VERSION="15-SP5", VERSION_ID="15.5"

  • Kernel: Linux 5.14.21-150500.55.52-default #1 SMP PREEMPT_DYNAMIC Tue Mar 5 16:53:41 UTC 2024 (a62851f) x86_64

  • Install tools:

    • Terraform
  • Basic cluster related info:

    • 20 k8s worker nodes, 8vCores/64GB RAM or 32vCores/256GB RAM
    • 3 k8s controlplane nodes
  • How was the ingress-nginx-controller installed:

    • The nginx ingress controllers are deployed via a Deployment combined with a horizontal pod autoscaler
    • The nginx ingress controllers were installed via helm/helmfile, using the following values.yaml (not exhaustive):
controller:
  config:
    client-header-buffer-size: "1024k"
    large-client-header-buffers: "4 1024k"
    proxy-buffer-size: "1024k"
    proxy-send-timeout: "90"
    proxy-read-timeout: "200"
    use-proxy-protocol: "true"
    force-ssl-redirect: "true"
    log-format-escaping: json
    log-format-upstream: '{"time": "$time_iso8601", "remote_addr": "$remote_addr", "remote_user": "$remote_user", "request": "$request",
      "response_status": $status, "body_bytes_sent": $body_bytes_sent, "http_referer": "$http_referer", "http_user_agent": "$http_user_agent",
      "request_length": $request_length, "request_time": $request_time, "proxy_upstream_name": "[$proxy_upstream_name]",
      "proxy_alternative_upstream_name": "[$proxy_alternative_upstream_name]", "upstream_addr": "$upstream_addr", "upstream_response_length": $upstream_response_length,
      "upstream_response_time": $upstream_response_time, "upstream_status": $upstream_status, "req_id": "$req_id"}'

  extraArgs:
    enable-ssl-passthrough: ""

  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 4
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 90

  resources:
    limits:
      cpu: 800m
      memory: 800Mi
    requests:
      cpu: 100m
      memory: 300Mi

  service:
    type: NodePort
    externalTrafficPolicy: Local
    nodePorts:
      http: 32080
      https: 32443
  • Current State of the controller:
    • ingressclasses.networking.k8s.io:
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.6
              helm.sh/chart=ingress-nginx-4.9.1
Annotations:  argocd.argoproj.io/tracking-id: infra-ingress.infra-ingress:networking.k8s.io/IngressClass:infra-ingress-tst/nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> get all -A -o wide:
NAME                                                    READY   STATUS    RESTARTS      AGE
pod/ingress-nginx-tst-controller-5bdbb97d9-dv8kb        1/1     Running   1 (16d ago)   21d
pod/ingress-nginx-tst-defaultbackend-76d48c597b-265xt   1/1     Running   0             21d

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/ingress-nginx-tst-controller           NodePort    x.x.x.x          <none>        80:32080/TCP,443:32443/TCP   176d
service/ingress-nginx-tst-controller-metrics   ClusterIP   x.x.x.x          <none>        10254/TCP                    176d
service/ingress-nginx-tst-defaultbackend       ClusterIP   x.x.x.x          <none>        80/TCP                       176d

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-tst-controller       1/1     1            1           176d
deployment.apps/ingress-nginx-tst-defaultbackend   1/1     1            1           176d

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-tst-controller-5bdbb97d9        1         1         1       58d
replicaset.apps/ingress-nginx-tst-defaultbackend-76d48c597b   1         1         1       58d

NAME                                                               REFERENCE                                 TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/ingress-nginx-tst-controller   Deployment/ingress-nginx-tst-controller   62%/90%, 10%/50%   1         1         1          176d

--> Please note that the replica/HPA values were set to 1 to make the described issue easier to pin down. The issue also arises with multiple replicas, however.


  • kubectl -n <ingresscontrollernamespace> describe pod <ingresscontrollerpodname>:
Name:             ingress-nginx-tst-controller-5bdbb97d9-dv8kb
Namespace:        infra-ingress-tst
Priority:         0
Service Account:  default
Node:             x.x.x.x
Start Time:       Wed, 03 Apr 2024 14:23:15 +0200
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.9.6
                  helm.sh/chart=ingress-nginx-4.9.1
                  pod-template-hash=5bdbb97d9
Annotations:      container.apparmor.security.beta.kubernetes.io/controller: localhost/k8s-ingress-controller
                  kubernetes.io/limit-ranger:
                    LimitRanger plugin set: cpu, memory request for init container opentelemetry; cpu, memory limit for init container opentelemetry
Status:           Running
IP:               x.x.x.x
IPs:
  IP:           x.x.x.x
Controlled By:  ReplicaSet/ingress-nginx-tst-controller-5bdbb97d9
Init Containers:
  opentelemetry:
    Container ID:    containerd://ab2e25842abdd00da17f3404fcea0842961be8301430fe70a7f7e9ff4ee9e2e4
    Image:           x/ingress-nginx/opentelemetry:v20230721-3e2062ee5@sha256:13bee3f5223883d3ca62fee7309ad02d22ec00ff0d7033e3e9aca7a9f60fd472
    Image ID:        x/ingress-nginx/opentelemetry@sha256:13bee3f5223883d3ca62fee7309ad02d22ec00ff0d7033e3e9aca7a9f60fd472
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      /init_module
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 03 Apr 2024 14:23:15 +0200
      Finished:     Wed, 03 Apr 2024 14:23:15 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  100Mi
    Requests:
      cpu:        100m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /modules_mount from modules (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-54wkm (ro)
Containers:
  controller:
    Container ID:    containerd://1461ce776d455eab59334df36faf03b0c19080e6efe727a4c940271429706bba
    Image:           x/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Image ID:        x/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Ports:           80/TCP, 443/TCP, 10254/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --default-backend-service=$(POD_NAMESPACE)/ingress-nginx-tst-defaultbackend
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-tst-controller
      --election-id=ingress-nginx-tst-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-tst-controller
      --watch-ingress-without-class=true
      --default-ssl-certificate=infra-ingress-tst/tls-certificates-tst-wildcard
      --enable-ssl-passthrough
    State:          Running
      Started:      Tue, 09 Apr 2024 07:18:25 +0200
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 03 Apr 2024 14:23:16 +0200
      Finished:     Tue, 09 Apr 2024 07:18:25 +0200
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     800m
      memory:  800Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-tst-controller-5bdbb97d9-dv8kb (v1:metadata.name)
      POD_NAMESPACE:  infra-ingress-tst (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /modules_mount from modules (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-54wkm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  modules:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-54wkm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-10-31T13:17:04Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.6
    helm.sh/chart: ingress-nginx-4.9.1
  name: ingress-nginx-tst-controller
  namespace: infra-ingress-tst
  resourceVersion: "175091575"
  uid: 14205f77-5c71-4e4b-ad3a-f258b01888b6
spec:
  clusterIP: x.x.x.x
  clusterIPs:
  - x.x.x.x
  externalTrafficPolicy: Local
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    nodePort: 32080
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 32443
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress
    app.kubernetes.io/name: ingress-nginx
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

  • Current state of ingress object, if applicable:
    • kubectl -n <appnamespace> describe ing <ingressname>:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  creationTimestamp: "2024-04-03T09:42:21Z"
  generation: 1
  labels:
    app: bs-simplesample-az-srv
    app.kubernetes.io/instance: bs-simplesample-az-srv
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: bs-simplesample-az-srv
    chart: generic-openliberty-2.3.4
    helm.sh/chart: generic-openliberty-2.3.4
    heritage: Helm
    release: bs-simplesample-az-srv
  name: bs-simplesample-az-srv-tst-business-services
  namespace: kern-sample-bsfw-tst
  resourceVersion: "181093092"
  uid: 0c38ad0b-d4a4-4ecc-9db5-183b63143b4e
spec:
  rules:
  - host: x
    http:
      paths:
      - backend:
          service:
            name: bs-simplesample-az-srv-tst
            port:
              name: http
        path: /sample-bsfw-simplesampleaz-ws/
        pathType: Prefix
  tls:
  - hosts:
    - x
    secretName: tls-certificates-tst-x
status:
  loadBalancer:
    ingress:
    - ip: x.x.x.x
  • Others:
    • nginx.conf
reset_timedout_connection on;
keepalive_timeout  75s;
keepalive_requests 1000;

Could there be an issue with timeouts on established (keep-alive) connections?
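If so, my understanding is that the connections at risk are the keep-alive connections from nginx to the upstream pods, which are configured separately from the client-side keepalive_timeout shown above. As a sketch of the knobs I would expect to matter, using the documented upstream-keepalive ConfigMap options of ingress-nginx (the values are illustrative, not my current settings):

controller:
  config:
    # idle timeout for cached connections to upstream servers; keeping this well below
    # the backend's own idle/keep-alive timeout should avoid picking up a connection
    # the backend is about to close
    upstream-keepalive-timeout: "30"
    # maximum number of requests served over a single upstream connection
    upstream-keepalive-requests: "1000"
    # number of idle upstream connections cached per worker process
    upstream-keepalive-connections: "320"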

How to reproduce this issue: simply access the backend via the F5 and the nginx ingress controller. To do so, I created a load tester.
