
Annotations for proxy-send-timeout not honored in upstream connection #10987

Open
jurrehart opened this issue Feb 16, 2024 · 8 comments
Labels
lifecycle/frozen - Indicates that an issue or PR should not be auto-closed due to staleness.
needs-kind - Indicates a PR lacks a `kind/foo` label and requires one.
needs-priority
needs-triage - Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments


jurrehart commented Feb 16, 2024

What happened:

We have a web application that closes incoming connections after 5 seconds of being idle, without the possibility to change this setting in the app.

We set the following annotation on the ingress:

nginx.ingress.kubernetes.io/proxy-send-timeout: "4"

But upstream connections to the pod are still kept open by nginx even after more than 4s have passed between write or read operations on the upstream connection, resulting in errors being logged and 502s in the access logs.

Error log:

recv() failed (104: Connection reset by peer) while reading response header from upstream, 

What you expected to happen:
Having configured the annotations to close connections with no write or read operations for more than the set time, I'd expect the upstream connection to be closed once that time is exceeded.

What do you think went wrong?
In the upstream_balancer upstream block there is a keepalive_timeout 60s; directive, instructing the upstream module to keep idle connections to the upstream servers open for 60s.

While the annotations do generate configuration in the server block for the ingress, that configuration only applies to the proxy_pass http://upstream_balancer; directive and therefore has no impact on the keepalive behaviour of the upstream connections themselves.
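For reference, the rendered configuration can be inspected inside the controller pod. A minimal sketch, assuming the pod name from the describe output below and the default /etc/nginx/nginx.conf location; the quoted output is simplified and the exact values depend on the controller version and ConfigMap defaults:

# dump the upstream block that carries the global keepalive settings
kubectl -n ingress-nginx exec ingress-nginx-controller-79d66f886c-45p4j -- \
  grep -A 6 'upstream upstream_balancer' /etc/nginx/nginx.conf
# simplified expected output -- keepalive_timeout here is global and is not
# influenced by the per-ingress proxy-send-timeout annotation:
#   upstream upstream_balancer {
#     ...
#     keepalive_timeout  60s;
#   }

# dump the per-location settings generated from the annotation
kubectl -n ingress-nginx exec ingress-nginx-controller-79d66f886c-45p4j -- \
  grep -E 'proxy_send_timeout|proxy_pass' /etc/nginx/nginx.conf
# simplified expected output -- the annotation only affects the proxied request,
# while the upstream keepalive above stays at its global value:
#   proxy_send_timeout  4s;
#   proxy_pass http://upstream_balancer;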

NGINX Ingress controller version :

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.8.1
  Build:         dc88dce9ea5e700f3301d16f971fa17c6cfe757d
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

I also noticed the same behavior on an EKS setup to which I have no admin access, but which seems to be running v1.5.1.

Kubernetes version (use kubectl version):

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.2

Environment:

  • Cloud provider or hardware configuration: Baremetal & VMs
  • OS (e.g. from /etc/os-release): Ubuntu
  • Kernel (e.g. uname -a): Linux apiserver 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    • Manual setup
  • Basic cluster related info:
    • kubectl get nodes -o wide
NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker1   Ready    <none>   45d   v1.27.9   192.168.168.100   <none>        Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.26
worker2   Ready    <none>   54d   v1.27.9   192.168.168.101   <none>        Ubuntu 22.04.3 LTS   5.15.0-91-generic   containerd://1.6.26
  • How was the ingress-nginx-controller installed:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/cloud/deploy.yaml
  • Current State of the controller:
    • kubectl describe ingressclasses
kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.8.1
Annotations:  <none>
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name:             ingress-nginx-controller-79d66f886c-45p4j
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             worker2/192.168.168.101
Start Time:       Thu, 15 Feb 2024 12:04:01 +0100
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.8.1
                  pod-template-hash=79d66f886c
Annotations:      <none>
Status:           Running
IP:               192.168.170.16
IPs:
  IP:           192.168.170.16
Controlled By:  ReplicaSet/ingress-nginx-controller-79d66f886c
Containers:
  controller:
    Container ID:  containerd://676b9c0dfcbe7c03bf75a513b5c70fadb78723f8566b20c20a4dc96b29304def
    Image:         registry.k8s.io/ingress-nginx/controller:v1.8.1@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Fri, 16 Feb 2024 10:02:32 +0100
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-79d66f886c-45p4j (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6m6g8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-6m6g8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age   From                      Message
  ----     ------          ----  ----                      -------
  Warning  NodeNotReady    17m   node-controller           Node is not ready
  Normal   SandboxChanged  16m   kubelet                   Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          16m   kubelet                   Container image "registry.k8s.io/ingress-nginx/controller:v1.8.1@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd" already present on machine
  Normal   Created         16m   kubelet                   Created container controller
  Normal   Started         16m   kubelet                   Started container controller
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>

  • Current state of ingress object, if applicable:

Name:             whoami-ingress-nginx
Labels:           <none>
Namespace:        default
Address:          
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host                  Path  Backends
  ----                  ----  --------
  whoami.cluster.local  
                        /   whoami:web (192.168.169.67:80)
Annotations:            nginx.ingress.kubernetes.io/proxy-read-timeout: 4
                        nginx.ingress.kubernetes.io/proxy-send-timeout: 4
                        nginx.ingress.kubernetes.io/server-snippet: keepalive_timeout 4s;
Events:
  Type    Reason  Age   From                      Message
  ----    ------  ----  ----                      -------
  Normal  Sync    12m   nginx-ingress-controller  Scheduled for sync

How to reproduce this issue:

  • Create an application with a single replica and an nginx ingress with the annotation
nginx.ingress.kubernetes.io/proxy-send-timeout: "4"
  • Run tcpdump on the application pod, listening for HTTP traffic coming from the ingress controller

  • Make HTTP requests to the application with an interval > 4s

  • Observe that nginx opens a connection to the pod and reuses that same connection even after more than 4s of no traffic on it (see the sketch after this list).
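A minimal sketch of these steps, assuming a debug container with tcpdump can be attached to the application pod; the pod name, container name, netshoot image, interface name and controller address are placeholders/illustrative:

# watch the upstream connections from inside the application pod
kubectl debug -it <application-pod> --image=nicolaka/netshoot \
  --target=<application-container> -- tcpdump -ni eth0 'tcp port 80'

# in another terminal, send requests through the ingress with more than 4s between them
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Host: whoami.cluster.local' http://<ingress-controller-address>/
  sleep 6
done
# if the annotation were honoured, nginx would open a new upstream connection for
# requests arriving after the 4s timeout; tcpdump instead shows the same source
# port being reused for roughly 60s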

Anything else we need to know:

While the above information is for a locally installed cluster, I've observed the same behaviour on EKS running Kubernetes v1.25.15-eks-e71965b and the nginx ingress controller registry.k8s.io/ingress-nginx/controller-chroot:v1.5.1, where I have no administrative access to the cluster or the nginx-controller namespace.

jurrehart added the kind/bug (Categorizes issue or PR as related to a bug.) label on Feb 16, 2024
k8s-ci-robot added the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) label on Feb 16, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jurrehart changed the title from "Annotations for proxy-send-timeout/proxy-read-timeout not honored in upstream connection" to "Annotations for proxy-send-timeout not honored in upstream connection" on Feb 16, 2024
@longwuyuan
Contributor

/remove-kind bug

  • This was discussed on slack https://kubernetes.slack.com/archives/CANQGM8BA/p1707991629229589

  • While the details are unclear, a simple test with the httpbun.com/delay API gives some insight into the behaviour of the timeout-related annotations. My test shows that a value of 4 seconds and a value of 2 seconds produce different responses

(screenshot of test output omitted)

k8s-ci-robot added the needs-kind (Indicates a PR lacks a `kind/foo` label and requires one.) label and removed the kind/bug (Categorizes issue or PR as related to a bug.) label on Feb 16, 2024
@longwuyuan
Contributor

longwuyuan commented Feb 16, 2024

  • The difference between the test I did and the test the user is reporting is that the user's app has code to drop connections, while the httpbun.com image just has an API at "https://httpbun.com/delay/<seconds>"

@longwuyuan
Contributor

  • In contrast to the previous test with timeout annotations, the test below, conducted without any timeout annotations configured on the ingress, shows a different behaviour (indicating that the timeout annotations are working just fine)

(screenshot of test output omitted)

  • Note the difference in the log messages and responses for the API call with a delay value of 4 seconds in the two tests

@jurrehart
Author

jurrehart commented Feb 19, 2024

Beware that the issue is not about the pod responding slowly, so any test with httpbun and its delay endpoint will not verify this issue. The problem is that the pod closes incoming connections once they have been idle for 5 seconds. The requirement is therefore that the NGINX controller must close its upstream connections to the pod when they have been idle for 5s or less.

The only way I have currently found to make NGINX behave that way is to set the global

upstream-keepalive-timeout:  "4"

This will indeed have NGINX close its upstream connections after 4 seconds of being idle, but by doing so I'm impacting all ingresses on this NGINX instance, which is not desired.
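For completeness, this is roughly how that global workaround can be applied; a sketch assuming the controller ConfigMap referenced in the controller args above (ingress-nginx/ingress-nginx-controller):

# set the global upstream keepalive timeout on the controller ConfigMap
kubectl -n ingress-nginx patch configmap ingress-nginx-controller \
  --type merge -p '{"data":{"upstream-keepalive-timeout":"4"}}'
# idle upstream connections are now closed after 4s, but the setting applies to
# every ingress served by this controller, not only the one that needs it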

Now, the documentation for the ingress annotations regarding the various proxy-xxx-timeout settings states:

Using the configuration configmap it is possible to set the default global timeout for connections to the upstream servers. In some scenarios is required to have different values. To allow this we provide annotations that allows this customization:

Reading the corresponding NGINX documentation for proxy_send_timeout, the parameter set by the proxy-send-timeout annotation:

Sets a timeout for transmitting a request to the proxied server. The timeout is set only between two successive write operations, not for the transmission of the whole request. If the proxied server does not receive anything within this time, the connection is closed.

One would expect NGINX to close the connection to the upstream once it has not sent any data on the upstream connection for the configured amount of time.

However, if you observe the connections to the upstream with tcpdump, you'll see that NGINX will reuse any open connection that has not been idle for more than 60s (the default) or the configured value of upstream-keepalive-timeout.

@longwuyuan
Contributor

I agree with you.

  • My tests are basically additional info for a developer
  • I need to understand the behavior in the tests, because the response with the annotation and the response without it are not the same. If the annotation were not working, the response should be the same regardless of the annotation being present or absent
  • Your app is designed to drop connections after being idle for 4-5 seconds and the timeout you are setting in the annotation is also 4, so it is not easy for a reader here to reproduce the problem and state beyond doubt that the annotation is not working
  • I am going to test with a simple app using the image nginx:alpine and see if the response is different for the with-annotation and without-annotation configs

@longwuyuan
Contributor

I am unable to figure out how to create a delay with the vanilla httpd:alpine image (or nginx:alpine image for that matter)

@github-actions

This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.

github-actions bot added the lifecycle/frozen (Indicates that an issue or PR should not be auto-closed due to staleness.) label on Mar 21, 2024