
Tune haproxy configuration #740

Closed
wants to merge 1 commit

Conversation


@aojea aojea commented Jul 27, 2019

The current haproxy configuration declares the session closed after 50 seconds if no packet is received from the client or from the server.
This causes issues, for example, when trying to tail logs with kubectl logs -f:

  ...
  127.0.0.1 - - [26/Jul/2019:21:20:28 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://localhost:8000/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" "-"
  F0726 22:21:48.325924    1750 helpers.go:114] error: unexpected EOF

For long-lived connections the user has to define values appropriate to their environment. In kind we can safely use 1h; it is unlikely that anyone will DoS the server with the number of open connections.

  timeout client 1h
  timeout server 1h

However, in other environments, and specifically in kind, there are more things that can force a disconnection. Keep in mind that in kind, kubectl connects to the exposed port on localhost, so our TCP connections can be closed by the conntrack timeouts of the host. We can set the option tcpka, enabling socket-level TCP keepalives, which makes the system regularly send packets to the other end of the connection, keeping it active. The configuration parameters of the TCP keepalives depend on the operating system, and these keepalives do not prevent the haproxy session from expiring.
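
Putting it together, a minimal sketch of the tuned defaults section (illustrative values; haproxy accepts explicit time units such as s, m and h):

  defaults
      log global
      mode tcp
      # socket-level TCP keepalives towards both the client and the server
      option tcpka
      timeout connect 5s
      # expire the session only after 1h without traffic in either direction
      timeout client 1h
      timeout server 1h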

Ref: kubernetes/kubeadm#1685

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 27, 2019
@k8s-ci-robot k8s-ci-robot requested review from amwat and munnerz July 27, 2019 21:38
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 27, 2019
@@ -33,18 +33,21 @@ type ConfigData struct {
// DefaultConfigTemplate is the loadbalancer config template
const DefaultConfigTemplate = `# generated by kind
global
stats socket /var/run/haproxy.sock mode 600 level admin
Contributor Author

this is useful for debugging; installing socat or netcat in the haproxy container, you can check the session status, for example:

```
/ # echo "show sess" | socat /var/run/haproxy.sock stdio
0x5655011ce8a0: proto=tcpv4 src=172.17.0.1:35988 fe=control-plane be=kube-apiservers srv=ha-control-plane ts=00 age=14m54s calls=3 rate=0 cpu=0 lat=0 rq[f=848202h,i=0,an=00h,rx=45m11s,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=45m11s,wx=,ax=] s0=[8,200008h,fd=32,ex=] s1=[8,200018h,fd=33,ex=] exp=45m5s
0x5655011cd880: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 age=0s calls=2 rate=2 cpu=0 lat=0 rq[f=c4c220h,i=0,an=00h,rx=,wx=,ax=] rp[f=80008002h,i=0,an=00h,rx=,wx=,ax=] s0=[8,280008h,fd=34,ex=] s1=[8,204018h,fd=-1,ex=] exp=2m
```
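
The same admin socket accepts other runtime commands; for instance, to dump general process statistics (assuming socat is installed as above):

```
/ # echo "show info" | socat /var/run/haproxy.sock stdio
```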

Member

SGTM


aojea commented Jul 27, 2019

cc @ereslibre

@ereslibre ereslibre left a comment

I think this change is not addressing the issue but just making the timeout longer (kubectl logs -f or kubectl exec -it would still die if the inactivity lasts 1 hour instead of 50 seconds).

Looks like the tcpka option by itself is not helping.

log /dev/log local0
log /dev/log local1 notice
daemon

defaults
log global
mode tcp
option tcpka


I'm still seeing the same behavior with clitcpka and srvtcpka (equivalent to tcpka), without tweaking timeouts.
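
For reference, a sketch of the per-side options being tested here (clitcpka enables keepalives on the client-facing connection, srvtcpka on the server-facing one; illustrative snippet):

  defaults
      mode tcp
      # TCP keepalives towards the connecting client (kubectl)
      option clitcpka
      # TCP keepalives towards the backend (the apiserver)
      option srvtcpka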

Contributor Author

the tcpka option prevents intermediate systems from closing the connection (in our case iptables, since we are doing docker port forwarding).
The kubectl commands will always die after the inactivity timeout we configure in haproxy if there are no packets between the server and the client.
To avoid this, kubectl should keep the connection open.

@aojea aojea Jul 27, 2019

🤔 seems that TCP keepalive doesn't kick in until the connection has been idle for 2 hours by default

🤷‍♂

cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
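
For local testing, the kernel defaults can be lowered so that keepalives fire well before any proxy or conntrack timeout (illustrative values; these are standard Linux sysctls):

  # first probe after 60s idle, then every 10s, give up after 5 failed probes
  sysctl -w net.ipv4.tcp_keepalive_time=60
  sysctl -w net.ipv4.tcp_keepalive_intvl=10
  sysctl -w net.ipv4.tcp_keepalive_probes=5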


aojea commented Jul 27, 2019

/assign @BenTheElder

option dontlognull
# TODO: tune these
timeout connect 5000
timeout client 50000
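
For reference, haproxy timeouts without a unit suffix are interpreted as milliseconds, so these values are 5 seconds and 50 seconds; with explicit units the same lines would read:

  timeout connect 5s
  timeout client 50s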
@BenTheElder BenTheElder Jul 30, 2019

what unit were these in?

Member

So 50s to an hour.

Seems arbitrarily large, how do other setups compare (eg kubespray)?

Contributor Author

it is to be safe, maybe exaggerated, but I don't think that anybody is going to exhaust the host with open TCP connections with kind :)

Member

fair I suppose; I'm not sure this is the correct fix, but I'm OK to try it, it mostly seems harmless

Contributor Author

it is hard to say if it is a bug or a limitation; the software does what it's supposed to do, it just doesn't implement keepalives to prevent intermediate devices from closing the connection. Any of the involved components can be blamed; it is a common TCP problem with long-lived connections 🤷‍♂️

@ereslibre

@BenTheElder I am going to finish this patch and open a PR. It covers kubectl exec -it, kubectl logs -f and kubectl get -w. The idea is to ping the other end on a regular basis.

Member

thanks @ereslibre! + 💯

@ereslibre

The patch is almost ready. I will review it tomorrow, checking that I am not breaking anything and rereading for minor changes, and then I should be good to open a PR and ping all stakeholders.

@ereslibre

Side note: when testing manually I go to the extreme and use this patch in kind: master...ereslibre:low-timeout; it's much more convenient for forcing the issue and ensuring that pings are working as intended.

@BenTheElder
Member

[I've just caught up on the linked issue]

@BenTheElder
Member

The current haproxy configuration declares the session closed after 50 seconds if no packet is received from the client or from the server.

when is this happening? what client? what use case?


aojea commented Aug 6, 2019

when is this happening? what client? what use case?
Using those commands: #740 (review)

haproxy opens 2 sockets and forwards traffic between them; if haproxy doesn't see any traffic it expires the session. The OS TCP keepalives are useless here because they work at the kernel level (terminating at each of the sockets opened by haproxy) and haproxy doesn't see them, hence the connection timeout is not refreshed.
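
One way to observe this from inside the haproxy container is to watch the per-socket timers (a sketch assuming iproute2's ss is available in the container):

  # -o prints the timer column: keepalive timers tick on each kernel socket,
  # while haproxy's own session timer only resets on real payload traffic
  / # ss -tno state established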

@BenTheElder
Member

haproxy opens 2 sockets and forwards traffic between them; if haproxy doesn't see any traffic it expires the session. The OS TCP keepalives are useless here because they work at the kernel level (terminating at each of the sockets opened by haproxy) and haproxy doesn't see them, hence the connection timeout is not refreshed.

that doesn't really answer the questions:

  • which client?
    • answer: kubectl, any kubernetes client ...
  • what use case?
    • answer: kubectl logs, apparently when there's no output there are no packets

#740 (comment) sounds like the right solution to me, this is an upstream issue that should be fixed there.

we wouldn't want people using this fix on a real cluster, and there's nothing special about kind in this respect.


aojea commented Aug 6, 2019

any other tool or application that uses haproxy and doesn't send traffic within 50 seconds will have the same problem; however, I agree with you that it should be solved at the app level

@BenTheElder
Member

imho that's the kind of bug you want to find testing your app :^)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please assign bentheelder
You can assign the PR to them by writing /assign @bentheelder in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BenTheElder BenTheElder added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 14, 2019

aojea commented Sep 6, 2019

we can reopen if necessary; it seems it is not needed

@aojea aojea closed this Sep 6, 2019
@aojea aojea deleted the haproxy_tuning branch August 9, 2020 08:44
Labels
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
priority/backlog - Higher priority than priority/awaiting-more-evidence.
size/XS - Denotes a PR that changes 0-9 lines, ignoring generated files.