Tune haproxy configuration #740
Conversation
@@ -33,18 +33,21 @@ type ConfigData struct {
// DefaultConfigTemplate is the loadbalancer config template
const DefaultConfigTemplate = `# generated by kind
global
stats socket /var/run/haproxy.sock mode 600 level admin
this is useful for debugging; installing socat or netcat in the haproxy container, you can check the session status, for example:
```
/ # echo "show sess" | socat /var/run/haproxy.sock stdio
0x5655011ce8a0: proto=tcpv4 src=172.17.0.1:35988 fe=control-plane be=kube-apiservers srv=ha-control-plane ts=00 age=14m54s calls=3 rate=0 cpu=0 lat=0 rq[f=848202h,i=0,an=00h,rx=45m11s,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=45m11s,wx=,ax=] s0=[8,200008h,fd=32,ex=] s1=[8,200018h,fd=33,ex=] exp=45m5s
0x5655011cd880: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 age=0s calls=2 rate=2 cpu=0 lat=0 rq[f=c4c220h,i=0,an=00h,rx=,wx=,ax=] rp[f=80008002h,i=0,an=00h,rx=,wx=,ax=] s0=[8,280008h,fd=34,ex=] s1=[8,204018h,fd=-1,ex=] exp=2m
```
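Other runtime commands, for example show stat or show info, can be sent over the stats socket the same way, assuming socat or netcat is available in the container.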
SGTM
cc @ereslibre
I think this change is not addressing the issue but making the wait for the timeout longer (kubectl logs -f or kubectl exec -it would die if the inactivity lasts 1 hour instead of 50 seconds). Looks like the tcpka option itself is not helping.
log /dev/log local0
log /dev/log local1 notice
daemon

defaults
log global
mode tcp
option tcpka
I'm still seeing the same behavior with clitcpka and srvtcpka (equivalent to tcpka), without tweaking timeouts.
the tcpka option prevents the intermediate systems from closing the connection (in our case iptables, since we are doing docker port forwarding). The kubectl commands will always die after the inactivity timeout we configure in haproxy if there is no packet between the server and the client. To avoid this, kubectl should keep the connection open.
🤔 seems that TCP keep-alive doesn't kick in until the connection has been idle for 2 hours, by default 🤷‍♂️
```
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
```
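As a minimal, hedged sketch (not part of this PR): a Go client can ask the kernel for much more frequent keep-alive probes on its own connection by configuring the dialer, instead of relying on the 7200s system default. Note that, as pointed out later in this thread, such kernel-level probes keep NAT/conntrack entries alive but do not refresh haproxy's own inactivity timer.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Request TCP keep-alive probes every 30 seconds on this connection,
	// instead of waiting for the Linux default tcp_keepalive_time (7200s).
	dialer := &net.Dialer{KeepAlive: 30 * time.Second}

	// 127.0.0.1:6443 is a hypothetical forwarded API server endpoint.
	conn, err := dialer.Dial("tcp", "127.0.0.1:6443")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected with a 30s keep-alive period")
}
```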
/assign @BenTheElder
option dontlognull
# TODO: tune these
timeout connect 5000
timeout client 50000
what unit were these in?
So 50s to an hour.
Seems arbitrarily large, how do other setups compare (eg kubespray)?
it's to be safe, maybe exaggerated, but I don't think that anybody is going to exhaust the host with open tcp connections with kind :)
fair I suppose, I'm not sure this is the right fix, but I'm OK to try it, it mostly seems harmless
it's hard to say if it is a bug or a limitation, the software does what it's supposed to do, it just doesn't implement keepalives to prevent the intermediate devices from closing the connection; all of the involved components can be blamed, it's a common TCP problem with long-lived connections 🤷‍♂️
@BenTheElder I am going to finish this patch and open a PR. It covers kubectl exec -it, kubectl logs -f and kubectl get -w. The idea is to ping the other end on a regular basis.
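A rough sketch of that idea, not ereslibre's actual patch: a helper that periodically writes a tiny application-level ping on a long-lived connection, so proxies such as haproxy see traffic and do not expire the session. The single 0x00 byte and the function name are placeholders for whatever ping the real protocol defines.

```go
package keepalive

import (
	"context"
	"log"
	"net"
	"time"
)

// keepSessionAlive writes a small ping at a fixed interval until the context
// is cancelled or a write fails, refreshing the idle timers of any
// intermediaries (haproxy, conntrack) along the path.
func keepSessionAlive(ctx context.Context, conn net.Conn, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if _, err := conn.Write([]byte{0x00}); err != nil {
				log.Printf("keep-alive ping failed: %v", err)
				return
			}
		}
	}
}
```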
thanks @ereslibre! + 💯
The patch is almost ready. I will review it tomorrow, checking that I am not breaking anything and rereading for minor changes, and then I should be good to open a PR and ping all stakeholders.
Side note: when testing manually I go to the extreme and use this patch in kind: master...ereslibre:low-timeout; it's much more convenient to force the issue and ensure that pings are working as intended.
[I've just caught up on the linked issue]
when is this happening? what client? what use case?
haproxy opens 2 sockets and forwards traffic between them; if haproxy doesn't see any traffic it expires the session. The OS TCP keepalives are useless because they work at the kernel level (ending at each of the sockets opened by haproxy) and haproxy doesn't see them, hence the connection timeout is not refreshed.
that doesn't really answer the questions:
#740 (comment) sounds like the right solution to me, this is an upstream issue that should be fixed there. we wouldn't want people using this fix on a real cluster, and there's nothing special about kind in this respect.
any other tools or applications that use haproxy and don't send traffic in 50 sec will have the same problem, however I agree with you it should be solved at the app level
imho that's the kind of bug you want to find testing your app :^)
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: aojea
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
we can reopen if necessary, it seems it is not needed
The current haproxy configuration declared the session closed after 50 sec if no packet was received from the client or from the server.
This caused issues, for example, when trying to tail logs with kubectl logs -f.
For long-lived connections the user has to define the appropriate values for their environment; we can safely define 1h in kind, since it is unlikely that we can DOS the server with the number of open connections.
However, in other environments, and specifically in kind, there are more things that can force a disconnection. Keep in mind that in kind, kubectl connects to the exposed port on localhost, so our TCP connections can be closed by the conntrack timeouts of the host. We can set the option tcpka, enabling socket-level TCP keep-alives, which makes the system regularly send packets to the other end of the connection, keeping it active. The configuration parameters of the TCP keepalives depend on the operating system, and these keepalives don't prevent the haproxy session from expiring.
Ref: kubernetes/kubeadm#1685
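For illustration only, assuming the values discussed above (keep-alives enabled, client/server timeouts raised from 50s to 1h), the tuned defaults section of the template could look roughly like this; haproxy interprets unitless timeout values as milliseconds, so 3600000 is one hour:

```go
package config

// tunedDefaultsExample is an illustrative sketch, not necessarily the exact
// template used by kind.
const tunedDefaultsExample = `defaults
    log global
    mode tcp
    option tcpka
    option dontlognull
    timeout connect 5000
    timeout client 3600000
    timeout server 3600000
`
```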