Apache Bench can fill up ipvs service proxy in seconds #544

Closed

neeseius opened this issue Sep 28, 2018 · 26 comments

@neeseius

neeseius commented Sep 28, 2018

I am not sure if I have something configured wrong, but here is my CentOS 7 physical node and kube-router agent setup:

[ipvsadm package]
$ rpm -q ipvsadm
ipvsadm-1.27-7.el7.x86_64

[kube router process and options]
$ ps -ocommand= -C kube-router
/usr/local/bin/kube-router --run-router=true --run-firewall=true --run-service-proxy=true --kubeconfig=/etc/kubernetes/kube-router.kubeconfig --hostname-override=node6 --enable-overlay=true

[service]
$ kubectl get svc
NAME       TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
test-svc   NodePort   172.30.176.114   <none>        80:30530/TCP   7h

[ipvs]
$ ipvsadm -ln | head -n 1
IP Virtual Server version 1.2.1 (size=4096)

[ipvs service]
$ ipvsadm -ln | grep -A1 30530
TCP 10.200.1.146:30530 rr
-> 172.32.9.68:80 Masq 1 0 0

If I use Apache Bench with TCP keep-alive, everything is swell and absurdly fast, posting over 10,000 requests per second, and ipvsadm will show stats like the ones below during such a test:
$ ipvsadm -ln | grep -A1 30530
TCP 10.200.1.146:30530 rr
-> 172.32.9.68:80 Masq 1 0 757
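
(For reference, the keep-alive run was presumably something like the following; ab's -k flag enables HTTP KeepAlive, though the exact invocation isn't shown in this report.)

ab -k -c 100 -n 20000 http://node6:30530/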

However, if I run the same test without keep-alive, then "InActConn" jumps up to 14000 within a few seconds. Up to that point things are very fast, but after that the virtual server completely hangs and stops responding to requests until "InActConn" drops back below 14000. This happens whether I run Apache Bench on the node itself and hit the cluster IP, or run it from another server and hit the NodePort.
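
A simple way to watch that counter climb during a run (using the same port shown in the outputs below):

$ watch -n1 'ipvsadm -ln | grep -A1 30530'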

---ipvs
$ ipvsadm -ln | grep -A1 30530
TCP 10.200.1.146:30530 rr
-> 172.32.9.68:80 Masq 1 0 14115

--- apache bench output
ab -c 100 -n 20000 http://node6:30530/
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking node6 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Completed 20000 requests
Finished 20000 requests

Server Software: Apache/2.4.34
Server Hostname: node6
Server Port: 30530

Document Path: /
Document Length: 2512 bytes

Concurrency Level: 100
Time taken for tests: 63.914 seconds
Complete requests: 20000
Failed requests: 0
Total transferred: 55860000 bytes
HTML transferred: 50240000 bytes
Requests per second: 312.92 [#/sec] (mean)
Time per request: 319.569 [ms] (mean)
Time per request: 3.196 [ms] (mean, across all concurrent requests)
Transfer rate: 853.50 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 311 456.6 10 1005
Processing: 1 7 4.5 8 36
Waiting: 0 7 4.5 8 36
Total: 2 318 453.1 19 1010

Percentage of the requests served within a certain time (ms)
50% 19
66% 21
75% 1003
80% 1004
90% 1004
95% 1005
98% 1005
99% 1006
100% 1010 (longest request)

@uablrek
Contributor

uablrek commented Sep 28, 2018

Check with netstat -putan whether you have zillions of sockets in TIME_WAIT when it stalls.
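
For example, a quick count might look like this (ss being the modern replacement for netstat):

$ ss -tan state time-wait | wc -l
$ netstat -putan | grep -c TIME_WAIT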

@neeseius
Author

I actually don't see any TIME_WAIT sockets on the physical host, but I do see a ton in the containers. I made 6 replicas this time, all still on the same host, node6. Below is another snapshot of where the numbers are when the requests stop being answered again.

It seems the limit is still 14000, just divided among the containers now.
[ipvsadm]
$ ipvsadm -ln | grep -A6 30530
TCP 10.200.1.146:30530 rr
-> 172.32.9.69:80 Masq 1 0 2336
-> 172.32.9.71:80 Masq 1 0 2336
-> 172.32.9.72:80 Masq 1 0 2336
-> 172.32.9.73:80 Masq 1 0 2336
-> 172.32.9.74:80 Masq 1 0 2336
-> 172.32.9.75:80 Masq 1 0 2336

Below is the number of TIME_WAIT sockets retrieved from each container.
2453
2467
2471
2475
2480
2482

@neeseius
Author

I just wanted to point out that I am not using DSR, which makes me wonder why there is an accumulation of TIME_WAIT connections; in my case, shouldn't LVS be able to see all packets sent in both directions?

@uablrek
Contributor

uablrek commented Oct 15, 2018

TIME_WAIT is part of the TCP standard. The state should linger for 2 minutes (depending on how the connection was shut down), and IPVS also keeps the state so it can forward stray packets.
But you can make Linux reuse sockets in TIME_WAIT with sysctls. I don't remember which ones, so you will have to search.
But it can be other things. Your symptom, fast connects followed by an almost dead stop, points to some resource being exhausted along the way. It can be ports, but it can also be entries in IPVS or (more likely) in "conntrack". I know kube-proxy increases the conntrack table sizes; perhaps kube-router doesn't, I don't know.
This is a hard problem since you must investigate the whole path.

@neeseius
Author

I've done some research on what is going on and it turns out there is a legitimate problem with IPVS.
moby/moby#31746
moby/moby#35082

IPVS is not reusing ports the way it is supposed to, and thus the ephemeral ports are exhausted depending on the ephemeral port range (net.ipv4.ip_local_port_range). Setting net.ipv4.vs.conntrack=0 via sysctl somehow solves the reuse problem, but it breaks NodePort (and probably other things), so I don't believe that is the solution.

I don't know whether it's just CentOS 7 that is affected or whether this is a broader problem, but I imagine many other engineering teams using IPVS as a service proxy will eventually encounter this limitation.
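
For reference, the knobs mentioned above can be inspected and adjusted like this (a sketch; widening the port range only postpones exhaustion, it does not fix the reuse problem):

$ sysctl net.ipv4.ip_local_port_range
$ sysctl net.ipv4.vs.conntrack
$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"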

@xnaveira

We have been investigating the problem and have come to the following conclusions:

  • After a connection to a service is finished by the server, it ends up in the TIME_WAIT state. This state is kept for 2 minutes, and then the connection is removed from the conntrack table. If during those 2 minutes the client tries to reuse the same port, then upon receiving the SYN the connection is removed from the conntrack table, but the SYN is neither forwarded to the backend server nor ACKed back to the client, which forces a retransmission from the client after one second. From the client's perspective, it looks as though the server took 1 second to respond.

  • Disabling conntrack for IPVS (https://www.kernel.org/doc/Documentation/networking/ipvs-sysctl.txt) solves the problem, since there are no entries to remove, but in our setup it created another problem. If the node hit by the client query wasn't running the pod locally, IPVS forwarded the packet to a pod (finding the address in its IPVS rules), but disabling conntrack somehow also disabled the masquerading on that forwarded packet, so it reached the pod with the client address as the source. The pod then tried to answer the query by sending the packet directly to the client. Since the client had opened a connection to the service IP rather than the pod IP, it sent a reset back to the pod and the connection was never established.

  • In our setup both pod IPs and service IPs are /32 addresses that are reachable from the clients. What we did is run the services with kube-router.io/service.local=true, which announces the service IP only from the hosts that are running one or more pods belonging to that service (see the sketch just below). This way IPVS never needs to send packets outside the box, so no conntrack or masquerading is needed, and no conntrack means no 1-second delay when reusing a port too quickly. Since we are using ECMP in our BGP setup, the load is shared equally by all the hosts announcing the IP, and then again internally by IPVS round robin.
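
A minimal sketch of applying that annotation to the test service from this issue (the annotation key is the one named above; test-svc is the service from the original report):

$ kubectl annotate svc test-svc kube-router.io/service.local=true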

@neeseius
Author

neeseius commented Oct 19, 2018

Thank you for looking into this.

We aren't utilizing BGP or ECMP yet so a load balancer will add all nodes regardless.
However, it sounds like we can set net.ipv4.vs.conntrack=0 and as long as we don't use NodePort we should be good?

For example, disabling conntrack won't affect a pod hitting a service IP to reach another pod on a different node? And will session affinity like clientip still work?

EDIT:
I see, based on the link provided, that this will break network policy (iptables).
Hmm, it doesn't seem like I can use LVS as a service proxy then.

EDIT2:
Based on the link you sent me, I found something called conn_reuse_mode.

setting:
net.ipv4.vs.conntrack=1 (back to the default so iptables will work)
net.ipv4.vs.conn_reuse_mode=0

appears to solve everything, even NodePort!
I am not sure if this breaks anything else, but so far it seems okay to me.

@neeseius
Author

Special sauce for me seems to be:

net.ipv4.vs.conntrack=1
net.ipv4.vs.conn_reuse_mode=0
net.ipv4.vs.expire_nodest_conn=1
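
To make those settings survive a reboot, one option (a sketch, assuming a standard sysctl.d layout) is a drop-in file:

$ cat <<'EOF' > /etc/sysctl.d/90-ipvs.conf
net.ipv4.vs.conntrack = 1
net.ipv4.vs.conn_reuse_mode = 0
net.ipv4.vs.expire_nodest_conn = 1
EOF
$ sysctl --system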

@xnaveira

Our tests showed that disabling reuse with 'net.ipv4.vs.conn_reuse_mode=0' will interfere with scaling. When adding more pods in a high-traffic scenario, the traffic will stick to the old and overloaded pods, and when scaling down, traffic will be sent to nonexistent pods.

@uablrek
Contributor

uablrek commented Oct 20, 2018

Please read this excellent comment on a referenced issue: moby/moby#35082 (comment)

Be aware that a stream of connects from a single source may not be the common case in real life. It is more likely that you have few connections, but from very many sources.

You may be tuning your system to handle a case that only exists in your lab. While doing so, you tweak parameters that are standard and are there for a reason. The result may be that your app becomes more unstable in real life, where the network is less reliable, while performing excellently in your lab, which is probably a LAN.

@m1093782566

@xnaveira

Our tests showed that disabling reuse with 'net.ipv4.vs.conn_reuse_mode=0' will interfere with scaling. When adding more pods in a high-traffic scenario, the traffic will stick to the old and overloaded pods, and when scaling down, traffic will be sent to nonexistent pods.

Have you tried setting net.ipv4.vs.expire_nodest_conn=1?

@linecolumn

One of the suggestions was to set --notrack on the host:

# iptables -t raw -A PREROUTING -p tcp -d VIP --dport VPORT -j CT --notrack

This causes issues with non-local pod communication, AFAIK.
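
A hedged way to confirm the rule is having an effect (VIP and VPORT are the same placeholders as in the rule above): with --notrack in place, no new conntrack entries should appear for connections to the virtual service.

# conntrack -L -d VIP | grep VPORT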

Also, for reference, here is the "one second delay" article, which explains the issue and provides some solutions: https://marc.info/?l=linux-virtual-server&m=151743061027765&w=2

@m1093782566

I have the same confusion: why does IPVS drop a SYN packet that hits an IPVS connection in the TIME_WAIT state if such a connection uses Netfilter connection tracking (conntrack=1)?

@roffe
Collaborator

roffe commented Nov 21, 2018

@neeseius we have set conn_reuse_mode to 0 in the latest build; could you test whether you are experiencing the same problem with cloudnativelabs/kube-router-git@sha256:93c843ce19a7d98e8d07849143cc612359cd97db10aba8dca46e98fa114cca79?

@xnaveira

@roffe I tried your image in our setup and it seems to solve the problem! When running with latest, I do the following test:
curl -s http://$SERVICE --local-port 2348 -w "%{time_total}\n"
This command outputs the total time for the request and forces the local port to be the same across several tries. If run several times in a short interval, it gives a time on the order of tens of milliseconds the first time, but 1 second on the following tries because of the "IPVS dropping SYN" issue.
Doing the same with that image gives tens of milliseconds consistently; no more IPVS delays.
We had tried disabling net.ipv4.vs.conn_reuse_mode on the hosts, but then the problem was that traffic from the same port was redirected to the same pod, even for 2 minutes after that pod had been deleted. Have you done something else besides disabling net.ipv4.vs.conn_reuse_mode?
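
A sketch of that test as a loop (SERVICE is assumed to hold the service IP, for example the cluster IP 172.30.176.114 from earlier in this issue; fixing the local port forces the reuse case):

$ SERVICE=172.30.176.114
$ for i in $(seq 1 5); do curl -s -o /dev/null http://$SERVICE --local-port 2348 -w "%{time_total}\n"; sleep 0.2; done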

@roffe
Collaborator

roffe commented Nov 21, 2018

no, that was the only change

@neeseius
Author

This does appear to solve the problem in my testing, even when scaling up and down.

I know we toyed with these parameters before, but it interfered with scaling.
net.ipv4.vs.conn_reuse_mode=0
net.ipv4.vs.expire_nodest_conn=1

However, I noticed this is new:
net.ipv4.vs.expire_quiescent_template = 1

Is that what made the difference?
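
A quick way to check what a node ends up with after deploying the new image (a sketch; sysctl accepts multiple keys at once):

$ sysctl net.ipv4.vs.conn_reuse_mode net.ipv4.vs.expire_nodest_conn net.ipv4.vs.expire_quiescent_template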

@xnaveira

Could you link to the commit, @roffe? I am also curious, and it seems I am in the same situation as @neeseius.

@roffe
Collaborator

roffe commented Nov 21, 2018

#577
#579

@roffe
Collaborator

roffe commented Nov 21, 2018

This does appear to solve the problem in my testing, even when scaling up and down.

I know we toyed with these parameters before, but it interfered with scaling.
net.ipv4.vs.conn_reuse_mode=0
net.ipv4.vs.expire_nodest_conn=1

However, I noticed this is new:
net.ipv4.vs.expire_quiescent_template = 1

https://github.com/cloudnativelabs/kube-router/blame/master/pkg/controllers/proxy/network_services_controller.go#L285-L295

Is that what made the difference?

@roffe
Collaborator

roffe commented Nov 22, 2018

v0.2.3 released with IPVS throughput fixes

@roffe roffe closed this as completed Nov 22, 2018
@igoratencompass

Special sauce for me seems to be:

net.ipv4.vs.conntrack=1
net.ipv4.vs.conn_reuse_mode=0
net.ipv4.vs.expire_nodest_conn=1

I don't understand how the last two can be used at the same time when the kernel docs about conn_reuse_mode clearly say:

       0: disable any special handling on port reuse. The new
	connection will be delivered to the same real server that was
	servicing the previous connection. **This will effectively
	disable expire_nodest_conn**

so by setting net.ipv4.vs.conn_reuse_mode=0 you disable net.ipv4.vs.expire_nodest_conn.

@igoratencompass

igoratencompass commented Nov 22, 2018

We had tried disabling net.ipv4.vs.conn_reuse_mode on the hosts, but then the problem was that traffic from the same port was redirected to the same pod, even for 2 minutes after that pod had been deleted. Have you done something else besides disabling net.ipv4.vs.conn_reuse_mode?

And this is the main problem I see with this, since setting it to zero basically disables net.ipv4.vs.expire_nodest_conn. Or is it just me?

@roffe
Collaborator

roffe commented Nov 22, 2018

It must be a typo in the docs; the kernel does not seem to check whether conn_reuse_mode is 0 when expiring nodest connections: https://github.com/torvalds/linux/blob/master/net/netfilter/ipvs/ip_vs_core.c#L1982

krazey pushed a commit to krazey/android_kernel_motorola_exynos9610 that referenced this issue May 6, 2022
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
TenSeventy7 pushed a commit to FreshROMs/android_kernel_samsung_exynos9610_mint that referenced this issue May 8, 2022
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: John Vincent <[email protected]>
Signed-off-by: John Vincent <[email protected]>
Itsyadavishal pushed a commit to Itsyadavishal/sergoops_kernel_realme_sm6150 that referenced this issue Aug 9, 2022
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
ShevT pushed a commit to crdroidandroid/android_kernel_oneplus_sm8150 that referenced this issue Aug 25, 2022
[ Upstream commit f0a5e4d ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amackpro pushed a commit to amackpro/xiaomi_kernel_vayu that referenced this issue Sep 9, 2022
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnt1989 pushed a commit to johnt1989/android_kernel_samsung_sm8150 that referenced this issue Feb 13, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
bggRGjQaUbCoE pushed a commit to bggRGjQaUbCoE/android_kernel_samsung_sm8250-mohammad92 that referenced this issue Apr 5, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Itsyadavishal pushed a commit to Itsyadavishal/kernel_realme_sm6150 that referenced this issue Apr 5, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Rem01Gaming pushed a commit to Rem01Gaming/viviz_kernel_even that referenced this issue May 23, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Rem01Gaming pushed a commit to Rem01Gaming/kernel_oplus_even that referenced this issue Jun 3, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
AbzRaider pushed a commit to AbzRaider/kernel_xiaomi_pissarro that referenced this issue Jun 22, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
ahnet-69 pushed a commit to ahnet-69/android_kernel_samsung_a32 that referenced this issue Jul 15, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
nayem8854 pushed a commit to nayem8854/kernel_realme_RMX1931_Arno that referenced this issue Jul 18, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
HoangLong-Lumi pushed a commit to HoangLong-Lumi/android_kernel_samsung_mt6768 that referenced this issue Aug 5, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
HoangLong-Lumi pushed a commit to HoangLong-Lumi/android_kernel_samsung_mt6768 that referenced this issue Aug 5, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
HoangLong-Lumi pushed a commit to HoangLong-Lumi/android_kernel_samsung_mt6768 that referenced this issue Aug 5, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RjTangpos pushed a commit to RjTangpos/kernel_realme_X2-rui2 that referenced this issue Aug 26, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
ratatouille100 pushed a commit to ratatouille100/kernel_samsung_universal9611 that referenced this issue Dec 2, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Shas45558 pushed a commit to Shas45558/shas-dream-oc-mt6768 that referenced this issue Dec 27, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
herokuapp511 pushed a commit to herokuapp511/android_kernel_realme_sm8150 that referenced this issue Dec 31, 2023
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rrsetofamuris pushed a commit to rrsetofamuris/codespaces that referenced this issue Jan 11, 2024
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e37 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Bakoubak pushed a commit to Bakoubak/old-android_kernel_lenovo_amar that referenced this issue Jan 23, 2024
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
theshoqanebi pushed a commit to theshoqanebi/android_samsung_a12_kernel that referenced this issue Apr 1, 2024
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
theshoqanebi pushed a commit to theshoqanebi/android_samsung_m12_kernel that referenced this issue Apr 4, 2024
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
AndroidHQ254 pushed a commit to A325F/kernel_samsung_a32-old that referenced this issue Apr 7, 2024
[ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
kubernetes/kubernetes#70747

- Apache Bench can fill up ipvs service proxy in seconds #544
cloudnativelabs/kube-router#544

- Additional 1s latency in `host -> service IP -> pod`
kubernetes/kubernetes#90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <[email protected]>
Signed-off-by: YangYuxi <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rsuntk pushed a commit to rsuntk/android_kernel_samsung_a10s-r that referenced this issue Jun 3, 2024
yazzXx pushed a commit to yazzXx/android_kernel_selene_blueberry that referenced this issue Aug 4, 2024
cumaRull pushed a commit to cumaRull/kernel_realme_RMX3191 that referenced this issue Aug 10, 2024
noticesax pushed a commit to noticesax/android_kernel_xiaomi_mt6768 that referenced this issue Nov 7, 2024