
ipv6 port-forward has a consistent 36s delay on each request #1560

Closed
howardjohn opened this issue May 4, 2020 · 20 comments · Fixed by #1664
Labels: kind/external (upstream bugs), lifecycle/active (actively being worked on by a contributor), priority/important-longterm (important over the long term, but may not be staffed and/or may need multiple releases to complete)

Comments

@howardjohn
Contributor

When deploying an ipv6 kind cluster, all requests through kubectl port-forward are delayed by exactly 36s. This can be reproduced with a variety of backend pods.

containerd logs during request:

May 04 19:42:36 ipv6-control-plane containerd[185]: time="2020-05-04T19:42:36.748236106Z" level=info msg="Portforward for \"379868fecdbb9611e49f15c3306c2cd25f353d1f32d89f7d9f1350c3c3466769\" port []"
May 04 19:42:36 ipv6-control-plane containerd[185]: time="2020-05-04T19:42:36.748292691Z" level=info msg="Portforward for \"379868fecdbb9611e49f15c3306c2cd25f353d1f32d89f7d9f1350c3c3466769\" returns URL \"http://127.0.0.1:42031/portforward/6cArg5VH\""
May 04 19:42:48 ipv6-control-plane containerd[185]: time="2020-05-04T19:42:48.996645446Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:80\" in network namespace \"/var/run/netns/cni-6db781e9-1cc9-2d16-0b8a-5480d0dcd1d1\""
<36s delay>
May 04 19:43:25 ipv6-control-plane containerd[185]: time="2020-05-04T19:43:25.038951740Z" level=info msg="Finish port forwarding for \"379868fecdbb9611e49f15c3306c2cd25f353d1f32d89f7d9f1350c3c3466769\" port 80"

kubelet seems to have no relevant logs
api-server seems to have no relevant logs

Pod <-> Pod traffic and calling the /proxy endpoint are both fast; only port-forward is slow.
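For reference, a minimal way to reproduce and time the delay (the pod name and ports here are only an example, not taken from the original report):

kubectl port-forward pod/httpbin 8080:80 &
time curl -s -o /dev/null http://localhost:8080/   # on the affected ipv6 cluster each request sits for ~36s before completing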

$ kind version
kind v0.8.0 go1.14.2 linux/amd64
$ k version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
@BenTheElder
Member

@aojea maybe something with the portmap plugin? you somewhat recently PRed changes to this upstream right?

@BenTheElder
Member

... seems odd that the port forward logs are referencing ipv4 here

@BenTheElder
Member

... also bringing me back to this issue again containerd/cri#730

@aojea
Contributor

aojea commented May 4, 2020

/assign
will investigate tomorrow, seems easy to reproduce 🤔

@khenidak

khenidak commented May 5, 2020

@BenTheElder you are right, socat is invoked with a TCP4 address: https://github.com/containerd/cri/blob/master/pkg/server/sandbox_portforward_unix.go#L71.

The only reason this works at all is that socat can bridge IPv4 to IPv6.

Also, I think the delay is really a containerd thing; check:
https://github.com/containerd/cri/blob/master/pkg/server/sandbox_portforward_unix.go#L81
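To illustrate the address-family difference from a shell (a rough sketch only: the CRI plugin enters the netns in-process rather than via nsenter, and the netns path below is a hypothetical placeholder):

# force IPv4, which is what the current command does
nsenter --net=/var/run/netns/cni-XXXX socat - TCP4:localhost:80
# force IPv6 instead
nsenter --net=/var/run/netns/cni-XXXX socat - TCP6:localhost:80
# generic TCP address, letting socat use whichever family localhost resolves to
nsenter --net=/var/run/netns/cni-XXXX socat - TCP:localhost:80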

@aojea
Contributor

aojea commented May 5, 2020

I can't reproduce this, @howardjohn. My observation is that the time between the events

May 05 13:09:19 kind-worker containerd[122]: time="2020-05-05T13:09:19.852531604Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-c9e71e3b-ea1e-7a62-10e4-bacbe9c81650\""

and

May 05 13:09:29 kind-worker containerd[122]: time="2020-05-05T13:09:29.904271707Z" level=info msg="Finish port forwarding for \"76d034f48d0320155b90dbff907bc5261416b20a88914f02f2d360393b393a7f\" port 8001"

is the time that the port-forward is actually in use. You can reproduce this with iperf, for example, since you can control how long data flows through the network pipe.

  1. Run an iperf server in a pod:
kubectl run iperf --command --image=us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13 --port=8001 -- /bin/sh -c "/usr/local/bin/iperf -s -p 8001"
  2. Forward the port:
kubectl port-forward deployment/iperf 8888:8001
  3. Run traffic through the forwarded port, in this case limiting it to 5 seconds:
iperf -c localhost -p 8888 -t 5

and we can see the following events in the containerd log on the node that is running the iperf pod:

May 05 13:15:17 kind-worker containerd[122]: time="2020-05-05T13:15:17.604221527Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-c9e71e3b-ea1e-7a62-10e4-bacbe9c81650\""
May 05 13:15:22 kind-worker containerd[122]: time="2020-05-05T13:15:22.649776547Z" level=info msg="Finish port forwarding for \"76d034f48d0320155b90dbff907bc5261416b20a88914f02f2d360393b393a7f\" port 8001"

if we run it for 17 seconds:

iperf -c localhost -p 8888 -t 17

the gap between the events is 17 seconds:

May 05 13:19:18 kind-worker containerd[122]: time="2020-05-05T13:19:18.908618796Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-c9e71e3b-ea1e-7a62-10e4-bacbe9c81650\""
May 05 13:19:35 kind-worker containerd[122]: time="2020-05-05T13:19:35.955743777Z" level=info msg="Finish port forwarding for \"76d034f48d0320155b90dbff907bc5261416b20a88914f02f2d360393b393a7f\" port 8001"

so 36 seconds means something is keeping the pipe in use. Just curious, what are you using to check the port-forward functionality?

However, there are several things wrong that we should fix in containerd/cri, like socat and TCP4; thanks for pointing them out here.

@howardjohn
Contributor Author

I tried 3 different HTTP-based applications: 2 Istio debug endpoints and httpbin. All 3 of them are written in different languages, etc., but took the same 36s. I'll see if I can repro on another machine and with iperf.

@howardjohn
Contributor Author

I can reproduce on another machine, but that isn't saying much as both my machines are similar (gLinux).

With the iperf setup above:

$ iperf -c localhost -p 8888 -t 8
Handling connection for 8888
------------------------------------------------------------
Client connecting to localhost, TCP port 8888
TCP window size: 2.76 MByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 43664 connected with 127.0.0.1 port 8888
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 8.6 sec  44.0 MBytes  43.1 Mbits/sec

^ exits in 8 seconds

containerd logs show a 36s gap:

May 05 23:32:10 ipv6-control-plane containerd[185]: time="2020-05-05T23:32:10.111281653Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-6d4e45bf-b34a-1b5d-de61-ce0d9b69448e\""
May 05 23:32:46 ipv6-control-plane containerd[185]: time="2020-05-05T23:32:46.269239610Z" level=info msg="Finish port forwarding for \"af6f90a18ba77ec9b1b4cc09152fc0c4e6b9134a3bab31f63c18cfd815f9f220\" port 8001"
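For anyone reproducing this, the containerd events above can be tailed from the node container (node name taken from this cluster; adjust for yours):

docker exec ipv6-control-plane journalctl -u containerd -f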

@BenTheElder
Member

/lifecycle active

k8s-ci-robot added the lifecycle/active label on May 6, 2020
@BenTheElder
Member

/kind external

k8s-ci-robot added the kind/external label on May 7, 2020
@BenTheElder
Member

containerd/cri#1470 just merged, but it will be a bit before this is in containerd, I think. Once it's in containerd we can pull new binaries into kind and see how that goes.

@BenTheElder
Member

containerd/containerd@65df60b: this is now in containerd at HEAD; will see about pulling it in.

@BenTheElder
Member

triggered another nightly build, pulling it in via #1599

BenTheElder self-assigned this on May 15, 2020
@BenTheElder
Member

I think this should be fixed when using a node image built from HEAD.
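A rough way to verify once a rebuilt node image is in place (node container name comes from the cluster in this report; the expected behavior follows the analysis above):

docker exec ipv6-control-plane containerd --version
kubectl port-forward deployment/iperf 8888:8001 &
iperf -c localhost -p 8888 -t 5
# the gap between "Executing port forwarding" and "Finish port forwarding" in the
# containerd logs should now match the transfer time rather than ~36s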

@BenTheElder
Member

we had to roll back containerd due to #1634

@BenTheElder
Member

blocked on #1637, we're making progress on the issue.

BenTheElder added the priority/important-longterm label on May 31, 2020
@BenTheElder
Member

The first issue in #1637 was fixed, but then we discovered that the containerd upgrade was also the cause of flakiness in our CI. We think we've tracked that down to an upstream timeout change, reproduced in non-kind-based CI as well, but there's no ETA on getting it fixed.

Once that's done we can look at rolling forward again to pick up the fix for this issue. Until then we're stuck at <= 1.3.4 (we're rather certain the bug was introduced shortly after that, and the flake is due to errors deleting containers ...)

BenTheElder linked a pull request on Jun 10, 2020 that will close this issue
@BenTheElder
Member

This should be fixed now.

@Sarek-Wyona

Is this fix in the latest v0.11.1, or is it still pending release?

@BenTheElder
Member

BenTheElder commented Feb 8, 2022

#1664 => 276ff7f => v0.11.1 v0.11.0 v0.10.0 v0.9.0

Edit: there have been 4 releases since the fix was merged.
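One way to double-check which releases contain the fix commit, from a kind checkout:

git tag --contains 276ff7f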
