
ipv6 port-forward has a consistent 36s delay on each request #1560

Closed
howardjohn opened this issue May 4, 2020 · 20 comments · Fixed by #1664
Labels: kind/external (upstream bugs), lifecycle/active (actively being worked on by a contributor), priority/important-longterm (important over the long term, but may not be staffed and/or may need multiple releases to complete)

Comments

@howardjohn
Contributor

When deploying an ipv6 kind cluster, all requests through kubectl port-forward are delayed by exactly 36s. This can be reproduced with a variety of backend pods.

containerd logs during request:

May 04 19:42:36 ipv6-control-plane containerd[185]: time="2020-05-04T19:42:36.748236106Z" level=info msg="Portforward for \"379868fecdbb9611e49f15c3306c2cd25f353d1f32d89f7d9f1350c3c3466769\" port []"
May 04 19:42:36 ipv6-control-plane containerd[185]: time="2020-05-04T19:42:36.748292691Z" level=info msg="Portforward for \"379868fecdbb9611e49f15c3306c2cd25f353d1f32d89f7d9f1350c3c3466769\" returns URL \"http://127.0.0.1:42031/portforward/6cArg5VH\""
May 04 19:42:48 ipv6-control-plane containerd[185]: time="2020-05-04T19:42:48.996645446Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:80\" in network namespace \"/var/run/netns/cni-6db781e9-1cc9-2d16-0b8a-5480d0dcd1d1\""
<36s delay>
May 04 19:43:25 ipv6-control-plane containerd[185]: time="2020-05-04T19:43:25.038951740Z" level=info msg="Finish port forwarding for \"379868fecdbb9611e49f15c3306c2cd25f353d1f32d89f7d9f1350c3c3466769\" port 80"

kubelet seems to have no relevant logs
api-server seems to have no relevant logs

Pod <-> Pod traffic and calling the /proxy endpoint are both fast; only port-forward is slow.
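For reference, a minimal way to reproduce and time the delay (the pod name and ports here are only an example, not taken from the original report):

kubectl port-forward pod/httpbin 8080:80 &
time curl -s -o /dev/null http://localhost:8080/   # on the affected ipv6 cluster each request sits for ~36s before completing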

$ kind version
kind v0.8.0 go1.14.2 linux/amd64
$ k version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
@BenTheElder
Member

@aojea maybe something with the portmap plugin? you somewhat recently PRed changes to this upstream right?

@BenTheElder
Member

... seems odd that the port forward logs are referencing ipv4 here

@BenTheElder
Member

... also bringing me back to this issue again containerd/cri#730

@aojea
Contributor

aojea commented May 4, 2020

/assign
will investigate tomorrow, seems easy to reproduce 🤔

@khenidak

khenidak commented May 5, 2020

@BenTheElder you are right, socat is invoked with a TCP4 address: https://github.com/containerd/cri/blob/master/pkg/server/sandbox_portforward_unix.go#L71.

The only reason this works at all is that socat can bridge IPv4 to IPv6.

Also, I think the delay is really a containerd thing; check:
https://github.com/containerd/cri/blob/master/pkg/server/sandbox_portforward_unix.go#L81
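To illustrate the address-family difference from a shell (a rough sketch only: the CRI plugin enters the netns in-process rather than via nsenter, and the netns path below is a hypothetical placeholder):

# force IPv4, which is what the current command does
nsenter --net=/var/run/netns/cni-XXXX socat - TCP4:localhost:80
# force IPv6 instead
nsenter --net=/var/run/netns/cni-XXXX socat - TCP6:localhost:80
# generic TCP address, letting socat use whichever family localhost resolves to
nsenter --net=/var/run/netns/cni-XXXX socat - TCP:localhost:80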

@aojea
Contributor

aojea commented May 5, 2020

I can't reproduce this, @howardjohn. My observation is that the time between the events

May 05 13:09:19 kind-worker containerd[122]: time="2020-05-05T13:09:19.852531604Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-c9e71e3b-ea1e-7a62-10e4-bacbe9c81650\""

and

May 05 13:09:29 kind-worker containerd[122]: time="2020-05-05T13:09:29.904271707Z" level=info msg="Finish port forwarding for \"76d034f48d0320155b90dbff907bc5261416b20a88914f02f2d360393b393a7f\" port 8001"

is the time that the port-forward is actually in use. You can reproduce this with iperf, for example, since you can control how long data flows through the network pipe.

  1. Run an iperf server in a pod:
kubectl run iperf --command --image=us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13 --port=8001 -- /bin/sh -c "/usr/local/bin/iperf -s -p 8001"
  2. Forward the port:
kubectl port-forward deployment/iperf 8888:8001
  3. Run traffic through the forwarded port, in this case limiting it to 5 seconds:
iperf -c localhost -p 8888 -t 5

and we can see the following events in the containerd log on the node that is running the iperf pod:

May 05 13:15:17 kind-worker containerd[122]: time="2020-05-05T13:15:17.604221527Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-c9e71e3b-ea1e-7a62-10e4-bacbe9c81650\""
May 05 13:15:22 kind-worker containerd[122]: time="2020-05-05T13:15:22.649776547Z" level=info msg="Finish port forwarding for \"76d034f48d0320155b90dbff907bc5261416b20a88914f02f2d360393b393a7f\" port 8001"

if we run it for 17 seconds:

iperf -c localhost -p 8888 -t 17

the gap between the events is 17 seconds:

May 05 13:19:18 kind-worker containerd[122]: time="2020-05-05T13:19:18.908618796Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-c9e71e3b-ea1e-7a62-10e4-bacbe9c81650\""
May 05 13:19:35 kind-worker containerd[122]: time="2020-05-05T13:19:35.955743777Z" level=info msg="Finish port forwarding for \"76d034f48d0320155b90dbff907bc5261416b20a88914f02f2d360393b393a7f\" port 8001"

so 36 seconds means something is keeping the pipe in use. Just curious, what are you using to check the port-forward functionality?

However, there are several things wrong that we should fix in containerd/cri, like socat and TCP4; thanks for pointing them out here.

@howardjohn
Contributor Author

I tried 3 different HTTP-based applications: 2 Istio debug endpoints and httpbin. All 3 of them are written in different languages, etc., but took the same 36s. I'll see if I can repro on another machine and with iperf.

@howardjohn
Contributor Author

I can reproduce on another machine, but that isn't saying much as both my machines are similar (gLinux).

With the iperf setup above:

$ iperf -c localhost -p 8888 -t 8
Handling connection for 8888
------------------------------------------------------------
Client connecting to localhost, TCP port 8888
TCP window size: 2.76 MByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 43664 connected with 127.0.0.1 port 8888
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 8.6 sec  44.0 MBytes  43.1 Mbits/sec

^ exits in 8 seconds

containerd logs show a 36s gap:

May 05 23:32:10 ipv6-control-plane containerd[185]: time="2020-05-05T23:32:10.111281653Z" level=info msg="Executing port forwarding command \"/usr/bin/socat - TCP4:localhost:8001\" in network namespace \"/var/run/netns/cni-6d4e45bf-b34a-1b5d-de61-ce0d9b69448e\""
May 05 23:32:46 ipv6-control-plane containerd[185]: time="2020-05-05T23:32:46.269239610Z" level=info msg="Finish port forwarding for \"af6f90a18ba77ec9b1b4cc09152fc0c4e6b9134a3bab31f63c18cfd815f9f220\" port 8001"
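For anyone reproducing this, the containerd events above can be tailed from the node container (node name taken from this cluster; adjust for yours):

docker exec ipv6-control-plane journalctl -u containerd -f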

@BenTheElder
Member

/lifecycle active

k8s-ci-robot added the lifecycle/active label on May 6, 2020
@BenTheElder
Member

/kind external

k8s-ci-robot added the kind/external label on May 7, 2020
@BenTheElder
Member

containerd/cri#1470 just merged, but it will be a bit before this is in containerd, I think. Once it's in containerd we can pull new binaries into kind and see how that goes.

@BenTheElder
Member

containerd/containerd@65df60b: this is now in containerd at HEAD; will see about pulling it in.

@BenTheElder
Member

triggered another nightly build, pulling it in via #1599

BenTheElder self-assigned this on May 15, 2020
@BenTheElder
Member

I think this should be fixed when using a node image built from HEAD.
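A rough way to verify once a rebuilt node image is in place (node container name comes from the cluster in this report; the expected behavior follows the analysis above):

docker exec ipv6-control-plane containerd --version
kubectl port-forward deployment/iperf 8888:8001 &
iperf -c localhost -p 8888 -t 5
# the gap between "Executing port forwarding" and "Finish port forwarding" in the
# containerd logs should now match the transfer time rather than ~36s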

@BenTheElder
Member

we had to roll back containerd due to #1634

@BenTheElder
Member

blocked on #1637, we're making progress on the issue.

BenTheElder added the priority/important-longterm label on May 31, 2020
@BenTheElder
Member

The first issue in #1637 was fixed, but then we discovered that the containerd upgrade was also the cause of flakiness in our CI. We think we've tracked that down to an upstream timeout change, reproduced in non-kind-based CI as well, but there's no ETA on getting it fixed.

Once that's done we can look at rolling forward again to pick up the fix for this issue. Until then we're stuck at <= 1.3.4 (we're rather certain the bug was introduced shortly after that, and the flake is due to errors deleting containers ...)

BenTheElder linked a pull request on Jun 10, 2020 that will close this issue
@BenTheElder
Member

This should be fixed now.

@Sarek-Wyona

Is this fix in the latest v0.11.1, or is it still pending release?

@BenTheElder
Member

BenTheElder commented Feb 8, 2022

#1664 => 276ff7f => v0.11.1 v0.11.0 v0.10.0 v0.9.0

Edit: there have been 4 releases since the fix was merged.
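One way to double-check which releases contain the fix commit, from a kind checkout:

git tag --contains 276ff7f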
