
pasta: timeout again (rawhide) #20170

Closed
edsantiago opened this issue Sep 27, 2023 · 36 comments
Labels: flakes (Flakes from Continuous Integration) · locked - please file new issue/PR (comments locked on old issues/PRs) · pasta (pasta(1) bugs or features)

Comments

@edsantiago
Member

Resurrecting #17598. Pasta test timing out again. Seen in rawhide with passt-0^20230908.g05627dc-1.fc40-x86_64:

[+1596s] ok 556 podman networking with pasta(1) - TCP/IPv4 small transfer, loopback
[+1726s] not ok 557 podman networking with pasta(1) - TCP/IPv4 large transfer, tap
         # (from function `die' in file test/system/helpers.bash, line 747,
         #  from function `run_podman' in file test/system/helpers.bash, line 384,
         #  from function `pasta_test_do' in file test/system/505-networking-pasta.bats, line 248,
         #  in test file test/system/505-networking-pasta.bats, line 700)
         #   `pasta_test_do' failed
         #
         # [20:56:52.117914919] $ /var/tmp/go/src/github.com/containers/podman/bin/podman rm -t 0 --all --force --ignore
         # [20:56:52.226806205] 42b6b5a167b84848f4b09d240ee31f69736ad905a0761bf4b53a4539b2879ceb
         #
         # [20:56:52.238402603] $ /var/tmp/go/src/github.com/containers/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}}
         #
         # [20:56:52.314645311] $ /var/tmp/go/src/github.com/containers/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
         # [20:56:52.386168270] quay.io/libpod/testimage:20221018 f5a99120db64
         # 1+0 records in
         # 1+0 records out
         # 10485760 bytes (10 MB, 10 MiB) copied, 0.0408792 s, 257 MB/s
         #
         # [20:56:52.693255830] $ /var/tmp/go/src/github.com/containers/podman/bin/podman run --net=pasta -p [10.128.0.120]:5294:5294/tcp quay.io/libpod/testimage:20221018 sh -c for port in $(seq 5294 5294); do                              socat -u TCP4-LISTEN:${port},bind=[10.128.0.120] EXEC:md5sum &                          done; wait
         # [20:59:02.709781235] timeout: sending signal TERM to command '/var/tmp/go/src/github.com/containers/podman/bin/podman'
         # timeout: sending signal KILL to command '/var/tmp/go/src/github.com/containers/podman/bin/podman'
         # [20:59:02.716663367] [ rc=137 (** EXPECTED 0 **) ]

This is f40 rawhide, not f39, and it comes from an in-progress PR. Podman is not yet testing f40 rawhide, so we are unlikely to see this flake often... until that PR merges.

@edsantiago edsantiago added flakes Flakes from Continuous Integration pasta pasta(1) bugs or features labels Sep 27, 2023
@edsantiago
Member Author

Another one from a test PR. Four runs total, two failures.

@sbrivio-rh
Collaborator

cc @dgibson

@edsantiago
Member Author

And again, same story (rawhide).

edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 28, 2023
...from f38 + f37.

Requires one minor e2e test change, to handle an error logging
change in conmon 2.1.8.

Also, importantly, this requires crun-1.9.1 because of a kernel
symlink change; see containers/crun#1309.
The VM images here were carefully built to include that. By the
time the next VM images get built, it should be the default.

Since we've bumped crun, remove two obsolete skips.

And, skip a flaky pasta test, containers#20170.

Signed-off-by: Ed Santiago <[email protected]>
@dgibson
Collaborator

dgibson commented Oct 11, 2023

Hrm, not much to go on here. It's possible this is a new version of #17598, but it's equally possible it's some entirely unrelated timeout. Since we've had what I believe should be a pretty robust fix for the #17598 aliasing problem, I'm inclined to suspect it's something else.

@edsantiago do you have a recipe for creating a VM with the f40 snapshot where the problem is appearing?

@edsantiago
Member Author

Hangs on my very first try just now, rawhide, passt-0^20231004.gf851084-1.fc40.x86_64

$ 1minutetip -n 1MT-Fedora-Rawhide
....
ssh into it...
git clone podman, install dependencies
...
# loginctl enable-linger fedora
# su - fedora
$ cd /path/to/podman
$ vim test/system/505*   <----- search for 20170, remove the skip
$ hack/bats --rootless  505:"TCP/IPv4 large transfer, tap"
chcon: failed to change context of '/root/go/podman/bin/podman' to ‘unconfined_u:object_r:container_runtime_exec_t:s0’: Operation not permitted
--------------------------------------------------
$ bats --filter TCP/IPv4 large transfer, tap test/system/505-networking-pasta.bats
505-networking-pasta.bats
   podman networking with pasta(1) - TCP/IPv4 large transfer, tap     

After ^C:

   [13:25:17.815281711] $ /root/go/podman/bin/podman run --net=pasta -p [10.0.184.105]:5788:5788/tcp quay.io/libpod/testimage:20221018 sh -c for port in $(seq 5788 5788); do                              socat -u TCP4-LISTEN:${port},bind=[10.0.184.105] EXEC:md5sum &                          done; wait
   [13:25:17.959441496] Error: pasta failed with exit code 1:
   No external routable interface for IPv6
   Failed to bind any port for '-t 10.0.184.105/5788-5788:5788-5788', exiting
   [13:25:17.962768061] [ rc=126 (** EXPECTED 0 **) ]

@edsantiago
Member Author

One-line reproducer:

$ bin/podman run --net=pasta -p '[127.0.0.1]':5799:5799/tcp quay.io/libpod/testimage:20221018 date
Error: pasta failed with exit code 1:
No external routable interface for IPv6
Failed to bind any port for '-t 127.0.0.1/5799-5799:5799-5799', exiting

Fails the same way in f38 with passt-selinux-0^20230908.g05627dc-1.fc38.noarch.

In f38, after dnf downgrade passt to 0^20230329.gb10b983-1.fc38, it fails differently:

Couldn't open network namespace /run/user/1000/netns/netns-aeb16e93-56d4-0a60-ad67-c4b17976d553

@dgibson
Collaborator

dgibson commented Oct 12, 2023

This message looks like a good clue:

Failed to bind any port for '-t 10.0.184.105/5788-5788:5788-5788', exiting

Usually this means that pasta was unable to bind the specified ports on the host side, which suggests the port is already in use. Are there any stale pasta processes on the host where podman is running? Or is there something else using port 5788?

@edsantiago
Member Author

Same message no matter what port number I specify.

@dgibson
Collaborator

dgibson commented Oct 12, 2023

Couldn't open network namespace /run/user/1000/netns/netns-aeb16e93-56d4-0a60-ad67-c4b17976d553

That looks like one of the SELinux problems we already fixed.

As for the Failed to bind any port error, I haven't managed to reproduce so far on my own machine (f38) with passt-0^20231004.gf851084-1.fc38.x86_64.

Does the problem occur if you disable selinux enforcement?

@edsantiago
Member Author

OK, yeah, it's SELinux; container-selinux was not installed on my VM. This is obviously not the same error as was being seen in CI, because container-selinux is installed there. I will continue investigating next week, but for now, everyone else stand down. Thank you for the suggestion.

@edsantiago
Member Author

edsantiago commented Oct 16, 2023

Yep, still broken with passt-0^20231004.gf851084-1.fc40.x86_64. Look at the timestamps: the first run passes in O(3s), the next few in O(20s), and then it hangs.

$ while :;do echo;date;hack/bats --rootless 505:"TCP/IPv4 large transfer, tap"||break;done

Mon Oct 16 10:19:20 AM EDT 2023
--------------------------------------------------
$ bats --filter TCP/IPv4 large transfer, tap test/system/505-networking-pasta.bats
505-networking-pasta.bats
 ✓ podman networking with pasta(1) - TCP/IPv4 large transfer, tap

1 test, 0 failures


Mon Oct 16 10:19:23 AM EDT 2023
--------------------------------------------------
$ bats --filter TCP/IPv4 large transfer, tap test/system/505-networking-pasta.bats
505-networking-pasta.bats
 ✓ podman networking with pasta(1) - TCP/IPv4 large transfer, tap

1 test, 0 failures


Mon Oct 16 10:19:49 AM EDT 2023
--------------------------------------------------
$ bats --filter TCP/IPv4 large transfer, tap test/system/505-networking-pasta.bats
505-networking-pasta.bats
 ✓ podman networking with pasta(1) - TCP/IPv4 large transfer, tap

1 test, 0 failures


Mon Oct 16 10:20:16 AM EDT 2023
--------------------------------------------------
$ bats --filter TCP/IPv4 large transfer, tap test/system/505-networking-pasta.bats
505-networking-pasta.bats
 ✓ podman networking with pasta(1) - TCP/IPv4 large transfer, tap

1 test, 0 failures


Mon Oct 16 10:20:42 AM EDT 2023
--------------------------------------------------
$ bats --filter TCP/IPv4 large transfer, tap test/system/505-networking-pasta.bats
505-networking-pasta.bats
   podman networking with pasta(1) - TCP/IPv4 large transfer, tap        <<<<<<---------- HANG

After 5 minutes I ^Ced and got:

  Received SIGINT, aborting ...

Twiddle thumbs five more minutes, ^C again, nothing useful. Test terminates but without a backtrace.

@edsantiago
Member Author

Here's ps on a hung job:

# ps auxww --forest
...
fedora     34534  0.0  1.2  76220 24632 ?        Ss   11:39   0:00 /usr/bin/pasta --config-net -t 10.0.184.181/5304-5304:5304-5304 -u none -T none -U none --no-map-gw --netns /run/user/1000/netns/netns-320bc153-dbe7-1873-fde9-23aa349d29da
fedora     34536  0.0  0.1   9776  2332 ?        Ss   11:39   0:00 /usr/bin/conmon [....]
fedora     34538  0.0  0.0   1616   896 ?        Ss   11:39   0:00  \_ sh -c for port in $(seq 5304 5304); do                              socat -u TCP4-LISTEN:${port},bind=[10.0.184.181] EXEC:md5sum &                          done; wait
fedora     34542  0.0  0.1   4916  2432 ?        S    11:39   0:00      \_ socat -u TCP4-LISTEN:5304,bind=[10.0.184.181] EXEC:md5sum
fedora     34544  0.0  0.0   1616   640 ?        S    11:39   0:00          \_ md5sum

FWIW it reproduces pretty easily, within 15 minutes at most.

@dgibson
Collaborator

dgibson commented Oct 17, 2023

Unfortunately those additional reproductions don't offer any additional clues.

@edsantiago, again, do you have an easy way to set up an fc40 VM? It looks like I'm going to have to reproduce this myself to investigate.

@edsantiago
Member Author

Yeah, it's just a 1minutetip VM with podman from git + dependencies. It takes seconds. Do you have 1minutetip? If not, PM me on a secure channel and I'll set one up for you. 1mt VMs have a limited lifetime, 6 hours or so, so it's much better if you set up your own.

@dgibson
Collaborator

dgibson commented Oct 17, 2023

With @edsantiago 's assistance, I've now reproduced the problem both on a VM he set up, and one I set up. I've simplified the reproducing case down a bit, but still don't have a lot in the way of clues about what's going on. Because the client/sending socat appears to be finished in the stuck case, I'm guessing it's (another) race in FIN handling.

I expect to continue digging into this tomorrow and make some more substantial progress.

@dgibson
Collaborator

dgibson commented Oct 18, 2023

Ok, through yesterday I've discovered some interesting new things.

  • The package version in use already includes fixes for upstream bug 74, so it's not identical to that bug.
  • Just to verify, I also checked the problem still occurs with upstream git passt, rather than the package
  • However, the problem can no longer be reproduced if I increase the net.core.rmem_max sysctl to 4MiB
  • That strongly suggests this is closely related to bug 74
  • When I reduce the rmem_max value, I'm now able to reproduce it on my laptop, so it doesn't look like it's specific to something in Fedora 40 after all.

Continuing to investigate.

@dgibson
Collaborator

dgibson commented Oct 18, 2023

Investigated further today. The mechanics here seem to be very similar to upstream bug 74.

  • Packet traces on the guest/server/receiver side simply show the transfer stopping for no clear reason.
  • Packet traces on the host/client/sender side show the sender getting a window full event, then no further transfers
  • We get occasional TCP keepalive packets from the sender, which the receiver (pasta) responds to with explicit zero window messages
  • The guest side packet traces show a non-zero window advertised by the guest-side server
  • Instrumentation I added in pasta shows an estimate of the amount of data in the receive buffer as zero - so we'd expect the host side window to be quite large

Our analysis indicated that bug 74 was caused by a kernel bug: it appears that our heavy use of MSG_PEEK and MSG_TRUNC causes the kernel to miscalculate the window size. We thought we'd worked around that bug by explicitly setting the TCP_WINDOW_CLAMP socket option, which seemed to force the kernel to recalculate the window correctly.

My working theory is that that workaround isn't as robust as we'd hoped, and even with the TCP_WINDOW_CLAMP calls, once the kernel hits a zero window size momentarily it's incorrectly not re-opening the window when we consume data with MSG_TRUNC.

As an aside, I tried replacing our MSG_TRUNC call with an explicit recv() into a discard buffer, but that didn't change the behaviour. That suggests the kernel miscalculation is in the handling of MSG_PEEK rather than MSG_TRUNC.

@dgibson
Collaborator

dgibson commented Oct 24, 2023

Forgot to mention, I now have a reproducer with pasta alone, not using podman. I'm using this script (which will need some tweaking for anyone using it on a different system):

#! /bin/bash

PASTA=~/src/passt/pasta
PORT=38237
FILE=~/passt/podman20170/xfer.bin

set -e

IFNAME="$(ip -j -4 route show | jq -rM '[.[] | select(.dst == "default").dev] | .[0]')"

ADDR="$(ip -j -4 addr show "${IFNAME}" | jq -rM '.[0] | .addr_info[0].local')"

md5sum $FILE

test_once() {
    trap "exit 1" SIGINT

    $PASTA --config-net -t $PORT --debug --log-file=pasta.log --pcap=guestside.pcap -- socat -u TCP4-LISTEN:$PORT EXEC:md5sum &
    sleep 1
    socat -u OPEN:$FILE TCP4:$ADDR:$PORT
    wait
}

date
while test_once; do
    date
done

@dgibson
Collaborator

dgibson commented Oct 24, 2023

There are actually two symptoms we see, both with the original podman reproducer and the pasta-alone reproducer.

  1. The transfer stalls completely and never finishes
  2. The transfer completes after roughly 30s, instead of the normal ~2s for a working run

I've been focusing on case (1), but (2) is seen as well.

As an experiment I tried entirely disabling use of TCP_WINDOW_CLAMP (on the grounds that while it made for a partial workaround, maybe it's also part of the triggering conditions). When I do that, I'm no longer able to reproduce symptom (1), but symptom (2) becomes much more common (more common than successful runs).

@dgibson
Collaborator

dgibson commented Oct 25, 2023

Tried a different experiment of not using TCP_WINDOW_CLAMP, but keeping the same window sizes we already set: if we're in STALLED state, we always adjust it upwards. This didn't appear to have any effect on the symptoms.

@dgibson
Collaborator

dgibson commented Oct 25, 2023

I tried double-checking that we're not somehow missing epoll events: I added instrumentation on a timer that did a recv() with MSG_PEEK on the sockets. After we're definitely in the stalled state, this still returns EAGAIN.

@dgibson
Collaborator

dgibson commented Nov 10, 2023

Hi @edsantiago ,

Sorry there have been no updates for a while. I've been looking into this with assistance from @sbrivio-rh and Paolo Abeni, and we think we finally figured it out.

It seems the cause of the stall is a kernel bug, where calling TCP_WINDOW_CLAMP at the wrong moment can permanently lock the TCP window to 0 bytes. However, it also appears that TCP_WINDOW_CLAMP probably didn't really do what we wanted it to, even before that bug was introduced. At least for the time being, it looks like the solution is not to use TCP_WINDOW_CLAMP at all in passt.

We have a draft fix here. Could you try it out and see if it works for you?

@edsantiago
Member Author

Nice work! I'd be happy to give it a spin, but is there any chance you could give me a scratch rpm? It's a lot easier for me to wget & rpm -Uvh than it is to git clone, figure out & install dependencies, figure out build process, figure out what to install where.

(Also, I don't work Fridays. I will be home part of tomorrow, though, and will be glad to instrument a test PR during a free moment).

@sbrivio-rh
Collaborator

Nice work! I'd be happy to give it a spin, but is there any chance you could give me a scratch rpm?

There you go:

https://passt.top/builds/copr/0%5E20231110.gcf3eeba/fedora-rawhide-x86_64/06618377-passt/passt-0%5E20231110.gcf3eeba-1.fc40.x86_64.rpm

By the way:

It's a lot easier for me to wget & rpm -Uvh than it is to git clone, figure out & install dependencies

no dependencies other than a standard C library.

figure out build process

make

@edsantiago
Member Author

Thanks. I ended up running my above reproducer in a 1mt, it ran fine for the lifetime of the VM. And each time I checked, it was O(5s), not 30. Your code LGTM, thank you!

@sbrivio-rh
Collaborator

Thanks. I ended up running my above reproducer in a 1mt, it ran fine for the lifetime of the VM. And each time I checked, it was O(5s), not 30. Your code LGTM, thank you!

This is now included in passt-0^20231110.g5ec3634-1.fc40 for Fedora Rawhide. Debian update is still pending.

@edsantiago
Member Author

Unfortunately, today's VM build picked up 2023-11-07, so the test remains skipped.

AkihiroSuda pushed a commit to AkihiroSuda/passt-mirror that referenced this issue Nov 21, 2023
On the L2 tap side, we see TCP headers and know the TCP window that the
ultimate receiver is advertising.  In order to avoid unnecessary buffering
within passt/pasta (or by the kernel on passt/pasta's behalf) we attempt
to advertise that window back to the original sock-side sender using
TCP_WINDOW_CLAMP.

However, TCP_WINDOW_CLAMP just doesn't work like this.  Prior to kernel
commit 3aa7857fe1d7 ("tcp: enable mid stream window clamp"), it simply
had no effect on established sockets.  After that commit, it does affect
established sockets but doesn't behave the way we need:
  * It appears to be designed only to shrink the window, not to allow it to
    re-expand.
  * More importantly, that commit has a serious bug where if the
    setsockopt() is made when the existing kernel advertised window for the
    socket happens to be zero, it will now become locked at zero, stopping
    any further data from being received on the socket.

Since this has never worked as intended, simply remove it.  It might be
possible to re-implement the intended behaviour by manipulating SO_RCVBUF,
so we leave a comment to that effect.

This kernel bug is the underlying cause of both the linked passt bug and
the linked podman bug.  We attempted to fix this before with passt commit
d3192f6 ("tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag").
However while that commit masked the bug for some cases, it didn't really
address the problem.

Fixes: d3192f6 ("tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag")
Link: containers/podman#20170
Link: https://bugs.passt.top/show_bug.cgi?id=74
Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Dec 19, 2023

@edsantiago any update on this one?

@edsantiago
Member Author

This is believed to be fixed in pasta-2023-11-10 but the CI VMs are still on 11-07 (even though they have been respun twice in the past few weeks). All we're doing is waiting for the new pasta rpm to propagate.

@rhatdan
Member

rhatdan commented Dec 19, 2023

Not sure I understand, but this looks like it is 2023-12-08:
IMAGE_SUFFIX: "c20231208t193858z-f39f38d13"

Or are you talking about something else?

@edsantiago
Member Author

Something else. That string is the date/time the VMs were built. The string I mentioned is the pasta version (it's confusing because pasta versions are YMD, not semantic versioning).

This table might help. This is my new little thing that displays package versions on all VM builds. Find the passt column. You can see that some OSes have newer pasta, but some still have old (1107). If run rootless, this pasta test would fail on that OS.

(Sorry about the dash in fedora. Long story short, the way my script finds versions is by looking for dnf/apt updates, and this OS is apparently keeping the default version, and I have no way to find out what that is). HTH.

@sbrivio-rh
Collaborator

This table might help. This is my new little thing that displays package versions on all VM builds. Find the passt column. You can see that some OSes have newer pasta, but some still have old (1107). If run rootless, this pasta test would fail on that OS.

Ah, yes, for Fedora 39 I waited a couple of weeks before submitting this update -- there's always the risk of obsoleting previous updates which are still in testing. This has been in stable for just 11 hours now.

I suppose it should be sorted on Debian too in a bit -- 0.0~git20231204.b86afe3-1 was accepted into unstable 10 days ago.

@edsantiago
Member Author

Thanks for the update!

Who wants me to spin up new VMs and cycle them into CI juuuuuust before a long holiday break? Anyone? Anyone?

@sbrivio-rh
Collaborator

Who wants me to spin up new VMs and cycle them into CI juuuuuust before a long holiday break? Anyone? Anyone?

I can remind you on Friday evening. Today... I don't know, it wouldn't feel professional enough.

@edsantiago
Member Author

PR #21129 merged today, with pasta 2023-12-04 in all VMs and the skip removed from that test. I've run a couple of extra CI laps, and all seems fine, so I'm going to close this and hope for the best. Thank you @dgibson and @sbrivio-rh!

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Apr 3, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 3, 2024
4 participants