new pasta flake: timeout #17598
And yet another one, the repeated-timeout scenario. |
@sbrivio-rh PTAL |
@edsantiago is there a way to check what version of the I released a new version (packages available in testing for Fedora 37, "stable" for Fedora 38, currently unstable for Debian and Ubuntu) fixing occasional TCP stalls on transfers with small receive buffers, which might be the case here. |
Yes! At the top of each test log is a dump of important versions. Here I see |
Well, this is unpleasant. The many-failures one, with |
Would it be better to skip all the TCP forwarding tests (via tap device only) for the time being, while I'm investigating this? I can't reproduce this at the moment, and we have very similar tests in passt's CI which are consistently passing, so it might take a bit -- it's nothing obvious (to me). |
Issue finally reproduced, on Fedora 37 only, but reliably -- I guess it depends on some specific timing, and probably on socat sending a single byte right after the SYN, ACK segment. The container sends a FIN, ACK segment, but pasta fails to send an ACK segment back and the connection doesn't close. This appears to be caused by https://passt.top/passt/commit/?id=cc6d8286d1043d04eb8518e39cebcb9e086dca17. I'll debug this further and try to release fixed packages later on Wednesday. Let me know if I should meanwhile send a pull request to skip those tests for the time being. Thanks for the report, and sorry for the mess. |
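For readers who want to see the traffic pattern described above in isolation, here is a rough sketch. It is not the podman test and not passt code: `probe_close` is a hypothetical helper, and it runs purely over loopback, so it only illustrates the sequence (one byte sent right after the handshake, then a FIN, then waiting for the other side to finish closing) and how a close that never completes would show up as a timeout.

```python
import socket
import threading


def probe_close(host="127.0.0.1", timeout=5.0):
    """Hypothetical probe: send one byte right after connecting, half-close,
    and report whether both directions of the connection closed in time."""
    result = {"data": b"", "closed": False, "peer_closed": False}

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.settimeout(timeout)
            try:
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:  # empty read: the sender's FIN arrived
                        result["closed"] = True
                        break
                    result["data"] += chunk
            except socket.timeout:
                pass  # FIN never seen: this is what a stuck close looks like

    worker = threading.Thread(target=serve)
    worker.start()

    with socket.create_connection((host, port), timeout=timeout) as client:
        client.sendall(b"x")             # single byte, immediately after connect
        client.shutdown(socket.SHUT_WR)  # half-close: sends our FIN
        try:
            # An empty read means the peer closed its side as well.
            result["peer_closed"] = client.recv(1) == b""
        except socket.timeout:
            pass

    worker.join()
    server.close()
    return result


if __name__ == "__main__":
    print(probe_close())
```

On plain loopback both flags should come back true. To say anything about pasta itself, the two halves would have to run on opposite sides of a forwarded port, which this sketch does not attempt.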
Oh, wow, you've had a long evening (or long few hours in your time zone). Thank you for looking into this. I don't speak for the team, but my personal preference right now would be for #17305 to merge. That's a complex PR that has taken many weeks of arduous effort, and keeps running into setbacks. If you can precisely identify the subset of tests that should be skipped, and could post a list or diffs or patch on that PR, I think that'd be the best use of everyone's time. That PR merging will then unfreeze other work. Thanks again for looking into this so promptly. |
An updated Fedora package fixing this is pending testing phase here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-dc03f3fc08 and a If you're curious: https://passt.top/passt/commit/?id=4ddbcb9c0c555838b123c018a9ebc9b7e14a87e5 |
For some reason, the updated package, Let me know if I should just go ahead with a patch temporarily disabling those tests at this point. |
@sbrivio-rh bodhi has been very slow the last few times I've checked, in the last two weeks. O(days). ITM #17305 has merged (YAY!) but with all pasta tests disabled. ALL pasta tests, not just a subset. If you wish to submit a PR to reenable tests, skipping only the ones that break in Once Thank you! |
Right, yes, thanks for noticing -- it was a perfect storm, also with libvirt and KubeVirt integration issues coming up at the same time.
Absolutely reasonable, thanks @cevich for the patch.
I'd skip this.
@cevich,
For Debian Sid, version |
The F37 CI VM images are configured to use
Yep, np. |
Ref: containers/podman#17598 Signed-off-by: Chris Evich <[email protected]>
@sbrivio-rh with the |
This is now fixed in #17650. |
Seen again yesterday, f37 remote on a PR in I'm reluctant to reopen based solely on one instance in one month... but will leave this here for now. |
Seen twice today, on different PRs:
In one case, only one subtest failed. In the other, subsequent tests also failed, and the entire test run timed out.