Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Replace setup/teardown #1592

Closed
wants to merge 9 commits into from

Conversation

martinpitt
Copy link
Member

Fixes #1591

@martinpitt
Copy link
Member Author

So the first round looks vaguely promising. I did a bashism, and I don't want to talk about this mess (probably/hopefully containers/podman#21504), and I forgot to update tmt. Next round!

@martinpitt martinpitt force-pushed the cleanup branch 6 times, most recently from b571a27 to 68cade1 Compare February 29, 2024 12:06
@martinpitt
Copy link
Member Author

martinpitt commented Feb 29, 2024

these two still need fixing. I have no idea what's going on, e.g. a local TEST_JOBS=3 TEST_OS=fedora-coreos test/common/run-tests repeatedly works fine, except for getting an occasional OOM as 3 parallel jobs is too much (but cockpit-9 didn't run into that). Same for f39. Running once more to compare logs.

update: The retried fcos failed the same way, but retried f39 worked fine. So this isn't reliable. But current main works fine, just tested this in #1593.

@martinpitt martinpitt force-pushed the cleanup branch 3 times, most recently from 328fdb7 to 5a15fd8 Compare February 29, 2024 17:34
Our previous approach of `restore_dir("/var/lib/containers")` and the
find/unmount/kill hacks around it keep causing trouble, see cockpit-project#1591.

Give up on this, and move to a model that centers around `podman system reset`.
This works reasaonably well except for that being slow (podman#21874) and
leaking conmon (see next commit).

Load our static test images with `podman save/load` instead. Also factorize
system and user cleanup, so that we do the same thing on both.

Fixes cockpit-project#1591
TestApplication.testRunImageSystem leaks conmon processes when opening
the Console of busybox-without-publish. This isn't obvious to reproduce
on the CLI (a simple `podman exec -it` doesn't do that), the opening of
our cockpit channel via the podman API does something special here.

We've had this hack for a while already. But move this into a separate
commit so that we can revert it more easily eventually.
This reverts commit d4ec0fc.
@martinpitt
Copy link
Member Author

martinpitt commented Mar 1, 2024

The problem seems to be the cockpit/ws container:

13a56b0223f7  quay.io/cockpit/ws:latest       /container/label-...  30 seconds ago  Exited (1) 30 seconds ago                          ws                

---- 13a56b0223f7 ----                                                                                                                                    
+ sed -e /pam_selinux/d -e /pam_sepermit/d /etc/pam.d/cockpit                                                                                             
+ mkdir -p /host/etc/cockpit/ws-certs.d /host/etc/cockpit/machines.d                                                                                      
mkdir: cannot create directory '/host/etc/cockpit/ws-certs.d': No such file or directory                                                                  
mkdir: cannot create directory '/host/etc/cockpit/machines.d': No such file or directory 

Somehow that doesn't survive the system reset and podman load cycle. I just wonder why I never get that failure locally..

This is of course utterly hard to get right -- we both want to completely trash and reset containers during tests, but at the same time rely on the cockpit/ws to be reliable.

I wonder if the problem is with our findmnt --list -otarget | grep /var/lib/containers/. | xargs -r umount hack. podman creates an overlay mount or btrfs volume in a subdirectory there, which interferes with our restore_dir(). See PR #1597

@martinpitt
Copy link
Member Author

This was too much, and now conflicty anyway. #1601 looks more promising.

@martinpitt martinpitt closed this Mar 1, 2024
@martinpitt martinpitt deleted the cleanup branch March 1, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Frequent setup/cleanup failures in tests
1 participant