Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/e2e/systemd_activate_test.go: simplify test #18056

Merged
merged 1 commit into from
Apr 12, 2023

Conversation

vrothberg
Copy link
Member

While debugging #17904 we found the test to be missing the common podman flags. Add them to the podman invocations and remove some clutter.

Does this PR introduce a user-facing change?

None

@edsantiago @Luap99 PTAL

Note that the test will now also run with sqlite (even without the containers.conf PR).

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none labels Apr 5, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2023
@@ -111,14 +85,14 @@ var _ = Describe("Systemd activate", func() {
}

podman := func(args ...string) *testUtils.PodmanSession {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you need to repush for any reason, I think future maintainers might welcome a comment here explaining that we can't use the usual podmanTest.Podman() because we need to ensure that we invoke local-podman.

Or, on further thought, WDYT of using podmanTest.Podman(), and adding SkipIfRemote()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea(s)!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vrothberg, #18083 has merged, so it's safe to proceed with this. I have a moderate preference for following up on my suggestion above (use Podman, add SkipIfRemote) but will not block on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Note that the test already skips remote. I moved it up a bit to make it easier to spot.

@vrothberg
Copy link
Member Author

Bless my beard ... no flake.

@edsantiago what does your PR say?

@vrothberg
Copy link
Member Author

Nevermind, I can/should check myself :)

@edsantiago
Copy link
Member

Nah, it's very easy for me: my cirrus-pr-timing script automatically downloads all CI logs, and I can grep those easily. Much better than hand-clicking all links. Just, gimme 5m because my DSL network is slow.

@edsantiago
Copy link
Member

Oh. Duhhhhh, this is a NOP, because it's not running with sqlite.

I've just cherry-picked your PR onto #17954, and will check again in an hour.

@vrothberg
Copy link
Member Author

OK, with this PR the test really ran with sqlite (see https://api.cirrus-ci.com/v1/task/5951745612840960/logs/main.log). I restarted the int test to get some more data points.

@edsantiago
Copy link
Member

the test really ran with sqlite

Yes, but via command-line, not via containers.conf. I haven't yet dismissed the possibility that there's a stray podman process out there somewhere, getting run outside of the usual e2e mechanisms, that might be causing collisions. Or that the --db-backend option behaves differently than containers.conf. (Of course, if so, it will be much harder to track down now).

@vrothberg
Copy link
Member Author

Repush to have some more data points. I am surprised it does not flake yet. Nothing makes sense in my head.

@edsantiago
Copy link
Member

No flakes in my sqlite PR:

$ ack 'no such object.*top' cirrus-pr-timing.17954.0405-0924
[no results!]

Failures in debian are the namespace/tmpdir bug being discussed and addressed in #18057

@edsantiago
Copy link
Member

LGTM but let's wait until #18057 is resolved. This is causing too many flakes.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 5, 2023
@rhatdan
Copy link
Member

rhatdan commented Apr 6, 2023

Is this still a WIP? Ready to be merged?

@edsantiago
Copy link
Member

Please do not merge until #18083 (or equivalent). Without that, we will be pressing Re-run for even more hours than we do already.

@edsantiago edsantiago added do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. do-not-merge/needs-unit-tests do-not-merge/needs-integration-tests do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None do-not-merge/needs-confirmation Waiting for feedback from stakeholder before merging labels Apr 11, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Apr 11, 2023
@edsantiago
Copy link
Member

I'm blocking this with every do-not-merge label I can find. This PR does not play well with #18083. Somehow the combination of the two causes the systemd-socket-etc command in the stop test to fail 100% of the time (rootless only, because only rootless has pause processes), with one of these two failures:

Error: failed to start API service: accept unix @: accept: invalid argument

or

Error: unexpected fd received from systemd: cannot listen on it

See results of #17954, in which I cherry-picked this PR. All int rootless tests fail with the same symptom.

I am blocking to prevent some well-intentioned person from merging this without rebasing on main. That would result in CI failing for everyone.

@vrothberg the problem reproduces reliably on my f37 laptop, you should be able to just rebase this on main, use whatever ginkgo -filter script you use to run just this test, and see the failure.

@vrothberg
Copy link
Member Author

vrothberg commented Apr 12, 2023

The following diff will "fix" the issue (note the system reset -f) before starting the service:

diff --git a/test/e2e/systemd_activate_test.go b/test/e2e/systemd_activate_test.go                   
index be161fd76b4d..6400d69ba6da 100644                                                              
--- a/test/e2e/systemd_activate_test.go                                                              
+++ b/test/e2e/systemd_activate_test.go                                                              
@@ -66,6 +66,19 @@ var _ = Describe("Systemd activate", func() {                                     
                                                                                                     
                podmanOptions := podmanTest.makeOptions(nil, false, false)                           
                                                                                                     
+               podman := func(args ...string) *testUtils.PodmanSession {                            
+                       args = append(podmanOptions, args...)                                        
+                       return testUtils.SystemExec(podmanTest.PodmanBinary, args)                   
+               }                                                                                    
+                                                                                                    
+               podmanRemote := func(args ...string) *testUtils.PodmanSession {                      
+                       args = append([]string{"--url", "tcp://" + addr}, args...)                   
+                       return testUtils.SystemExec(podmanTest.RemotePodmanBinary, args)             
+               }                                                                                    
+                                                                                                    
+               reset := podman("system", "reset", "-f")                                             
+               Expect(reset).Should(Exit(0))                                                        
+                                                                                                    
                systemdArgs := []string{                                                             
                        "-E", "http_proxy", "-E", "https_proxy", "-E", "no_proxy",                   
                        "-E", "HTTP_PROXY", "-E", "HTTPS_PROXY", "-E", "NO_PROXY",                   
@@ -78,17 +91,6 @@ var _ = Describe("Systemd activate", func() {                                     
                Expect(activateSession.Exited).ShouldNot(Receive(), "Failed to start podman service")
                defer activateSession.Signal(syscall.SIGTERM)                                        
                                                                                                     
-               // Curried functions for specialized podman calls                                    
-               podmanRemote := func(args ...string) *testUtils.PodmanSession {                      
-                       args = append([]string{"--url", "tcp://" + addr}, args...)                   
-                       return testUtils.SystemExec(podmanTest.RemotePodmanBinary, args)             
-               }                                                                                    
-                                                                                                    
-               podman := func(args ...string) *testUtils.PodmanSession {                            
-                       args = append(podmanOptions, args...)                                        
-                       return testUtils.SystemExec(podmanTest.PodmanBinary, args)                   
-               }                                                                                    
-                                                                                                    
                containerName := "top_" + testUtils.RandomString(8)                                  
                apiSession := podmanRemote(                                                          
                        "create", "--tty", "--name", containerName, "--entrypoint", "top",           

To me it looks like podman system service doesn't like joining an existing pause process (when running inside of systemd).

@vrothberg
Copy link
Member Author

@Luap99 WDYT?

@vrothberg
Copy link
Member Author

Environment of the system service in the systemd session (sorry copied from the terminal):

TERM_PROGRAM_VERSION=3.3aPATH=/usr/lib/golang/bin:/home/vrothberg/bin:/home/vrothberg/.local/bin::/home/vrothberg/bin:/home/vr
othberg/.local/bin::/home/vrothberg/bin:/home/vrothberg/.local/bin::/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/vr
othberg/go/bin:/home/vrothberg/go/bin:/home/vrothberg/go/binUSERNAME=vrothbergHOME=/home/vrothbergLISTEN_FDS=1LISTEN_PID=35379
5http_proxy=https_proxy=no_proxy=HTTP_PROXY=HTTPS_PROXY=NO_PROXY=

Environment from local podman process of the same test:

GOPATH=/home/vrothberg/goSHELL=/bin/bashSESSION_MANAGER=local/unix:@/tmp/.ICE-unix/2194,unix/unix:/tmp/.ICE-unix/2194COLORTERM
=truecolorHISTCONTROL=ignoredupsXDG_MENU_PREFIX=gnome-TERM_PROGRAM_VERSION=3.3aTMUX=/tmp/tmux-1000/default,281464,0HOSTNAME=ne
buchadnezzarHISTSIZE=1000SSH_AUTH_SOCK=/run/user/1000/keyring/sshXMODIFIERS=@im=ibusDESKTOP_SESSION=gnomeEDITOR=nvimXDG_SESSIO
N_DESKTOP=gnomeLOGNAME=vrothbergXDG_SESSION_TYPE=waylandSYSTEMD_EXEC_PID=2610XAUTHORITY=/run/user/1000/.mutter-Xwaylandauth.IU
VE31GDM_LANG=en_US.UTF-8HOME=/home/vrothbergUSERNAME=vrothbergLANG=en_US.UTF-8LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:
so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;37;41:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:
*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz
=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst
=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31
:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=
01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.avif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=0
1;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:
*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.
ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01
;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.em
f=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.m4a=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;
36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.oga=01;36:*.opus=01;36:*.spx=01;36:*.xspf=01;36:*~=00;90:*#=00;90:*.bak=00
;90:*.old=00;90:*.orig=00;90:*.part=00;90:*.rej=00;90:*.swp=00;90:*.tmp=00;90:*.dpkg-dist=00;90:*.dpkg-old=00;90:*.ucf-dist=00
;90:*.ucf-new=00;90:*.ucf-old=00;90:*.rpmnew=00;90:*.rpmorig=00;90:*.rpmsave=00;90:XDG_CURRENT_DESKTOP=GNOMEVTE_VERSION=7200WA
YLAND_DISPLAY=wayland-0GNOME_TERMINAL_SCREEN=/org/gnome/Terminal/screen/2c3f0dfb_d9ec_45c7_9899_afad87e9db79MOZ_GMP_PATH=/usr/
lib64/mozilla/plugins/gmp-gmpopenh264/system-installedGNOME_SETUP_DISPLAY=:1XDG_SESSION_CLASS=userTERM=xterm-256colorLESSOPEN=
||/usr/bin/lesspipe.sh %sUSER=vrothbergTMUX_PANE=%1GNOME_TERMINAL_SERVICE=:1.603GOPROXY=https://proxy.golang.orgDISPLAY=:0SHLV
L=1QT_IM_MODULE=ibusXDG_RUNTIME_DIR=/run/user/1000DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/ XDG_DATA_DIRS=/home/vr
othberg/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/GDMSESSION=gnomeDBUS_SE
SSION_BUS_ADDRESS=unix:path=/run/user/1000/busMAIL=/var/spool/mail/vrothbergTERM_PROGRAM=tmux_=/usr/bin/goPATH=/usr/lib/golang
/bin:/home/vrothberg/bin:/home/vrothberg/.local/bin::/home/vrothberg/bin:/home/vrothberg/.local/bin::/home/vrothberg/bin:/home
/vrothberg/.local/bin::/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/vrothberg/go/bin:/home/vrothberg/go/bin:/home/v
rothberg/go/binPWD=/home/vrothberg/containers/podman/test/e2eDISABLE_HC_SYSTEMD=trueCONTAINERS_REGISTRIES_CONF=/home/vrothberg
/containers/podman/test/registries.conf                                                                      

@openshift-ci openshift-ci bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Apr 12, 2023
@Luap99
Copy link
Member

Luap99 commented Apr 12, 2023

I need to take a closer look but please do not run system reset or migrate to stop the pause process in e2e tests. Doing this in parallel is not safe! Unless I am missing something #17903 is caused by system reset running in parallel.

@vrothberg
Copy link
Member Author

I need to take a closer look but please do not run system reset or migrate to stop the pause process in e2e tests. Doing this in parallel is not safe! Unless I am missing something #17903 is caused by system reset running in parallel.

I agree. It was not my plan to do so but to dig deeper. It turns out that setting the XDG_RUNTIME_DIR for systemd-active sessions resolves the issue.

But there still seems to be some interference between the latest changes and adding all the --this --thats .

@vrothberg vrothberg force-pushed the this-that branch 2 times, most recently from 27a8302 to 848962e Compare April 12, 2023 09:46
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@edsantiago
Copy link
Member

@vrothberg, needs gofmt -w -s test/e2e/systemd_activate_test.go. I've done that on my churn PRs and have tests running there.

While debugging containers#17904 we found the test to be missing the common podman
flags.  Add them to the podman invocations and remove some clutter.

Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Copy link
Member Author

Cheers! My tab didn´t refresh so it looked good over here :^)

@edsantiago edsantiago removed do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. do-not-merge/needs-unit-tests do-not-merge/needs-integration-tests do-not-merge/needs-confirmation Waiting for feedback from stakeholder before merging labels Apr 12, 2023
@vrothberg vrothberg marked this pull request as ready for review April 12, 2023 12:23
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2023
@edsantiago
Copy link
Member

CI is green (for all important purposes). Downloaded CI logs show only well-known flakes - usually the "no logs from conmon" one -- no stop ones. Same with #17831 and #17954. LGTM.

@vrothberg
Copy link
Member Author

Awesome! Shall we put #17904 to bed?

@edsantiago
Copy link
Member

Once this merges, I'm OK with closing that. It's unfortunate that we don't understand it, though.

@vrothberg
Copy link
Member Author

I share that feeling, @edsantiago. Not satisfying but I feel more comfortable going ahead after all the investigations.

Green, feel free to merge.

@edsantiago
Copy link
Member

/lgtm

(hold still in place)

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 12, 2023
@vrothberg
Copy link
Member Author

/hold cancel

Let's do this :)

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 12, 2023
@openshift-merge-robot openshift-merge-robot merged commit d45ad05 into containers:main Apr 12, 2023
@vrothberg vrothberg deleted the this-that branch April 12, 2023 13:13
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 1, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants