e2e: sign image: Unexpected error: can't connect to the gpg-agent #17966

Closed
edsantiago opened this issue Mar 28, 2023 · 15 comments · Fixed by #18578
Labels
flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago
Member

  podman sign image
...
Unexpected error:
               <*exec.ExitError | 0xc0019f86c0>: 
               exit status 2
               {
                   ProcessState: {
                       pid: 130096,
                       status: 512,
                       rusage: {
                           Utime: {Sec: 0, Usec: 6129},
                           Stime: {Sec: 0, Usec: 10342},
                           Maxrss: 62000,
                           Ixrss: 0,
                           Idrss: 0,
                           Isrss: 0,
                           Minflt: 603,
                           Majflt: 5,
                           Nswap: 0,
                           Inblock: 520,
                           Oublock: 56,
                           Msgsnd: 0,
                           Msgrcv: 0,
                           Nsignals: 0,
                           Nvcsw: 34,
                           Nivcsw: 15,
                       },
                   },
                   Stderr: nil,
               }

the ginkgo line seems to be https://github.com/containers/podman/blob/a91cde637ee2b4f6e8db60147b2f46e6fe482476/test/e2e/image_sign_test.go#L48-L50

There's no useful output or any indication of what the error actually is, but I'm going to guess that this is another contention bug which needs to be addressed either via locking or via $GNUPGHOME or --homedir

Podman image sign [It] podman sign image

@edsantiago edsantiago added the flakes and kind/bug labels Mar 28, 2023
@vrothberg
Member

The tests already set GNUPGHOME, see
https://github.com/containers/podman/blob/a91cde637ee2b4f6e8db60147b2f46e6fe482476/test/e2e/image_sign_test.go#L35-L36

I am not sure what's going on.

@vrothberg
Member

vrothberg commented Mar 29, 2023

Ah, ah, ah. We're not using --homedir but the env var which may cause farts when tests are run in parallel.
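
For readers comparing the two approaches: an environment variable set with os.Setenv is process-wide, while a --homedir flag travels with a single gpg invocation. The sketch below only builds such a command and does not run it. The helper name gpgImportCmd and the paths are made up for illustration and are not taken from the podman test suite.

```go
package main

import (
	"fmt"
	"os/exec"
)

// gpgImportCmd (hypothetical helper) builds, but does not run, a gpg
// invocation that imports keyFile into the keyring under homedir.
// Passing --homedir explicitly avoids depending on the process-wide
// GNUPGHOME environment variable.
func gpgImportCmd(homedir, keyFile string) *exec.Cmd {
	return exec.Command("gpg", "--homedir", homedir, "--import", keyFile)
}

func main() {
	// Example paths only; adjust to the real temporary directory and key.
	cmd := gpgImportCmd("/tmp/example-gnupg", "sign/secret-key.asc")
	fmt.Println(cmd.Args)
}
```

Whether this changes anything for the flake is not established here; the follow-up comment below withdraws the theory.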

@vrothberg
Member

Ah, ah, ah. We're not using --homedir but the env var which may cause farts when tests are run in parallel.

No, I don't think I understand what's going on. Calling @mtrmac for help.

@edsantiago
Member Author

First step is to instrument the code so it actually gives a useful error message, no?

@Luap99
Member

Luap99 commented Mar 29, 2023

I assume our logs catch the full stdout/err, so to get actual output we need to add:

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr

I can open a PR to add this so we can read the actual error message from gpg.

Luap99 added a commit to Luap99/libpod that referenced this issue Mar 29, 2023
By default go will not keep the stdout/err attached when executing
commands via exec.Command(). It is required to explicitly pass the
current stdout/err fds down to the child so we can see the error output
in the logs to debug containers#17966.

Signed-off-by: Paul Holzinger <[email protected]>
@mtrmac
Collaborator

mtrmac commented Mar 29, 2023

(Note mostly to self: I can’t see anything obviously problematic in image_sign_test.go.)

@vrothberg
Member

@edsantiago, do you see this flake after #17976?

@edsantiago
Member Author

Unfortunately, no, but I haven't been as active on my no-retry PR. I ran it several times today and did not see any triggers. Will keep trying.

@edsantiago
Member Author

I don't see any place in tests where umask is changed... but is it possible that it is? I suspect gpg might barf if its directory is 77x.

err := os.Mkdir(tempGNUPGHOME, os.ModePerm)

Any objection to s/os.ModePerm/0700/ and wishfulthinkingly closing this?

@Luap99
Member

Luap99 commented Apr 25, 2023

Any objection to s/os.ModePerm/0700/ and wishfulthinkingly closing this?

Loose permissions could definitely cause problems but I don't see why that would trigger a flake. Definitely doesn't hurt fixing this up and hope that we will never see it again.

@edsantiago
Member Author

That doesn't seem to be it. gpg will warn, but not fatally:

$ mkdir --mode 0777 /tmp/mygpg
$ ls -ld /tmp/mygpg
drwxrwxrwx. 2 esm esm 40 Apr 25 11:24 /tmp/mygpg/
$ GNUPGHOME=/tmp/mygpg gpg --import test/e2e/sign/secret-key.asc
gpg: WARNING: unsafe permissions on homedir '/tmp/mygpg'
gpg: keybox '/tmp/mygpg/pubring.kbx' created
gpg: /tmp/mygpg/trustdb.gpg: trustdb created
gpg: key A9AA07032E8FD9B2: public key "foobar <[email protected]>" imported
gpg: key A9AA07032E8FD9B2: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
$ echo $?
0

@edsantiago
Member Author

Finally! Got the flake, with useful log:

→ Enter [It] podman sign image - /var/tmp/go/src/github.com[/containers/podman/test/e2e/image_sign_test.go:47](https://github.com/containers/podman/blob/0aac5007991956a1ca8864a382d91b8108bfd388/test/e2e/image_sign_test.go#L47) @ 05/03/23 13:29:39.061
           gpg: WARNING: unsafe permissions on homedir '/tmp/podman_test3949304392/tmpGPG'
           gpg: keybox '/tmp/podman_test3949304392/tmpGPG/pubring.kbx' created
           gpg: /tmp/podman_test3949304392/tmpGPG/trustdb.gpg: trustdb created
           gpg: key A9AA07032E8FD9B2: public key "foobar <[email protected]>" imported
           gpg: can't connect to the gpg-agent: IPC connect call failed
           gpg: error getting the KEK: No agent running
           gpg: error reading 'sign/secret-key.asc': No agent running
           gpg: import from 'sign/secret-key.asc' failed: No agent running
           gpg: Total number processed: 0
           gpg:               imported: 1
           gpg:       secret keys read: 1

This is rootless, and [checks] all the above flakes are rootless, so I'm guessing it's because of the way rootless-CI is set up, with ssh? But no, I've looked at environment (view-source on colorized log) and see no AGENT, SSH, GNUPG, or GPG strings. Nor do I find any under test/e2e (I thought it might be incomplete cleanup). I'm lost. Giving up for now, will revisit later.

@edsantiago edsantiago changed the title from "e2e: sign image: Unexpected error (in gpg??? no useful diagnostics)" to "e2e: sign image: Unexpected error: can't connect to the gpg-agent" May 3, 2023
@mtrmac
Collaborator

mtrmac commented May 4, 2023

As a vague intuition, I wouldn’t be too surprised if gpg-agent(s?) were unhappy with our many temporary GNUPGHOME directories. But that’s not pointing at anything specific to do or fix.

Maybe we could set up an agent config file ( https://www.gnupg.org/documentation/manuals/gnupg/Agent-Configuration.html ) enabling logging, and capture the log when the test fails.

@edsantiago
Member Author

I think you're right. I spent waaaaay too long on this last night, giving up in frustration: that gpg agent is annoying. The only conclusion I came to last night was that we need to serialize all the gpg tests and also kill agents in cleanup. Today, that's still the only option I think feasible, but I hate it so much that I won't even mention it. Oops too late.

@mtrmac
Collaborator

mtrmac commented May 4, 2023

For the record, containers/image#1779 includes a way to kill the agent.
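
For illustration, one commonly used way to stop the agent belonging to a given home directory is `gpgconf --kill gpg-agent`. The sketch below assumes that approach; it is not necessarily what containers/image#1779 does, and the helper name killAgentCmd and the path are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// killAgentCmd (hypothetical helper) builds the gpgconf invocation that
// asks the gpg-agent serving homedir to shut down.
func killAgentCmd(homedir string) *exec.Cmd {
	return exec.Command("gpgconf", "--homedir", homedir, "--kill", "gpg-agent")
}

func main() {
	// Example path only; a test would pass its own temporary GNUPGHOME.
	cmd := killAgentCmd("/tmp/example-gnupg")
	fmt.Println(cmd.Args)
	if err := cmd.Run(); err != nil {
		fmt.Println("gpgconf failed (is GnuPG installed?):", err)
	}
}
```

In a test suite this would typically run from the per-test cleanup hook, after the signing commands have finished and before the temporary directory is removed.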

edsantiago added a commit to edsantiago/libpod that referenced this issue May 15, 2023
Reason: gpg tests all run with a different GNUPGHOME, and gpg-agent
does not like that, and there's no longer any way to run gpg
without the agent. So, do not run these tests in parallel, and
clean up agent after each test.

Fixes: containers#17966 (I hope)

May also fix containers#18358 but it will take some time to be sure.

Signed-off-by: Ed Santiago <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR label Aug 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023