Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quadlet: Add a network requirement on .image and .containers units #22057

Merged

Conversation

jbtrystram
Copy link
Contributor

If a container unit starts on boot with a dependency on default.target
the image unit may start too soon, before network is ready. This cause
the unit to fail to pull the image.
Add a dependency on network-online.target to make sure image pulls
don't fail.

Does this PR introduce a user-facing change?

yes

Image oneshot units generated through quadlet will now have a dependency on `network-online.target`

Fixes #21873

Copy link

Cockpit tests failed for commit aa7b60454692c7f0b502ce3c9d749a7ece5ce3aa. @martinpitt, @jelly, @mvollmer please check.

Copy link

Cockpit tests failed for commit d2d5b1e7755f736dafcc64ab2b04bde5696dba71. @martinpitt, @jelly, @mvollmer please check.

@rhatdan
Copy link
Member

rhatdan commented Mar 15, 2024

LGTM
@ygalblum @alexlarsson @mheon PTAL

@ygalblum
Copy link
Contributor

The specific code LGTM. But, it addresses only the .image files. Doesn't the same rational apply for .container and .kube files?

In addition, while the documentation states that setting After= will remove this dependency, there is no test to verify it. In order for it to work, Quadlet needs to make sure the keys set by the user are placed after the ones it sets on its own. While this might be the case now, without a test, there is no guaranty that this behavior does not break in the future.

@jbtrystram
Copy link
Contributor Author

@ygalblum I am looking at adding another test to verify that After= removes the dependency, but I can't see any assertion helper to use, so I am trying to add one.

I started to add assert-key-is-last which would call UnitFile.LookupLast but then I need the key to have no arguments.
The assertion then would be assert-key-is-last-and-empty but then it becomes awfully specific !
Do you have any other idea ? It's my first contrib to podman so I don't know my way around the codebase :)

@ygalblum
Copy link
Contributor

@jbtrystram how about assert-last-key-is-regex and pass ^$?

@jbtrystram
Copy link
Contributor Author

@ygalblum neat suggestion, thanks !
I added that. I am having synchronisation issues when running make test locally though, let's see if the CI runs the test !

@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from c26db33 to 2b45a19 Compare March 20, 2024 08:47
Copy link

Cockpit tests failed for commit 2b45a19b1c52b8765721f07c11be5a2380fc6c39. @martinpitt, @jelly, @mvollmer please check.

Copy link

Cockpit tests failed for commit c26db33125b2b5f044d4d016d3642d3f7508b0b4. @martinpitt, @jelly, @mvollmer please check.

test/e2e/quadlet_test.go Outdated Show resolved Hide resolved
@dustymabe
Copy link
Contributor

added a comment to the issue but linking it here:

#21873 (comment)

@rhatdan
Copy link
Member

rhatdan commented Mar 29, 2024

@jbtrystram What is going on with this PR?

@jbtrystram
Copy link
Contributor Author

@rhatdan This is something i am working on my spare time and i hadn't had a lot of that lately, sorry.
I'll follow up soon

@jbtrystram jbtrystram changed the title quadlet: Add a network requirement on .image units quadlet: Add a network requirement on .image and .containers units Apr 22, 2024
@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from 2b45a19 to 03ad189 Compare April 22, 2024 11:22
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from 03ad189 to b954c6f Compare April 23, 2024 07:33
@jbtrystram

This comment was marked as off-topic.

@jbtrystram
Copy link
Contributor Author

jbtrystram commented Apr 24, 2024

@ygalblum I picked that up and fixed it. The quadlet.go code behave as expected, as quadlet --dryrun on unit-after-override.image gives :

---test.service---
## assert-last-key-is-regex "Unit" "After" "^$"

[Unit]
Wants=network-online.target
After=network-online.target
After=
SourcePath=/etc/containers/systemd/test.container
RequiresMountsFor=%t/containers

.....

However the test code with that assert-last-key-is-regex does not work yet. I'll keep working on it but still pushed, if you have some suggestions in the meanwhile i'll take them :)

I tried to add some debug statements in the test code but they're not printed when running make localintegration FOCUS_FILE=quadlet_test.go

test/e2e/quadlet_test.go Outdated Show resolved Hide resolved
@ygalblum
Copy link
Contributor

Another two tests are failing since they already check the After key. The check assert-key-is expects all the values in the correct order. So you need to change the following:
test/e2e/quadlet/mount.container, line 13:

## assert-key-is "Unit" "After" "network-online.target" "vol2-volume.service"

test/e2e/quadlet/network.quadlet.container, line 3:

## assert-key-is "Unit" "After" "network-online.target" "basic-network.service"

@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from f68dc4d to 2ad505d Compare April 25, 2024 11:58
@ygalblum
Copy link
Contributor

ygalblum commented May 7, 2024

/ok-to-test

@ygalblum
Copy link
Contributor

ygalblum commented May 7, 2024

/retest

@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from 401a1ec to 69a7343 Compare May 21, 2024 14:30
@jbtrystram
Copy link
Contributor Author

@ygalblum I rebased this to see if that help with the tests.

@ygalblum
Copy link
Contributor

test_fixes.txt
@jbtrystram The tests fail because they were checking the values of the After key and this PR changes them (as it automatically adds network-online.target.
I've attached a patch file you can use to fix the tests. When in the Podman base directory run:

patch -p1 < test_fixes.txt

@jbtrystram
Copy link
Contributor Author

@ygalblum I got confused by your patch because I am 100% sure I already fixed those tests earlier. I think I messed up the rebase yesterday and pushed an out of date branch

@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from 69a7343 to 401a1ec Compare May 22, 2024 08:52
@jbtrystram
Copy link
Contributor Author

jbtrystram commented May 22, 2024

yeah, that's better. I found my previous work on another machine.
Sorry for the trouble @ygalblum and thanks for your patience

The test failure is related to buildah :

[+0784s] not ok 119 bud and test --unsetlabel
         # (from function `assert' in file tests/[helpers.bash, line 521](https://github.com/containers/podman/blob/401a1ecd8ecd51efde63fc36d6837df307f173f7/test/system/helpers.bash#L521),
         #  from function `expect_output' in file tests/[helpers.bash, line 548](https://github.com/containers/podman/blob/401a1ecd8ecd51efde63fc36d6837df307f173f7/test/system/helpers.bash#L548),
         #  in test file tests/[bud.bats, line 2284](https://github.com/containers/podman/blob/401a1ecd8ecd51efde63fc36d6837df307f173f7/test/system/bud.bats#L2284))
         #   `expect_output "fedora" "name must be fedora from base image"' failed
         # /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5/tests /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5
         # # [checking for: registry.fedoraproject.org/fedora-minimal]
         # # [copy docker://registry.fedoraproject.org/fedora-minimal dir:/var/tmp/buildah-image-cache.3740/registry.fedoraproject.org-fedora-minimal-]
         # Getting image source signatures
         # Copying blob sha256:f8af2181b761a975e3199969f7c3e60950837cdae4d7fb861fc40bacc53c566f
         # Copying config sha256:0eb7b1a2db1ea0de4fb13b46eeabe19e6e3ccc30be8164b5c46e8a043c428e0f
         # Writing manifest to image destination
         # # [copy dir:/var/tmp/buildah-image-cache.3740/registry.fedoraproject.org-fedora-minimal- containers-storage:registry.fedoraproject.org/fedora-minimal]
         # Getting image source signatures
         # Copying blob sha256:f8af2181b761a975e3199969f7c3e60950837cdae4d7fb861fc40bacc53c566f
         # Copying config sha256:0eb7b1a2db1ea0de4fb13b46eeabe19e6e3ccc30be8164b5c46e8a043c428e0f
         # Writing manifest to image destination
         # # buildah --version
         # buildah version 1.36.0-dev (image-spec 1.1.0, runtime-spec 1.1.0)
         # # podman-remote build --force-rm=false --layers=false --signature-policy /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5/tests/policy.json -t exp -f /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5/tests/bud/base-with-labels/Containerfile
         # STEP 1/2: FROM registry.fedoraproject.org/fedora-minimal
         # STEP 2/2: RUN echo hello
         # hello
         # COMMIT exp
         # Getting image source signatures
         # Copying blob sha256:5bb445179d0077f481b41ef6bf437894ed7ed34f48f475c496daaa08cf640260
         # Copying blob sha256:1b9e48ebc05e985b80c66aaa9cba990df285deac3a0ea3a874f85d1a54cc1c01
         # Copying config sha256:1e53504e3490785ef0a01ce52855ac4f2f35ea3b741a21b6c1fbf3b569613e6c
         # Writing manifest to image destination
         # --> 1e53504e3490
         # Successfully tagged localhost/exp:latest
         # 1e53504e3490785ef0a01ce52855ac4f2f35ea3b741a21b6c1fbf3b569613e6c
         # # buildah inspect --format {{ index .Docker.Config.Labels "license"}} exp
         # MIT
         # # buildah inspect --format {{ index .Docker.Config.Labels "name"}} exp
         # fedora-minimal
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #|     FAIL: name must be fedora from base image
         # #| expected: 'fedora'
         # #|   actual: 'fedora-minimal'
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         # teardown: stopping podman server 49943
         # /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5

[...]
  Error running buildah bud tests under podman.
         
         Is it possible that your PR breaks podman build in some way? Please
         review the test failure and double-check your changes.
         
         
         Please see test/buildah-bud/README.md for advice

If a container unit starts on boot with a dependency on `default.target`
the image unit may start too soon, before network is ready. This cause
the unit to fail to pull the image.
- Add a dependency on `network-online.target` to make sure image pulls
don't fail.
See containers#21873

- Document the hardcoded dependency on `network-online.target` for images unit
and explain how it can be overriden if necessary.

- tests/e2e/quadlet: Add `assert-last-key-regex`

Required to test the `After=` override in [Unit] section
See containers#22057 (comment)

- quadlet/unitfile: add a prepenUnitLine method

Requirements on networks should be inserted at the top of the
section so the user can override them.

Signed-off-by: jbtrystram <[email protected]>
@jbtrystram jbtrystram force-pushed the quadlet-image-network branch from 401a1ec to ad1d3f8 Compare May 22, 2024 11:47
Copy link
Contributor

openshift-ci bot commented May 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jbtrystram, ygalblum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jbtrystram
Copy link
Contributor Author

jbtrystram commented May 22, 2024

I don't think this failure have anything to do with my PR :

# podman run --name o8zHCMmnW1 --health-cmd=touch /terminate --sdnotify=healthy quay.io/libpod/testimage:20240123 sh -c while test \! -e /terminate; do sleep 0.1; done; echo finished
<+574ms> # finished
         # Error: container is stopped
<+004ms> # [ rc=126 (** EXPECTED 0 **) ]
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: exit code is 126; expected 0
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

https://api.cirrus-ci.com/v1/artifact/task/5162052191256576/html/sys-podman-fedora-40-aarch64-root-host-sqlite.log.html

@ygalblum
Copy link
Contributor

ygalblum commented May 22, 2024

@containers/podman-maintainers PTAL

@mheon
Copy link
Member

mheon commented May 22, 2024

Known flake, restarted

@jbtrystram
Copy link
Contributor Author

everything passed ! Can I get a LGTM ?

@mheon
Copy link
Member

mheon commented May 22, 2024

LGTM. This is probably good for inclusion in 5.1

@rhatdan
Copy link
Member

rhatdan commented May 22, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 22, 2024
@rhatdan rhatdan added 5.1 and removed lgtm Indicates that a PR is ready to be merged. labels May 22, 2024
@ygalblum
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 23, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 36152ee into containers:main May 23, 2024
89 checks passed
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 22, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
5.1 approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. ok-to-test release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quadlet .image file should wait for dns name resolution
5 participants