Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use retry logic from containers/common #5345

Merged
merged 2 commits into from
Feb 28, 2024

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented Feb 20, 2024

What type of PR is this?

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake
/kind other

What this PR does / why we need it:

How to verify it

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Default values for --retry and --retry-delay can be set in containers.conf

@rhatdan
Copy link
Member Author

rhatdan commented Feb 25, 2024

@edsantiago Did the new crun make it into this VM release?

@edsantiago
Copy link
Member

Post-2024-01-02 VMs are in podman only, due to the rawhide hang. I am building new VMs right this moment, hoping to get an rc6 kernel. It will take ~2h to find out.

You can, if you wish, bump up to c20240222t143004z-f39f38d13 here for testing purposes. I don't recommend merging with those VMs because CI is likely to need many, many reruns.

@rhatdan
Copy link
Member Author

rhatdan commented Feb 26, 2024

Already bumped to c20240222t143004z-f39f38d13 and still seeing crun errors?

@edsantiago
Copy link
Member

Oh, sorry - I didn't actually look at your diffs. My bad.

The error I see is

[+0050s]   Error: unable to parse value provided "" as --retry-delay: time: invalid duration ""

No wait I see crun error in conformance. It looks like conformance is using debian??????? Debian and crun??? That seems weird. And yes, crun is 1.14.3 on all fedoras but 1.14.1 in debian. Do you know why conformance is using debian + crun (instead of runc)?

@rhatdan
Copy link
Member Author

rhatdan commented Feb 26, 2024

No idea, that seems wrong. I will look at the retry-delay one.

@edsantiago
Copy link
Member

I don't know what to do here. Best I can tell, crun-1.14.3 is not even built yet for debian. And I can't figure out where the "use crun in debian" logic is coming from.

@edsantiago
Copy link
Member

edsantiago commented Feb 26, 2024

This is weird, in setup:

[+0003s] dpkg: considering removing runc in favour of containerd.io ...
[+0003s] dpkg: yes, will remove runc in favour of containerd.io

Next step is to see if this happens in older CI runs with older VMs. I will do so.

UPDATE: yes, this is happening right now. Next question: when did this (removing runc) start happening? That will be much harder to answer.

UPDATE 2: it happened December 22 2023, which is before January 2 2024, therefore it is not something new in c20240201t143038z-f39f38d13. We do not have CI logs beyond 60 days so I'm calling this a dead end.

Situation:

  • buildah conformance tests have been running on debian + crun for at least two months
    • they have been running just fine
  • As of this writing (2024-02-26), debian has a broken crun. There is no fixed crun anywhere in the pipeline AFAICT.

Recommendation: find a Debian maintainer who will build crun-1.14.3. My only clue here is something called Salsa, which suggests that Faidon Liambotis at Debian might be a good person to contact.

UPDATE 3: I've sent email to Liambotis.

@rhatdan rhatdan force-pushed the retry branch 2 times, most recently from 19a7f5a to 7eb109e Compare February 27, 2024 17:58
@rhatdan
Copy link
Member Author

rhatdan commented Feb 27, 2024

@edsantiago I turned off the update for runtime-spec until crun gets updated for Debian. This will allow us to merge this and get buildah updated in podman.

@rhatdan
Copy link
Member Author

rhatdan commented Feb 27, 2024

@containers/podman-maintainers PTAL
@nalind @flouthoc @umohnani8 @giuseppe @vrothberg @ashley-cui @TomSweeneyRedHat @mheon @baude @Luap99 PTAL

@rhatdan
Copy link
Member Author

rhatdan commented Feb 27, 2024

@lsm5 PTAL, once this gets in your PR can probably get in as well.

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Feb 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code and tests LGTM but the replace caught my eyes.

@@ -155,3 +155,5 @@ require (
k8s.io/klog v1.0.0 // indirect
tags.cncf.io/container-device-interface v0.6.2 // indirect
)

replace github.com/opencontainers/runtime-spec => github.com/opencontainers/runtime-spec v1.1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the replace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crun on debian is not updated to a version that will work, so in order to move forward we are not advancing the runtime-spec. As soon as debian moves forward this can be removed.

This will not effect the effect the eventual podman build PR

@TomSweeneyRedHat
Copy link
Member

Same question as @vrothberg about the replace, that was unexpected.
LGTM otherwise. The branch is out of date, I'll update it now.

@rhatdan rhatdan added the lgtm label Feb 28, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit d80ec96 into containers:main Feb 28, 2024
37 checks passed
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators May 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants