machine start: qemu: wait for SSH readiness #19210

Merged: 1 commit, Jul 13, 2023

vrothberg
Member

While waiting with exponential backoff for the machine to be fully up and running, also make sure that SSH is ready. The systemd dependencies of ready.service include sshd.service, among others, but that is not enough: the machine can report ready before SSH connections succeed.

Other CoreOS users have reported the same issue on IRC, so I feel fairly confident in taking the pragmatic approach of making sure SSH works from the client side. #17403 is quite old, and other pressing machine issues need attention.

Fixes: #17403

Does this PR introduce a user-facing change?

Fixed a bug where podman machine start failed intermittently when using QEMU.

@vrothberg vrothberg marked this pull request as ready for review July 12, 2023 13:24
@openshift-ci openshift-ci bot added the release-note label and briefly added, then removed, the do-not-merge/work-in-progress label Jul 12, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jul 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

@openshift-ci openshift-ci bot added the approved label Jul 12, 2023
@vrothberg
Member Author

@ashley-cui @baude @benoitf PTAL

@benoitf
Contributor

benoitf commented Jul 12, 2023

I'm on PTO. Will try next week 🎉

@ashley-cui
Member

ashley-cui left a comment:

LGTM, ran this in a loop and haven't seen errors so far. Thanks!

Review thread on pkg/machine/qemu/machine.go (outdated, resolved)
@rhatdan
Member

rhatdan commented Jul 12, 2023

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold label Jul 12, 2023
@openshift-ci openshift-ci bot added the lgtm label Jul 12, 2023
@vrothberg
Member Author

Will fix tomorrow.

@ashley-cui ashley-cui added the 4.6 label Jul 12, 2023
@ashley-cui
Member

Commenting here so I remember when backporting tomorrow: once this is ready, it should probably be backported together with #19116 so that it applies cleanly.

@TomSweeneyRedHat
Member

Lots of nasty red tests, @vrothberg. Neat concept for the change, though.

@openshift-ci openshift-ci bot removed the lgtm label Jul 13, 2023
@vrothberg
Member Author

@deboer-tim FYI

During the exponential backoff waiting for the machine to be fully up
and running, also make sure that SSH is ready.  The systemd dependencies
of the ready.service include the sshd.service among others but that is
not enough.

Other CoreOS users reported the same issue on IRC, so I feel fairly
confident to use the pragmatic approach of making sure SSH works on the
client side.  containers#17403 is quite old and there are other pressing machine
issues that need attention.

[NO NEW TESTS NEEDED]

Fixes: containers#17403
Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Member Author

Ready to go

@Luap99
Member

Luap99 left a comment:

LGTM

@vrothberg
Member Author

> LGTM, ran this in a loop and haven't seen errors so far. Thanks!

Same here. Had hundreds of start+stop iterations and no flake 🥳
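The start+stop loop testing described here can be scripted; a rough sketch in Go follows, assuming podman is on PATH and a default machine has already been created with podman machine init. stressLoop and runCycle are hypothetical helpers, and the iteration count is arbitrary.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runCycle runs the start command, then the stop command, and reports
// the start failure first. Stop runs regardless, in an attempt to keep
// the next cycle from beginning with a half-started machine.
func runCycle(start, stop []string) error {
	startErr := exec.Command(start[0], start[1:]...).Run()
	stopErr := exec.Command(stop[0], stop[1:]...).Run()
	if startErr != nil {
		return fmt.Errorf("start: %w", startErr)
	}
	if stopErr != nil {
		return fmt.Errorf("stop: %w", stopErr)
	}
	return nil
}

// stressLoop repeats runCycle n times and returns the number of failed cycles.
func stressLoop(n int, start, stop []string) int {
	failures := 0
	for i := 1; i <= n; i++ {
		if err := runCycle(start, stop); err != nil {
			failures++
			fmt.Printf("iteration %d: %v\n", i, err)
		}
	}
	return failures
}

func main() {
	const iterations = 100 // arbitrary; raise it to hunt rarer flakes
	start := []string{"podman", "machine", "start"}
	stop := []string{"podman", "machine", "stop"}
	failed := stressLoop(iterations, start, stop)
	fmt.Printf("%d of %d start+stop cycles failed\n", failed, iterations)
}
```

Counting failures instead of stopping at the first one gives a failure rate, which is the useful signal when chasing a flaky bug.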

@ashley-cui
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 13, 2023
@ashley-cui
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Jul 13, 2023
@openshift-merge-robot openshift-merge-robot merged commit 561062d into containers:main Jul 13, 2023
@vrothberg vrothberg deleted the fix-17403 branch July 13, 2023 13:43
@github-actions github-actions bot added the locked - please file new issue/PR label Oct 12, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2023
Labels
approved, lgtm, locked - please file new issue/PR, release-note
Development

Successfully merging this pull request may close these issues.

[Bug]: Podman machine fails to start with exit status 255 on Mac
7 participants