🐛 ignition: start kubeadm after network.target #8772

@ader1990 ader1990 commented May 31, 2023

In certain baremetal environments with multiple connected and/or disconnected network ports, the network target is reached more slowly, and kubeadm.service might fail, either because its pre-kubeadm commands (such as a ctr image pull) have not completed successfully or because it cannot connect to other k8s nodes.

Also, kubeadm.service and kubeadm.sh rely on containerd to be working at that moment too. Please let me know if I need to remove the After=containerd.service part and maybe add this check in another place.

@k8s-ci-robot added the labels cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test), and size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) on May 31, 2023
@k8s-ci-robot
Contributor

Hi @ader1990. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ader1990 ader1990 changed the title ignition: start kubeadm after network.target and containerd.service 🐛 🐛 ignition: start kubeadm after network.target and containerd.service May 31, 2023
Contributor

@killianmuldoon killianmuldoon left a comment

/ok-to-test

Will leave review of this to people properly familiar with the ignition implementation.

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on May 31, 2023
@killianmuldoon
Contributor

@dongsupark
@invidian
@johananl

From the ignition OWNERS file

@ader1990
Contributor Author

> /ok-to-test
>
> Will leave review of this to people properly familiar with the ignition implementation.

Thank you. I think containerd might not be required per se, but kubeadm.sh relies on containerd being fully working before it runs kubeadm init, so I added that part too.

Thank you.

@ader1990 ader1990 force-pushed the fix_ignition_kubeadm_transitory_failures branch from 5404fc4 to 6d3c06b Compare May 31, 2023 12:52
@k8s-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) and removed the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) on May 31, 2023
@@ -96,6 +96,8 @@ systemd:
 Description=kubeadm
 # Run only once. After successful run, this file is moved to /tmp/.
 ConditionPathExists=/etc/kubeadm.yml
+After=network.target
+After=containerd.service
Member

This assumes that containerd will always be used as a CRI, which I don't think is true. I'm pretty sure cri-o can also be used, so this would be a breaking change for such users.

I think this modification should be done via the cluster template, as it allows adding additional unit overrides easily.
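
As a rough illustration of the cluster-template approach suggested here, the containerd ordering could be expressed as a systemd drop-in passed through the Ignition `additionalConfig` field of a `KubeadmConfigTemplate`. This is only a sketch under assumptions: the resource name `example-md-0` and the drop-in file name `10-containerd.conf` are hypothetical and do not come from this PR, and users of another runtime such as cri-o would substitute their own service.

```yaml
# Sketch only: the resource name and drop-in file name are hypothetical.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: example-md-0            # hypothetical name
spec:
  template:
    spec:
      format: ignition
      ignition:
        containerLinuxConfig:
          # Extra Container Linux Config (CLC) merged with the generated config.
          additionalConfig: |
            systemd:
              units:
                - name: kubeadm.service
                  enabled: true
                  dropins:
                    - name: 10-containerd.conf   # hypothetical file name
                      contents: |
                        [Unit]
                        # Only appropriate when containerd is the container runtime.
                        After=containerd.service
```

Keeping the runtime-specific ordering in the template rather than in the built-in unit leaves the default unit usable for clusters that do not run containerd.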

@@ -96,6 +96,8 @@ systemd:
 Description=kubeadm
 # Run only once. After successful run, this file is moved to /tmp/.
 ConditionPathExists=/etc/kubeadm.yml
+After=network.target
Member

I'm on the fence about this condition, since things have been working stably in all environments where I have tested them, but it seems generic enough that we could add it, as kubeadm is indeed likely to depend on networking in general.

Generally, such modifications should be applied at the cluster template level, which is why we allow adding extra CLC snippets there. For very generic, high-level things, though, we may make an exception.

Contributor Author

Leaving aside the containerd part, the network target is a must-have (at least in baremetal environments, and probably in slow-to-initialize virtual environments too); otherwise the control plane and worker nodes are very likely to fail when connecting to each other.

I will remove the containerd part.
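
For context, systemd's documentation distinguishes `network.target`, which only signals that the network management stack has been started, from `network-online.target`, which waits until the network is considered up. The PR itself only adds `After=network.target`. If that ordering turned out to be insufficient in a particular environment, a drop-in along the following lines could be tried through the cluster template's extra CLC snippet. This is an untested assumption rather than something proposed in this thread, and the drop-in file name is hypothetical.

```yaml
# CLC fragment (sketch); the drop-in file name is hypothetical.
systemd:
  units:
    - name: kubeadm.service
      enabled: true
      dropins:
        - name: 10-wait-online.conf
          contents: |
            [Unit]
            # Pull in and wait for the "network is up" synchronization point.
            Wants=network-online.target
            After=network-online.target
```

Whether `network-online.target` actually delays startup depends on a wait-online service being enabled for the network manager in use.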

@invidian
Member

Ah, I forgot in the review comment, thanks for giving the Ignition feature a try and opening the PR @ader1990!

@ader1990 ader1990 force-pushed the fix_ignition_kubeadm_transitory_failures branch from 6d3c06b to 015701a Compare May 31, 2023 12:58
@k8s-ci-robot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) and removed the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on May 31, 2023
@ader1990 changed the title from "🐛 ignition: start kubeadm after network.target and containerd.service" to "🐛 ignition: start kubeadm after network.target" on May 31, 2023
@ader1990
Contributor Author

> Ah, I forgot in the review comment, thanks for giving the Ignition feature a try and opening the PR @ader1990!

This PR is in the context of a larger effort to have automated Baremetal ARM64 deployments of K8S clusters using CAPI and Flatcar :)

Member

@invidian invidian left a comment

/lgtm

Commit title could be adjusted to match the content, but it's a nit.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: eb0785d90123a175894892e4f125037406c100ea

In certain baremetal environments, where there are multiple connected
and/or disconnected network ports, the network target is reached more
slowly, and the kubeadm.service might fail because it does not have the
proper pre-kubeadm commands correctly done (like a ctr image pull) or it
cannot connect to other k8s nodes.
@ader1990 ader1990 force-pushed the fix_ignition_kubeadm_transitory_failures branch from 015701a to 7bb4cde Compare May 31, 2023 13:04
@ader1990
Contributor Author

> /lgtm
>
> Commit title could be adjusted to match the content, but it's a nit.

Updated the commit message to reflect the content.

Contributor

@killianmuldoon killianmuldoon left a comment

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: killianmuldoon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2023
@killianmuldoon
Contributor

/cherry-pick release-1.4

@k8s-infra-cherrypick-robot

@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.4 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.4

@killianmuldoon
Contributor

/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.3

@k8s-ci-robot k8s-ci-robot merged commit 9fe11dc into kubernetes-sigs:main Jun 6, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.5 milestone Jun 6, 2023
@k8s-infra-cherrypick-robot

@killianmuldoon: new pull request created: #8803

In response to this:

/cherry-pick release-1.4

@k8s-infra-cherrypick-robot

@killianmuldoon: new pull request created: #8804

In response to this:

/cherry-pick release-1.3

@johannesfrey
Contributor

/area provider/bootstrap-kubeadm

@k8s-ci-robot k8s-ci-robot added the area/provider/bootstrap-kubeadm Issues or PRs related to CAPBK label Jun 16, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/bootstrap-kubeadm Issues or PRs related to CAPBK cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
6 participants