Race condition in cloud init #1714
Comments
@timothysc: Is cloud-init being run before services are up?
Comments from meeting:
Hi @figo / @codenrhoden, I think one of you should take this on in the image builder. I believe a drop-in for a cloud-init phase target should be able to set up the required service order and prevent the race condition.
Just to confirm: do we want to make sure the containerd service is running before cloud-init?
I don't believe
You'll likely want something like this in a drop-in:
I'm not sure that would work, since containerd likely depends on the networking target. I'm also not sure whether there is a way to declare that it is required before cloud-init starts the final phase.
Hi @detiber, I think there may be, if we change it to this:
The
Or rather this to be sure:
This should ensure containerd is started between the two phases, after cloud-init has brought networking online. I think this starts containerd as early as possible.
There are several cloud-init boot stages we can use, each with a corresponding systemd service we can target.
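For reference, cloud-init runs four boot stages (local, network, config, final), each from its own systemd service, and the last three run the module lists named in `cloud.cfg`. A minimal sketch of that mapping; the helper `bracket` is hypothetical and only shows which pair of services a unit would be ordered between if it should start after a given stage finishes but before the next one begins.

```python
"""cloud-init boot stages and the systemd units that run them (a sketch)."""

# Ordered as they run at boot: (stage, systemd service, cloud.cfg module list or None).
STAGES = [
    ("local", "cloud-init-local.service", None),
    ("network", "cloud-init.service", "cloud_init_modules"),
    ("config", "cloud-config.service", "cloud_config_modules"),
    ("final", "cloud-final.service", "cloud_final_modules"),
]


def bracket(stage):
    """Return (service to order After=, service to order Before=) for a unit
    that should start once `stage` has finished but before the next stage."""
    names = [name for name, _, _ in STAGES]
    i = names.index(stage)  # raises ValueError for an unknown stage
    if i + 1 >= len(STAGES):
        raise ValueError(f"no stage runs after {stage!r}")
    return STAGES[i][1], STAGES[i + 1][1]


if __name__ == "__main__":
    for stage, service, modules in STAGES:
        print(f"{stage:8} {service:26} {modules or '-'}")
    print(bracket("config"))
```

`bracket("config")` yields the cloud-config.service / cloud-final.service pair that the next comments settle on.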
Since runcmd is called by
So After/Wants=cloud-config.service and Before/WantedBy=cloud-final.service?
Oh, good idea.
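A minimal sketch of the drop-in proposed above (After/Wants=cloud-config.service, Before/WantedBy=cloud-final.service), rendered from Python so the directives are easy to inspect. The file path and name `10-cloud-init-order.conf` are hypothetical, and this is not a tested configuration. It is also worth confirming on a real node that a `WantedBy=` placed in a drop-in's `[Install]` section is honoured when the unit is enabled.

```python
"""Render the containerd ordering drop-in proposed in the thread (a sketch)."""

# Hypothetical location; systemd reads *.conf files from <unit>.d/ directories.
DROPIN_PATH = "/etc/systemd/system/containerd.service.d/10-cloud-init-order.conf"


def render_dropin(sections):
    """Render {section: [(key, value), ...]} as systemd unit-file text.

    A list of pairs (rather than a dict) keeps key order and allows repeated
    keys, which systemd permits for directives such as After=.
    """
    lines = []
    for section, entries in sections.items():
        if lines:
            lines.append("")  # blank line between sections
        lines.append(f"[{section}]")
        for key, value in entries:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


CONTAINERD_ORDERING = {
    "Unit": [
        # Start only after cloud-config has run (networking is online by then)...
        ("After", "cloud-config.service"),
        ("Wants", "cloud-config.service"),
        # ...and before cloud-final starts the final phase.
        ("Before", "cloud-final.service"),
    ],
    "Install": [
        ("WantedBy", "cloud-final.service"),
    ],
}

if __name__ == "__main__":
    print(f"# {DROPIN_PATH}")
    print(render_dropin(CONTAINERD_ORDERING), end="")
```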
I hope the image itself can ensure the ordering is correct, rather than relying on particular user-data. Does this make sense?
@figo I can foresee cases where someone might want to inject config during bootstrapping, which is what I was referring to: for example, being able to specify a custom config file for containerd in the bootstrapping config. I would expect that adding a systemd drop-in with the criteria I mentioned above would support that use case (we enforce the ordering in the image, but still allow user-data-defined config for the service). It might also help to define a similar drop-in for the kubelet that adds an After/Wants for containerd.
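The resulting chain (cloud-config, then containerd, then cloud-final, with the kubelet after containerd) can be checked for consistency by modelling the After= edges as a graph. A minimal sketch using Python's `graphlib` (3.9+). The edge set is a simplified assumption: `network.target` stands in for containerd's own networking dependency, real units carry more dependencies, and Wants= (which pulls units in rather than ordering them) is not modelled.

```python
"""Model the proposed unit ordering as a graph and derive a start order (a sketch)."""
from graphlib import CycleError, TopologicalSorter

# unit -> units it is ordered After=. Simplified, partly assumed graph.
PROPOSED_AFTER = {
    "cloud-config.service": {"network.target"},
    # containerd drop-in: After=cloud-config.service, plus its networking dependency.
    "containerd.service": {"network.target", "cloud-config.service"},
    # Proposed kubelet drop-in: After=containerd.service.
    "kubelet.service": {"containerd.service"},
    # containerd's Before=cloud-final.service is the same edge seen from cloud-final.
    "cloud-final.service": {"cloud-config.service", "containerd.service"},
}


def start_order(after):
    """Return one valid start order; raises CycleError if the ordering loops."""
    return list(TopologicalSorter(after).static_order())


if __name__ == "__main__":
    print(start_order(PROPOSED_AFTER))
    # A loop, e.g. cloud-config waiting on containerd, is rejected:
    looped = {**PROPOSED_AFTER, "cloud-config.service": {"containerd.service"}}
    try:
        start_order(looped)
    except CycleError as err:
        print("cycle:", err.args[1])
```

systemd itself detects ordering cycles at boot and breaks them by dropping a job, so catching a loop while building the image is cheaper than debugging it on a node.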
@detiber @akutz Ansible doesn't come with support for systemd "After/Wants/WantedBy".
The systemd containerd.service file comes from the containerd tarball, but we could replace it with our own version and add new rules to it.
The drawback is that this unit file is supposed to be updated with each containerd release, so our own version would soon become obsolete.
You don't need to do that, @figo. Just use Ansible to write a drop-in file for containerd. A drop-in can add or replace settings of an existing systemd unit file. Read more at https://www.freedesktop.org/software/systemd/man/systemd.unit.html and https://wiki.archlinux.org/index.php/systemd#Drop-in_files. For example:
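The mechanics behind this suggestion can also be sketched outside Ansible: a drop-in is a `*.conf` file in a `<unit>.d/` directory under `/etc/systemd/system`, and the vendor unit file stays untouched. A minimal Python sketch. The function `write_dropin`, its `root` parameter (for writing into an image tree or a temporary directory), and the file names are hypothetical. On a live system, `systemctl daemon-reload` is needed afterwards for systemd to pick up the change.

```python
"""Write a drop-in next to an existing unit instead of replacing the unit file (a sketch)."""
from pathlib import Path

SYSTEMD_DIR = Path("etc/systemd/system")


def write_dropin(root, unit, name, content):
    """Create <root>/etc/systemd/system/<unit>.d/<name> and return its path.

    The vendor unit file (e.g. containerd.service from the release tarball)
    is never modified, so it can keep following upstream releases.
    """
    if not name.endswith(".conf"):
        raise ValueError("systemd only reads *.conf files from drop-in directories")
    dropin_dir = Path(root) / SYSTEMD_DIR / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(content)
    return path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        p = write_dropin(root, "containerd.service", "10-cloud-init-order.conf",
                         "[Unit]\nAfter=cloud-config.service\n")
        print(p.relative_to(root))
```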
Hi @figo, based on the conversation in #112, I want to revise the drop-in example from above. We should have two drop-ins:
The issue has been fixed with kubernetes-sigs/image-builder#113.
/close
@figo: Closing this issue. In response to this:
/open Found a problem with kubernetes-sigs/image-builder#113
Three options for resolution:
Option 1: Change the override to be WantedBy cloud-config instead of cloud-final.
Pros:
Cons:
For reference, on Ubuntu this is:
cloud_config_modules:
# Emit the cloud config ready event
# this can be used by upstart jobs for 'start on cloud-config'.
- emit_upstart
- snap
- snap_config # DEPRECATED- Drop in version 18.2
- ssh-import-id
- locale
- set-passwords
- grub-dpkg
- apt-pipelining
- apt-configure
- ubuntu-advantage
- ntp
- timezone
- disable-ec2-metadata
- runcmd
- byobu
And on AL:
cloud_config_modules:
- disk_setup
- mounts
- locale
- set-passwords
- yum-configure
- yum-add-repo
- package-update-upgrade-install
- timezone
- disable-ec2-metadata
- runcmd
Option 2: Move
Pros:
Cons:
Option 3: Use script-user, which is
Pros:
Cons:
I can't see an issue with going with Option 1.
/reopen
@randomvariable: Reopened this issue. In response to this:
What steps did you take and what happened:
(Doesn't reproduce very often.) Creation of the master node fails with error
What did you expect to happen:
The master node to come up.
Anything else you would like to add:
This is the containerd log for
journalctl -u containerd -l
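Alongside the containerd journal, it may help to confirm which ordering containerd actually received on the failing node. A minimal Python sketch, assuming systemd and the unit names discussed in this thread; `parse_show`, `ordered_between` and `show` are hypothetical helpers around `systemctl show -p After -p Before`, which prints one `KEY=VALUE` line per property with space-separated unit names.

```python
"""Check containerd's effective ordering on a node via `systemctl show` (a sketch)."""
import subprocess


def parse_show(output):
    """Parse `systemctl show -p ...` output (KEY=VALUE lines) into sets of unit names."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = set(value.split())
    return props


def ordered_between(props, after, before):
    """True if the unit is ordered After=`after` and Before=`before`."""
    return after in props.get("After", set()) and before in props.get("Before", set())


def show(unit):
    """Query the live system; -p is repeated to select several properties."""
    result = subprocess.run(
        ["systemctl", "show", unit, "-p", "After", "-p", "Before"],
        capture_output=True, text=True, check=True,
    )
    return parse_show(result.stdout)
```

On a node, `ordered_between(show("containerd.service"), "cloud-config.service", "cloud-final.service")` is expected to be True once the ordering drop-in is in effect.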
Environment:
kubectl version:
/etc/os-release:
/kind bug