Add multipart mime support and refactor #21
6e27d1b
to
5c9797d
Compare
This change adds multipart mime userdata support and refactors the code to allow coreos-cloudinit to run multiple userdata parts as separate steps. Signed-off-by: Gabriel Adrian Samfira <[email protected]>
5c9797d
to
390460b
Compare
I am not 100% sure, but I think we noticed that cloud-init runs the scripts ordered by filename:
Will check it out. Sorting by file name should be an easy change. I will look at the cloud-init code and align with the expectations users might already have.
It seems that cloud-init walks the multipart message and calls the supplied callback: https://github.com/canonical/cloud-init/blob/main/cloudinit/stages.py#L653, defined here: https://github.com/canonical/cloud-init/blob/main/cloudinit/handlers/__init__.py#L257-L277. The MIME message does not seem to be sorted in any way, which kind of makes sense: you can add scripts and cloud-configs in the order you want them executed by simply appending to an array before you serialize them into a MIME multipart message. We can change this later if it turns out I did not understand the code correctly and they are sorted by filename.
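To make the ordering point concrete, here is a minimal sketch, not the code from this PR, of walking a multipart user-data payload in message order with Go's standard library. The names `part`, `splitMultipart` and `sampleMessage` are hypothetical; the sketch reads the top-level headers with `net/textproto` and then hands the remaining body to `mime/multipart`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/textproto"
	"strings"
)

// part is a hypothetical holder for one user-data part.
type part struct {
	ContentType string
	Filename    string
	Body        string
}

// splitMultipart reads the top-level MIME headers, then returns the parts
// in the order they appear in the message. No sorting is applied.
func splitMultipart(userdata string) ([]part, error) {
	r := textproto.NewReader(bufio.NewReader(strings.NewReader(userdata)))
	hdr, err := r.ReadMIMEHeader()
	if err != nil {
		return nil, fmt.Errorf("reading MIME header: %w", err)
	}
	mediaType, params, err := mime.ParseMediaType(hdr.Get("Content-Type"))
	if err != nil {
		return nil, fmt.Errorf("parsing Content-Type: %w", err)
	}
	if !strings.HasPrefix(mediaType, "multipart/") {
		return nil, fmt.Errorf("not a multipart message: %q", mediaType)
	}
	// r.R is positioned just after the blank line that ends the headers.
	mr := multipart.NewReader(r.R, params["boundary"])
	var parts []part
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return parts, nil
		}
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(p)
		if err != nil {
			return nil, err
		}
		parts = append(parts, part{
			ContentType: p.Header.Get("Content-Type"),
			Filename:    p.FileName(),
			Body:        string(body),
		})
	}
}

// sampleMessage builds an illustrative payload whose filenames are
// deliberately out of alphabetical order.
func sampleMessage() string {
	return strings.Join([]string{
		`Content-Type: multipart/mixed; boundary="BOUNDARY"`,
		`MIME-Version: 1.0`,
		``,
		`--BOUNDARY`,
		`Content-Type: text/x-shellscript`,
		`Content-Disposition: attachment; filename="b.sh"`,
		``,
		`#!/bin/sh`,
		`echo b`,
		`--BOUNDARY`,
		`Content-Type: text/cloud-config`,
		`Content-Disposition: attachment; filename="a.yaml"`,
		``,
		`#cloud-config`,
		`hostname: demo`,
		`--BOUNDARY--`,
		``,
	}, "\r\n")
}

func main() {
	parts, err := splitMultipart(sampleMessage())
	if err != nil {
		panic(err)
	}
	for i, p := range parts {
		fmt.Println(i, p.Filename, p.ContentType)
	}
}
```

With this sample, `b.sh` is returned before `a.yaml`, which is the "append in the order you want them executed" behavior described above. Sorting by filename, if it turned out to be needed, would be a separate step applied to the returned slice.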
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Thank you, let's also test this with the regular test suite. For that we need to create a scripts PR that uses your repo and the branch's last commit ID in
Co-authored-by: Krzesimir Nowak <[email protected]>
A potential kola integration test would have to pretty much run a VM with each of the user-data combinations in the
Thanks for the review folks!
* Use textproto to try and read multipart headers
* No need for generics in parseMimeHeader
* Remove quadratic
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
…ra/coreos-cloudinit into add-multipart-mime-support
Changes made. Will update the scripts PR to include some integration tests.
Proposed flatcar/mantle#437, but it needs approval before workflows run.
Co-authored-by: Krzesimir Nowak <[email protected]>
Co-authored-by: Krzesimir Nowak <[email protected]>
Co-authored-by: Krzesimir Nowak <[email protected]> Signed-off-by: Gabriel Adrian Samfira <[email protected]>
96492d1
to
52d63b4
Compare
hostname := determineHostname(metadata, udata)
if err := initialize.ApplyHostname(hostname); err != nil {
	log.Printf("Failed to set hostname: %v", err)
	mustStop = true
}
This now overwrites any static hostname with the meta-data hostname. Can we remove these lines again?
That was kind of the point of this. If coreos-cloudinit is used, it is responsible for setting the hostname. The hostname is fetched from userdata or meta-data and the short form hostname is set. Userdata has precedence.
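The precedence rule described in this comment can be sketched as follows. This is an illustration only: `chooseHostname` is a hypothetical helper, not the PR's `determineHostname`, and it assumes that "short form" means the text before the first dot.

```go
package main

import (
	"fmt"
	"strings"
)

// chooseHostname is a hypothetical helper: the user-data hostname wins over
// the meta-data one, and only the short form (before the first dot) is kept.
func chooseHostname(userdataHost, metadataHost string) string {
	host := strings.TrimSpace(userdataHost)
	if host == "" {
		host = strings.TrimSpace(metadataHost)
	}
	short, _, _ := strings.Cut(host, ".")
	return short
}

func main() {
	fmt.Println(chooseHostname("web-1.example.com", "meta.example.com"))
	fmt.Println(chooseHostname("", "meta.example.com"))
}
```

When neither source provides a hostname the helper returns an empty string, so the caller still has to decide whether to apply anything at all.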
In particular, when Ignition ran and coreos-cloudinit gets triggered through the config drive mechanism, this suddenly overwrites the hostname Ignition set up in `/etc/hostname`.
We could try to skip execution of coreos-cloudinit when Ignition ran, but I still wonder whether this wouldn't also cause problems when coreos-cloudinit or some custom image setup wrote to `/etc/hostname` and this now overwrites it.
IMO, we should use coreos-cloudinit in scenarios where it's needed (like OpenStack or wherever we need multipart mime/cloud-config). Otherwise it should be disabled.
If, however, it's enabled, we should assume it will overwrite some things set by afterburn.
If custom image setup is needed, that should either be done with coreos-cloudinit via userdata, or it should run after coreos-cloudinit.
Otherwise there is no sane way to have coreos-cloudinit run and be useful. Perhaps this is something that needs to be documented?
Previously the process terminated with `Detected an Ignition config. Exiting...`, but that's maybe something we should enforce from the service unit instead.
Then it should probably not run if the userdata is ignition. Is there some other trigger that enables it, besides non-ignition userdata?
If yes, we should probably add flags to disable various bits of it, like setting the hostname and SSH keys (for example).
I think flags are a good idea. We also have this for afterburn, and it would allow us to, e.g., opt out of metadata if it's covered by afterburn for the platform. We should also tweak the units so that it does not run at all on certain platforms: e.g., on Digital Ocean it can run twice, once through the regular oem-cloudinit service and once through the configdrive, and it makes sense to disable it for the config drive (this is the case we ran into).
Will look into adding those flags as soon as I can.
	os.Exit(1)
}
mergedKeys := mergeSSHKeysFromSources(metadata, udata)
if err := initialize.ApplyCoreUserSSHKeys(mergedKeys, env); err != nil {
Is this also a new addition to the code path? I noticed that this conflicts with afterburn writing the keys as well, and this race could maybe lead to a broken setup.
afterburn runs in initrd. This runs during system service startup. At most, I think this can add duplicate keys, but that should not break anything. This is not new, just moved around.
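On the duplicate-key concern: one way to keep a merged key list free of duplicates is to deduplicate while merging. The sketch below is hypothetical (`mergeKeys` is not the PR's `mergeSSHKeysFromSources`, whose behavior is not shown here) and it only deduplicates within the list it is given; it cannot see keys another tool has already written.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeKeys is a hypothetical helper that concatenates key lists from
// several sources, skipping blank entries and exact duplicates while
// keeping the first-seen order.
func mergeKeys(sources ...[]string) []string {
	seen := make(map[string]struct{})
	var merged []string
	for _, keys := range sources {
		for _, k := range keys {
			k = strings.TrimSpace(k)
			if k == "" {
				continue
			}
			if _, dup := seen[k]; dup {
				continue
			}
			seen[k] = struct{}{}
			merged = append(merged, k)
		}
	}
	return merged
}

func main() {
	metadataKeys := []string{"ssh-ed25519 AAAA-example-1 alice"}
	userdataKeys := []string{"ssh-ed25519 AAAA-example-1 alice", "ssh-ed25519 AAAA-example-2 bob"}
	for _, k := range mergeKeys(metadataKeys, userdataKeys) {
		fmt.Println(k)
	}
}
```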
[email protected] is running at the same time, but in this execution log it was faster than cloudinit:
1377 │ Jun 20 08:34:24.519522 coreos-cloudinit[1171]: 2023/06/20 08:34:24 Checking availability of "cloud-drive"
1378 │ Jun 20 08:34:24.519522 coreos-cloudinit[1171]: 2023/06/20 08:34:24 Fetching meta-data from datasource of type "cloud-drive"
1379 │ Jun 20 08:34:24.519522 coreos-cloudinit[1171]: 2023/06/20 08:34:24 Attempting to read from "/media/configdrive/openstack/latest/meta_data.json"
1380 │ Jun 20 08:34:24.519522 coreos-cloudinit[1171]: 2023/06/20 08:34:24 Fetching user-data from datasource of type "cloud-drive"
1381 │ Jun 20 08:34:24.519522 coreos-cloudinit[1171]: 2023/06/20 08:34:24 Attempting to read from "/media/configdrive/openstack/latest/user_data"
1382 │ Jun 20 08:34:24.524397 update-ssh-keys[1183]: Updated "/home/core/.ssh/authorized_keys"
1383 │ Jun 20 08:34:24.521093 systemd[1]: Finished [email protected] - Flatcar Metadata Agent (SSH Keys).
The refactoring in flatcar#21 caused hostnames to be set unconditionally, compared to the old behavior of only setting the hostname if it is not empty. When running coreos-cloudinit with datasources that do not provide metadata, such as the `file` datasource, the refactored code caused the hostname to always be reset to `localhost`. This leads to various problems, such as preventing k8s nodes from joining their cluster. This change restores the old behavior by not applying empty hostnames. Fixes flatcar/Flatcar#1262
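The guard described in this fix could look roughly like the sketch below. It is an assumption-laden illustration, not the actual patch: `applyHostnameIfSet` is a hypothetical name, and the `apply` callback stands in for whatever really sets the hostname.

```go
package main

import "fmt"

// applyHostnameIfSet is a hypothetical guard: it calls apply only when the
// hostname is non-empty, so a datasource without meta-data leaves the
// current hostname untouched. It reports whether apply was called.
func applyHostnameIfSet(hostname string, apply func(string) error) (bool, error) {
	if hostname == "" {
		return false, nil
	}
	if err := apply(hostname); err != nil {
		return false, fmt.Errorf("setting hostname: %w", err)
	}
	return true, nil
}

func main() {
	// A stand-in for the real hostname setter; it only prints.
	fakeApply := func(h string) error {
		fmt.Println("would set hostname to", h)
		return nil
	}
	applied, _ := applyHostnameIfSet("", fakeApply)
	fmt.Println("applied:", applied)
	applied, _ = applyHostnameIfSet("node-1", fakeApply)
	fmt.Println("applied:", applied)
}
```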
Refactor user-data handling
This change adds multipart mime userdata support and refactors the code to allow coreos-cloudinit to run multiple userdata parts as separate steps.
The new tests use testify, which also pulls in yaml.v3. This makes up the bulk of the lines added.
In a future PR, it may be worth replacing the now archived yaml library that coreos-cloudinit uses with a library that is actively maintained.
We try to set the hostname and import ssh keys before anything else happens. If we fail later on and we manage to import SSH keys, we can at least debug what has happened.
How to use
The old behavior is preserved. New support is added for multipart MIME user-data, which means we may get multiple different parts; `coreos-cloudinit` will now run them in the order they are defined in the multipart user-data.
The userdata hostname takes precedence over the metadata one, but if multiple `#cloud-config` parts define a hostname, only the first one is used.
Script and cloud-config parts are run. If there is a valid Ignition part, we log the event and do nothing. Any other user-data part type that we don't support is labeled as "unknown" and logged.
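The handling of part types described above can be pictured with a small classifier. This is a sketch under stated assumptions rather than the PR's parser: the names `partKind`, `classify` and `looksLikeIgnition` are hypothetical, and the Ignition check (a JSON object with a top-level `ignition` object) is a simplified heuristic chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// partKind is a hypothetical label for a user-data part.
type partKind string

const (
	kindScript      partKind = "script"
	kindCloudConfig partKind = "cloud-config"
	kindIgnition    partKind = "ignition"
	kindUnknown     partKind = "unknown"
)

// looksLikeIgnition is an assumed heuristic: the body is a JSON object
// with a top-level "ignition" object.
func looksLikeIgnition(body string) bool {
	var probe struct {
		Ignition map[string]any `json:"ignition"`
	}
	if err := json.Unmarshal([]byte(body), &probe); err != nil {
		return false
	}
	return probe.Ignition != nil
}

// classify decides how a part would be treated: scripts and cloud-configs
// are candidates to run, Ignition is only logged, the rest is "unknown".
func classify(body string) partKind {
	switch {
	case strings.HasPrefix(body, "#!"):
		return kindScript
	case strings.HasPrefix(body, "#cloud-config"):
		return kindCloudConfig
	case looksLikeIgnition(body):
		return kindIgnition
	default:
		return kindUnknown
	}
}

func main() {
	bodies := []string{
		"#!/bin/sh\necho 42 > /tmp/coreos-cloudinit_test.txt\n",
		"#cloud-config\nhostname: demo\n",
		`{"ignition": {"version": "3.3.0"}}`,
		"some other payload",
	}
	for _, b := range bodies {
		fmt.Println(classify(b))
	}
}
```

A caller would iterate over the parts in message order, run the first two kinds, and log the other two.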
Testing done
Added new tests for the new user data parser. Built image using it and deployed virtual machines with all types of supported userdata.
Test Multipart Mime user-data
Userdata:
Result:
Test cloud-config
Userdata:
Result:
Test script userdata
Userdata:
Result:
Test kops deployment
A k8s deployment was tried, with `additionalUserData` set. A full config of the `InstanceGroup` is below:
Result:
A multipart MIME userdata was created for the controller. The k8s cluster came up successfully and `/tmp/coreos-cloudinit_test.txt` was created on the controller with the contents: `42`, as expected.
`changelog/` directory (user-facing change, bug fix, security fix, update)
`/boot` and `/usr` size, packages, list files for any missing binaries, kernel modules, config files, etc.
Related: flatcar/scripts#823