Improving the management of openSUSE MicroOS and similar #8828

agraul · 2024-05-28T12:22:33Z

agraul
May 28, 2024
Collaborator

Introduction

MicroOS, or the downstream SLE Micro, is a different operating system than openSUSE Leap / SLE. It is based on the same tools and packages, with a few additions on top. These additions are small in number, but they completely change the operating model.

The main idea is: All changes go into a new btrfs snapshot. This new snapshot is “pending” until a reboot activates it.

Until now, we tried to hide the differences in Uyuni and relied on Salt to do the right thing. This strategy was easy for us to use, but it did not work well. We need to change the approach we take with transactional systems.

Improvements

Differentiate between OS-unchanging and OS-altering operations

Instead of using the transactional_update executor, Uyuni should call either state.apply or transactional_update.apply. Uyuni is in full control of which states are applied in a new snapshot and which states are not. The SLS file is the smallest unit Salt can handle; there is no way to apply only parts of an SLS file inside a snapshot. Therefore, we might need to split SLS files that currently mix OS-unchanging and OS-altering operations.

Uncategorized operations

vms.*

OS-unchanging operations

These operations do not alter the system, i.e. they don’t belong in a (new) snapshot. Uyuni applies them with state.apply for two reasons: structured output and no concurrency (queue=True).

ansible.runplaybook
cocoattest TODO: extract installation to OS-altering state
hardware.profileupdate TODO: extract dmidecode installation to OS-altering state
images.*
packages.profileupdate NOTE: we need to decide if we want to list packages on the active snapshot or the pending snapshot
srvmonitoring.status
util.systeminfo{,_full}

OS-altering operations

These operations alter the operating system itself, i.e. they belong in a (new) snapshot. Uyuni applies them with transactional_update.apply, unless otherwise noted.

appstreams.configure
certs
channels
cleanup_minion
configuration.deploy_files NOTE: reconfiguring a service through /etc is special as there is an overlayfs. /etc/ can also be targeted with state.apply
distupgrade
packages{,.patch*,.pkg*}
services.docker
services.kiwi-image-server
services.reportdb-user
services.salt-minion REVIEW: Is all of this still needed? NOTE: file.managed in /etc can be applied with state.apply
srvmonitoring.{enable,disable}
switch_to_bundle
util.{mgr_disable_fqdns_grain,mgr_mine_config_clean_up}: NOTE: configures in /etc, restarts a service (broken)
util.rotate_saltssh_key
util.mgr_start_event_grains NOTE: configures in /etc, can be applied with state.apply
util.mgr_switch_to_venv_minion
util.sync* NOTE: can be applied with state.apply, these sync Salt modules to /var/cache/ on the Minion
uptodate
update-salt

Either OS-unchanging or OS-altering

These operations can be either OS-unchanging or OS-altering because they are too generic to classify ahead of time. Users need to have control over the way these Salt states are applied.

custom
custom_groups
custom_org
recurring
remotecommands
scap NOTE: remediate=True is likely OS-altering
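
The categorization above could be turned into a small lookup. The following is a minimal sketch, not Uyuni code: the function name `pick_apply_function`, the `user_choice` parameter, and the pattern tuples are hypothetical, and the tuples are only excerpts of the lists above. It assumes Uyuni already knows whether the target is a transactional system.

```python
from fnmatch import fnmatchcase

# Excerpts of the lists above (assumption: glob patterns stand in for the
# brace/star shorthand used in this post).
OS_UNCHANGING = (
    "ansible.runplaybook",
    "hardware.profileupdate",
    "images.*",
    "packages.profileupdate",
    "srvmonitoring.status",
    "util.systeminfo",
    "util.systeminfo_full",
)
OS_ALTERING = (
    "certs",
    "channels",
    "distupgrade",
    "packages",
    "packages.patch*",
    "packages.pkg*",
    "srvmonitoring.enable",
    "srvmonitoring.disable",
    "uptodate",
)
USER_DECIDES = (
    "custom",
    "custom_groups",
    "custom_org",
    "recurring",
    "remotecommands",
    "scap",
)

STATE_APPLY = "state.apply"
TRANSACTIONAL_APPLY = "transactional_update.apply"


def _matches(sls, patterns):
    """Return True if the SLS name matches any of the glob patterns."""
    return any(fnmatchcase(sls, pattern) for pattern in patterns)


def pick_apply_function(sls, transactional, user_choice=None):
    """Return the name of the Salt function to call for one SLS file."""
    if not transactional:
        # Traditional systems have no snapshots to care about.
        return STATE_APPLY
    if _matches(sls, OS_UNCHANGING):
        return STATE_APPLY
    if _matches(sls, OS_ALTERING):
        return TRANSACTIONAL_APPLY
    if _matches(sls, USER_DECIDES) and user_choice in (STATE_APPLY, TRANSACTIONAL_APPLY):
        # Generic states: the user has to tell us how to apply them.
        return user_choice
    raise ValueError(f"no apply function known for {sls!r}")
```

Raising an error for uncategorized or undecided states is one possible design choice; it makes gaps in the categorization visible instead of silently picking a default.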

Automatically reboot during bootstrapping

When bootstrapping a new system, we rely on information present on the client system to know what kind of system it is. This includes finding out if the new system is a transactional system. Bootstrapping happens with Salt SSH and state.apply. The bootstrap SLS file contains logic to install our Salt Minion package correctly on both traditional and transactional systems.

The bootstrap SLS file installs the Salt Minion package into the next snapshot. We need to reboot to activate it.

Ideas:

Uyuni works as a Salt Reactor to the job/done event for the state.apply. Uyuni would need to look at the returned state ids to recognize a bootstrap state that ran on a transactional system, e.g. copy_transactional_conf_file_to_etc. This entangles Uyuni with implementation details of the SLS file and makes the system hard to change.
Bootstrap SLS uses cmd.run bg=True to trigger the reboot in the future. This leaves the door open for race conditions or long delays.
Custom bootstrap module instead of a state file. As a first implementation, this module can delegate applying certs and bootstrap to state.apply / transactional_update.apply activate_transaction=True.

Enable Salt SSH contact method

Currently we rely on calling transactional-update run salt-call, which needs a Salt codebase available in the new snapshot. With Salt SSH, no Salt codebase is installed on the system, so none is available in the new snapshot. Originally, this dependency did not exist; it was introduced to fix another bug (PR). We should revisit this decision and find a way that does not depend on anything within a transaction.

Review approach of rebooting systems

One-off reboots can be triggered by Uyuni with state.single module.run transactional_update.reboot queue=True.

Current approaches

uptodate.sls

This state is not applied automatically. Users can set up “recurring” states with this SLS included, which will call state.apply (with the transactional_update executor redirecting this to transactional_update.apply).
```
mgr_reboot_if_needed:
  cmd.run:
    - name: shutdown -r +5
```

This state is a good candidate for transactional_update.apply activate_transaction=True.

Java Class SaltSystemReboot

Results in a state like:
```
<id>:
  mgrcompat.module_run:
    - name: system.reboot
    - at_time: ...
```
This chunk is not added to SLS files that are computed for transactional systems. For transactional systems, the "Action Chains" approach is used.

REVIEW: Used in code path when user clicks on “schedule reboot” in WebUI.

Action Chains

Uyuni’s Action Chains implementation uses a custom Salt execution module that checks metadata we pass from the Java code base. This execution module calls transactional_update.reboot to trigger a reboot.

Custom SLS

Users might specify states that use e.g. cmd.run or system.reboot in their custom Salt states. This only works correctly when it’s the last state, otherwise the rest of the SLS file is not applied.

Requesting a reboot after applying an SLS file is a WebUI/API feature that’s orthogonal to the specific SLS file. Instead of keeping a "reboot at the end"-request within SLS files, Uyuni should expose a checkbox / API argument and set activate_transaction accordingly on the transactional_update.apply call for transactional systems. On traditional Linux operating systems, Uyuni can insert a reboot state with SaltSystemReboot.
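
A minimal sketch of how the checkbox / API argument could map onto the call, assuming a hypothetical helper `plan_apply` and a hypothetical `append_reboot_state` marker that tells the caller to add the SaltSystemReboot chunk. It only builds a description of the call; it does not talk to Salt.

```python
def plan_apply(mods, transactional, reboot_after=False):
    """Describe the Salt call for one apply request.

    mods: SLS names to apply.
    transactional: True for MicroOS / SLE Micro style systems.
    reboot_after: value of the (hypothetical) WebUI checkbox / API argument.
    """
    if transactional:
        # The reboot request travels as an argument of the call,
        # not as a state inside the SLS file.
        return {
            "fun": "transactional_update.apply",
            "kwargs": {"mods": list(mods), "activate_transaction": reboot_after},
            "append_reboot_state": False,
        }
    # Traditional systems: the caller inserts a reboot state at the end.
    return {
        "fun": "state.apply",
        "kwargs": {"mods": list(mods)},
        "append_reboot_state": reboot_after,
    }
```

For example, `plan_apply(["uptodate"], transactional=True, reboot_after=True)` describes a transactional_update.apply call with activate_transaction=True, while the same request for a traditional system describes a plain state.apply plus an appended reboot state.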

Make state functions available for transactional systems

service.enabled: currently needs dbus, we need a way that does not require dbus for enabling the service
service.disabled: currently needs dbus, we need a way that does not require dbus for disabling the service

Differentiate active and pending state in Uyuni

Both the WebUI and the API should provide information about the currently active state/snapshot of a transactional system, and of the pending state of the same system. This is important to show accurate information without requiring a reboot to ensure there is no pending state.

For example, we currently can’t answer this simple question accurately: Do I need to update package foo? If we look at the active state, the answer might be yes, install the update. At the same time, looking at the pending state, the answer might be no, just reboot. Both answers are right, and both are needed to obtain the full picture.
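
The "package foo" question can be sketched as a function of both views. This is an illustration under simplifying assumptions: the name `update_advice` is hypothetical, versions are compared by equality only (no RPM version ordering), and `pending` is `None` when there is no pending snapshot.

```python
def update_advice(available, active, pending=None):
    """Answer "do I need to update this package?" for a transactional system.

    available: newest version offered by the assigned channels.
    active: version installed in the currently running snapshot.
    pending: version installed in the pending snapshot, or None if no
             snapshot is pending.
    """
    if active == available:
        return "up to date"
    if pending == available:
        # The update is already installed, just not active yet.
        return "reboot to activate the pending snapshot"
    return "install the update"
```

Looking only at `active` would always yield "install the update" here, and looking only at `pending` would hide the fact that the running system is still outdated, which is why both values are inputs.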

chargio · 2024-05-28T15:35:25Z

chargio
May 28, 2024

Does the new snapshot get activated by default? Or is there an option to enable it afterwards?
I am thinking about the possibility of preloading a change before the change window.

4 replies

thkukuk May 29, 2024

New snapshots get activated with the next reboot, unless the snapshot got marked as "broken", somebody made a rollback before, or somebody creates an even newer snapshot.
Preloading a change before the maintenance window and rebooting the machine during the maintenance window is the default setup for transactional-update.

agraul May 29, 2024
Collaborator Author

Yes, the new snapshot (created via transactional-update run ...) is marked as the default snapshot. On reboot, this new snapshot will be active automatically.

MicroOS / SLE Micro systems managed by Uyuni don't use rebootmgr by default. We set these systems up to reboot immediately so that Uyuni can control the timing. That's currently done at the time of bootstrapping (i.e. onboarding to Uyuni) and only if the default config was not changed. If users want to use e.g. rebootmgr, they can do so by setting the "reboot method" for transactional-update.

What you describe with preloading changes is possible and I would say fits the mental model of such systems nicely. It's already possible in Uyuni today, with the caveat that we don't have clear views on the diff between currently running and pending.

agracey May 29, 2024

A thing to be aware of that got a customer of mine recently is that (by default) transactional update builds the new snapshot based on the currently running system instead of the latest snapshot available. This means that running multiple operations across different transactional updates will overwrite each other unless you know to add the right flags.

thkukuk May 29, 2024

That's a FAQ by now: if you base new snapshots on the latest one, you could base them on a broken one. There is no way to find out whether the latest snapshot is a good or a broken one. So basing a new snapshot on a broken one would be really bad.
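
The effect described in the two replies above can be illustrated with a toy model. This is not how transactional-update is implemented; snapshots are modeled as sets of applied changes, and `new_snapshot` and its `continue_from_latest` parameter are hypothetical names for illustration only.

```python
def new_snapshot(snapshots, running, change, continue_from_latest=False):
    """Create a snapshot containing `change` and return its number.

    snapshots: dict mapping snapshot number -> set of applied changes.
    running: number of the currently running snapshot.
    continue_from_latest: base the new snapshot on the newest snapshot
        instead of the running one (hypothetical flag for this model).
    """
    base = max(snapshots) if continue_from_latest else running
    number = max(snapshots) + 1
    snapshots[number] = snapshots[base] | {change}
    return number


# Default behavior: both snapshots are based on the running snapshot 1.
snapshots = {1: set()}
new_snapshot(snapshots, running=1, change="install foo")
default = new_snapshot(snapshots, running=1, change="install bar")
# snapshots[default] == {"install bar"}: "install foo" is not part of it.

# Basing the second snapshot on the latest one keeps both changes, but
# would also inherit anything broken in that latest snapshot.
chained = {1: set()}
new_snapshot(chained, running=1, change="install foo")
latest = new_snapshot(chained, running=1, change="install bar", continue_from_latest=True)
# chained[latest] == {"install foo", "install bar"}
```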

rjmateus · 2024-05-29T11:07:46Z

rjmateus
May 29, 2024
Collaborator

@agraul If I got you right, your idea is to differentiate between OS-changing states that need a reboot, which Uyuni would apply with transactional_update.apply, and others that don't need to run in a transaction, which would be executed with state.apply.
The ones that mix both would need to be refactored and split.
Uyuni would automatically determine which command is the right one to call, right? It would be transparent for the users.

Regarding the bootstrap, would it be possible to handle the reboot issue in the preFlight script when we onboard from the UI, and in the bootstrap script we provide for onboarding (maybe with a delayed reboot, or by sending some special flag data in the return)?

3 replies

agraul May 29, 2024
Collaborator Author

> Uyuni would automatically determine which command is the right one to call, right? It would be transparent for the users.

Yes, that's the general idea. For custom states I think we need to expose the choice to the user.

> Regarding the bootstrap, would it be possible to handle the reboot issue in the preFlight script when we onboard from the UI, and in the bootstrap script we provide for onboarding (maybe with a delayed reboot, or by sending some special flag data in the return)?

The preflight script is a shell script that is executed before the state.apply execution function is executed. I can't think of a way for it to help with the issue that triggering a reboot from within an SLS file interferes with returning data (execution happens via Salt SSH).

I don't know systemd very well, maybe there is a way to have the Salt SSH execution running within a transient service that inhibits reboots until it's done. IIUC, we don't have the same problem with Salt Minions because systemd waits for the Minion to finish execution when we ask systemd to reboot.

The bootstrap script could trigger a reboot at the end, and if I read the script correctly we already have an option for that:

```
[...]
if [ -n "$SNAPSHOT_ID" ]; then
    call_tukit "systemctl enable '$MINION_SERVICE'"
    tukit -q close $SNAPSHOT_ID
    if [ "$SCHEDULE_REBOOT_AFTER_TRANSACTION" -eq 1 ]; then
        transactional-update reboot
    else
       echo "** Reboot system to apply changes"
    fi
[...]
```

rjmateus May 29, 2024
Collaborator

Hm, good point: the preflight only prepares the environment to apply the state, and the bootstrap is done by applying the "certs" and "bootstrap" states.
Would it be possible, or make sense, to have a 3rd Salt state that checks if the system is transactional and schedules a reboot?

agraul Jun 3, 2024
Collaborator Author

I'm not sure if a 3rd SLS helps here. If we call it together (state.apply certs,bootstrap,reboot), we still have the problem of returning the data before executing the reboot.

mbrookhuis · 2024-08-14T06:10:15Z

mbrookhuis
Aug 14, 2024
Collaborator

We are facing even more basic problems with Salt states. We are seeing massive problems with e.g. file.managed and other functions in the file module.
This makes using Salt on SLE Micro very difficult.

Some more details.
We are writing files to e.g. /var/lib/rancher/*, /usr/local/bin, /opt. /opt is also defined in tukit.conf to be ignored and /var is ignored by default.
When we execute the following:

```
a4_pod_k3s_manifests_ad_proxy_yaml:
  file.managed:
    - name: /var/lib/rancher/k3s/server/manifests/ad-proxy.yaml
    - user: root
    - group: root
    - mkdirs: True
    - mode: 644
    - require:
       - file: a4_pod_k3s_create_manifests_dir
    - contents: |
        apiVersion: v1
        kind: Namespace
        metadata:
          name: ad-proxy
        ---
        apiVersion: v1
        data:
{%- if salt.grains.get('wildcardcert','') %}
{%- set servercrt = salt['pillar.get']('cert_encoded:server_cert') %}
{%- set serverkey = salt['pillar.get']('cert_encoded:server_key') %}
{%- else %}
{%- set servercrt = salt['pillar.get']('cert_encoded:stls_server_cert') %}
{%- set serverkey = salt['pillar.get']('cert_encoded:stls_server_key') %}
{%- endif %}
          tls.crt: {{ servercrt }}
          tls.key: {{ serverkey }}
        kind: Secret
        metadata:
          name: ad-proxy-crt
          namespace: ad-proxy
        type: Opaque
```

We would expect the file to be created, since salt-minion reports success. But if you then check the directory, it is empty:
```
cat /var/lib/rancher/k3s/server/manifests/ad-proxy.yaml
cat: /var/lib/rancher/k3s/server/manifests/ad-proxy.yaml: No such file or directory
```

If you run the state again, it reports that the file is present and has the correct content.

Actions performed on /etc work. Those files are present.

On SLE-Micro 5.2, salt-minion created a file called /etc/salt/minion.d/transactional_update.conf. Making this file empty solved the above error, but you will lose some functionality with updating. This file is not present anymore in SLE-Micro 5.5 with venv-salt-minion and salt-minion.

2 replies

mbrookhuis Aug 14, 2024
Collaborator

A workaround is adding the directories where files are written, to /etc/tukit.conf.

thkukuk Aug 14, 2024

From Improvements above:

> Instead of using the transactional_update executor, Uyuni should call either state.apply or transactional_update.apply.

This is what you need: a way to tell Uyuni whether state.apply or transactional_update.apply should be used for your state.

By the way, the file is present, it's not missing. Please look in your log files for messages about files that are shadowed by other subvolumes or similar mounts.
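
One way to reason about the shadowing mentioned above is to check whether a target path lies below a separately mounted location. The sketch below is only an illustration: `shadowing_mount` is a hypothetical helper, and the mount list in the usage note is example data, not read from a real system.

```python
from pathlib import PurePosixPath


def shadowing_mount(path, mount_points):
    """Return the deepest mount point other than "/" that contains `path`.

    Returns None if the path lives directly on the root file system.
    A file written to such a path inside a snapshot would be hidden
    by whatever is mounted on top of it in the running system.
    """
    target = PurePosixPath(path)
    root = PurePosixPath("/")
    best = None
    for mount in mount_points:
        candidate = PurePosixPath(mount)
        if candidate == root:
            continue
        if candidate == target or candidate in target.parents:
            if best is None or len(candidate.parts) > len(best.parts):
                best = candidate
    return str(best) if best is not None else None
```

With example mounts `["/", "/var", "/opt", "/usr/local"]`, the manifest path from the report above resolves to `/var`, which matches the observation that the state reports success while the file is not visible in the running system. A state targeting such a path is a candidate for state.apply rather than transactional_update.apply.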

mcalmer · 2024-09-03T16:23:45Z

mcalmer
Sep 3, 2024
Maintainer

I wonder what will happen with custom states in a highstate, and with states defined by the user in /etc/salt/top.sls and imported via gitfs?
How would we differentiate between state.apply and transactional_update.apply for these states?

0 replies
