sysext: port AWS OEM to systemd sysext image #1083

Merged
merged 8 commits into from
Sep 26, 2023

tormath1
Contributor

@tormath1 tormath1 commented Aug 24, 2023

In this PR, we port the current AWS OEM to a systemd system extension (sysext) image. It allows us to not rely on the base-ec2.ign configuration file and to remove specific OEM bits from the two related ebuilds: flatcar-eks and amazon-ssm-agent.

Testing done

related to: flatcar/Flatcar#1145

@tormath1 tormath1 self-assigned this Aug 24, 2023

cat > "${rootfs}/usr/lib/systemd/system/setup-oem.service" <<-'EOF'
[Unit]
Description=Setup OEM

Member

Should this run before amazon-ssm-agent.service?
Also, would symlinks work, too?

Contributor Author

I'm curious: what would be the benefit of using a symlink here? If the user wants to edit /etc/amazon/ssm/amazon-ssm-agent.json, for example, they won't be able to, since /usr/... is read-only.

Member

With cp, the user's edits also get lost. With a symlink, we could check whether a custom target is set and, if so, leave it untouched. We could then document this as the way to opt out of auto-updates for that file.
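
A minimal sketch of the symlink check described in this comment, assuming GNU coreutils. The helper name `ensure_default_link` and its arguments are hypothetical and not part of the PR; it only illustrates "refresh the link unless the user pointed it somewhere else or replaced it with a file".

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the PR.
# Usage: ensure_default_link <default-target> <link-path>
ensure_default_link() {
    local target="$1" link="$2"
    if [ -L "${link}" ]; then
        # A symlink exists: only refresh it if it still points at our default.
        if [ "$(readlink "${link}")" != "${target}" ]; then
            return 0    # custom target set by the user, leave it alone
        fi
    elif [ -e "${link}" ]; then
        return 0        # regular file placed by the user, leave it alone
    fi
    mkdir -p "$(dirname "${link}")"
    # -s: symbolic, -f: replace an existing link, -n: do not follow it
    ln -sfn "${target}" "${link}"
}
```

With this shape, a user opts out of updates for a file simply by retargeting the link or replacing it with a regular file.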

@pothos
Member

pothos commented Aug 25, 2023

sdk_container/src/third_party/coreos-overlay/coreos-base/oem-ec2-compat/files/base/base-ec2.ign can be deleted and ec2 removed from oem-ec2-compat-0.1.2-r3.ebuild.
We also need the list of old OEM files to clean up for the migration (for the rootfs only /etc/systemd/system/amazon-ssm-agent.service and its enabling symlink?).

EOF

mkdir -p "${rootfs}/usr/lib/systemd/system/multi-user.target.d"
{ echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service [email protected] setup-oem.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ami.conf"
Member

Enabling [email protected] is something we should do in the base image.

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/bin/cp /usr/share/amazon/ssm/amazon-ssm-agent.json /etc/amazon/ssm/amazon-ssm-agent.json.template
Member

If that file is only used by the service unit, we could also use BindPaths= to provide it under /etc. @krnowak that could also be an option for the waagent, right?

/etc/eks/bootstrap.sh
)

rm -rf "${to_delete[@]/#/${rootfs}}"
Member

I don't understand this line, what creates the files under /etc?
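
For readers unfamiliar with the expansion on that line, here is a small standalone illustration of what `"${to_delete[@]/#/${rootfs}}"` does in bash. The `rootfs` value and the two paths are example values chosen for this sketch, not the full list from the PR.

```shell
#!/usr/bin/env bash
# Example values only; the real script receives rootfs from its caller.
rootfs="/tmp/example-rootfs"
to_delete=(
    /etc/systemd/system/amazon-ssm-agent.service
    /etc/eks/bootstrap.sh
)

# ${array[@]/#/prefix}: the '#' anchors an empty pattern at the start of each
# element, so the replacement is prepended to every element of the array.
prefixed=( "${to_delete[@]/#/${rootfs}}" )

printf '%s\n' "${prefixed[@]}"
```

So the `rm -rf` acts on paths inside the image's root filesystem tree, not on the paths of the machine running the script.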

Contributor Author

@tormath1 tormath1 Aug 31, 2023

Sorry, I was confused about the manglefs script: I thought it ran on the host, so I wanted to clean up the old OEM files from there.

EDIT: OK, it's done there: flatcar/update_engine#24 (comment)

Member

@pothos pothos Aug 31, 2023

The list of old OEM files should now go to the misc-files package: #1016
We should boot an instance and check the old contents of /oem/ (the list of files for /etc looks good and can be easily seen in the base.ign).

Member

So, the above can be deleted here, right?

mkdir -p "${rootfs}/usr/lib/systemd/system/amazon-ssm-agent.service.d"
cat > "${rootfs}/usr/lib/systemd/system/amazon-ssm-agent.service.d/10-bindpaths.conf" <<-'EOF'
[Service]
BindPaths=/usr/share/amazon/ssm/:/etc/amazon/ssm/ /usr/share/amazon/eks/bootstrap.sh:/etc/eks/bootstrap.sh
Member

Are users expected to be able to run the CLI themselves, and does it also read from /etc? (In that case we would need the symlinks from /etc anyway, right?)

Contributor Author

Actually from my understanding, the bootstrap script is executed by user-data (https://kinvolk.io/blog/2021/02/deploying-an-eks-cluster-with-flatcar-workers/) - so it does not even need to be shared

Member

My question was more about ssm-cli and whether this is used by users and needs access to the files in /etc.

Contributor Author

It seems that ssm-cli consumes the amazon-ssm-agent directly, but I think it might be wiser to copy the files into /etc rather than bind-mount them.

EOF

mkdir -p "${rootfs}/usr/lib/systemd/system/multi-user.target.d"
{ echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service [email protected]"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf"
Member

Can we start [email protected] in the base image? We also start coreos-cloudinit in the base image and could do it similarly: have a unit with a condition on the OEM kernel cmdline argument that then uses Upholds= to start it.

Member

Something like this

[Unit]
ConditionKernelCommandLine=|ignition.platform.id=packet
ConditionKernelCommandLine=|flatcar.oem.id=packet
ConditionKernelCommandLine=|coreos.oem.id=packet

ConditionKernelCommandLine=|ignition.platform.id=ec2
ConditionKernelCommandLine=|flatcar.oem.id=ec2
ConditionKernelCommandLine=|coreos.oem.id=ec2

ConditionKernelCommandLine=|ignition.platform.id=digitalocean
ConditionKernelCommandLine=|flatcar.oem.id=digitalocean
ConditionKernelCommandLine=|coreos.oem.id=digitalocean

ConditionKernelCommandLine=|ignition.platform.id=gce
ConditionKernelCommandLine=|flatcar.oem.id=gce
ConditionKernelCommandLine=|coreos.oem.id=gce

[email protected]

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target
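
As a rough illustration of how the `|`-prefixed (triggering) conditions in this unit combine, the sketch below mimics the OR logic in bash: the unit's own start is allowed as soon as one of the listed `key=value` words appears on the kernel command line. The function name `oem_id_matches` is hypothetical, and the word matching is a simplification of what systemd does.

```shell
#!/usr/bin/env bash
# Hypothetical helper that approximates the triggering conditions above.
# Usage: oem_id_matches "<kernel cmdline>" <id>...
oem_id_matches() {
    local cmdline="$1"; shift
    local -a words
    local word id key
    # Split the command line into whitespace-separated words.
    read -r -a words <<< "${cmdline}"
    for word in "${words[@]}"; do
        for id in "$@"; do
            for key in ignition.platform.id flatcar.oem.id coreos.oem.id; do
                # One exact key=value match is enough (logical OR).
                [ "${word}" = "${key}=${id}" ] && return 0
            done
        done
    done
    return 1
}
```

On a running machine it could be called as `oem_id_matches "$(cat /proc/cmdline)" packet ec2 digitalocean gce`.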

Member

Suggested change
- { echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service [email protected]"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf"
+ { echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf"

Contributor Author

@tormath1 tormath1 Sep 8, 2023

Interesting - the Upholds units are executed unconditionally:

$ systemctl status [email protected]
[email protected]
     Loaded: loaded (/usr/lib/systemd/system/[email protected]; static)
     Active: inactive (dead)
  Condition: start condition failed at Fri 2023-09-08 12:14:19 UTC; 17min ago

Sep 08 12:14:19 localhost systemd[1]: [email protected] was skipped because no trigger condition checks were met.
$ systemctl status [email protected]
[email protected] - Flatcar Metadata Agent (SSH Keys)
     Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; preset: disabled)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2023-09-08 12:34:50 UTC; 7s ago
    Process: 1799 ExecStart=/usr/bin/coreos-metadata ${COREOS_METADATA_OPT_PROVIDER} --ssh-keys=core (code=exited, status=1/FAILURE)
   Main PID: 1799 (code=exited, status=1/FAILURE)
        CPU: 11ms

As a result, the whole qemu test suite is failing. I guess we can go back to ExecStart=systemctl start [email protected]

Member

What are the contents of /usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf? Is this really the most recent state?

Member

Ah, understood, so it seems the condition is only for the [Service] section? And Upholds= still gets used all the time. Then yes, ExecStart=systemctl start [email protected] instead of ExecStart=/bin/true sounds good!

Contributor Author

> What are the contents of /usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf? Is this really the most recent state?

It's for qemu, so the sysext image is not even present.

> so it seems the condition is only for the [Service] section?

That's my conclusion too.
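
The conclusion of this exchange can be sketched as follows: instead of `Upholds=` (whose target was started even though the condition check was skipped), the conditional unit starts the target itself from `ExecStart=`, which only runs when a condition matched. This is a sketch in the style of the build script, not the change made in flatcar/init#105; the function name, the output path and the `/usr/bin/systemctl` location are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a conditional "starter" unit.
# Usage: write_conditional_starter <output-file> <unit-to-start> <oem-id>...
write_conditional_starter() {
    local out="$1" unit="$2"; shift 2
    local id key
    {
        echo "[Unit]"
        for id in "$@"; do
            for key in ignition.platform.id flatcar.oem.id coreos.oem.id; do
                # '|' marks a triggering condition: one match is enough.
                echo "ConditionKernelCommandLine=|${key}=${id}"
            done
        done
        echo
        echo "[Service]"
        echo "Type=oneshot"
        echo "RemainAfterExit=yes"
        # Executed only if this unit's conditions passed.
        echo "ExecStart=/usr/bin/systemctl start ${unit}"
        echo
        echo "[Install]"
        echo "WantedBy=multi-user.target"
    } > "${out}"
}
```

For example, `write_conditional_starter ./starter.service example-agent.service ec2 gce` writes a unit with six condition lines, where `example-agent.service` stands in for the real unit name.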

Member

Ok, so more follow-up for the init PR… Sorry for the misleading suggestion

Contributor Author

No worries, I fixed this right after in flatcar/init#105, and now the CI is 🟢

Comment on lines 25 to 27
ExecStartPre=/usr/bin/ln --symbolic /usr/share/amazon/ssm/amazon-ssm-agent.json.template /etc/amazon/ssm/amazon-ssm-agent.json
ExecStartPre=/usr/bin/ln --symbolic /usr/share/amazon/ssm/seelog.xml.template /etc/amazon/ssm/seelog.xml
ExecStart=/usr/bin/ln --symbolic /usr/share/amazon/eks/bootstrap.sh /etc/eks/bootstrap.sh
Member

When the link already exists, this will fail. Do you want it to be skipped in that case? That would be possible with ExecStartPre=-.

Contributor Author

I think we should talk about this: how we manage the update of /etc. While redoing this section, I was thinking about using cp --backup to a) update the /etc/ files and b) keep any previous configuration.
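
A small sketch of the `cp --backup` idea mentioned here, assuming GNU `cp`. The function name `update_config` and the paths in the usage note are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a new default while keeping the old file.
# Usage: update_config <new-default> <destination>
update_config() {
    local src="$1" dst="$2"
    mkdir -p "$(dirname "${dst}")"
    # --backup=numbered renames an existing destination to <dst>.~N~
    # before copying, so previous configuration is never overwritten.
    cp --backup=numbered "${src}" "${dst}"
}
```

Calling `update_config /usr/share/amazon/ssm/amazon-ssm-agent.json.template /etc/amazon/ssm/amazon-ssm-agent.json` would leave any earlier file as `amazon-ssm-agent.json.~1~`. One trade-off is that every run creates another numbered backup, even when nothing changed.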

Member

Since these files are likely touched by the user, I would exclude them from the migration step. This means they will stay as regular files and won't be updated, unless we can identify them as untouched with a checksum and create the symlink. For new instances the symlink is the default, so we don't need any update logic; this is covered by the sysext content. The user could still overwrite the symlink or replace it with a file if we use ExecStartPre=-.
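
The checksum idea from this comment could look roughly like the sketch below: a regular file is replaced by the symlink only when its SHA-256 matches a known shipped default. The function name `link_if_untouched` and the way the known checksum is supplied are assumptions; nothing like this is shown in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical migration helper.
# Usage: link_if_untouched <file> <known-default-sha256> <symlink-target>
link_if_untouched() {
    local file="$1" known="$2" target="$3" actual
    # Only regular files are candidates; symlinks and missing files are skipped.
    [ -f "${file}" ] && [ ! -L "${file}" ] || return 0
    actual="$(sha256sum "${file}" | cut -d' ' -f1)"
    if [ "${actual}" = "${known}" ]; then
        # Untouched default: safe to replace it with the symlink.
        ln -sfn "${target}" "${file}"
    fi
}
```

Files the user edited no longer match the checksum, so they stay as regular files and are never overwritten.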

@pothos
Member

pothos commented Sep 25, 2023

> Of course, we can keep the OEM ID in the oem-release file to be ami but we still need to translate it somewhere to ec2 or aws for afterburn.

The oem release ID would only be used for the update payload name and the migration file. No translation to ec2 or aws is required.

@tormath1
Contributor Author

> > Of course, we can keep the OEM ID in the oem-release file to be ami but we still need to translate it somewhere to ec2 or aws for afterburn.

> The oem release ID would only be used for the update payload name and the migration file. No translation to ec2 or aws is required.

For this one, yes, no translation is needed, but we still need one for the kernel command line parameter (see: 9df7e19). Otherwise ami is unknown to Ignition and Afterburn: Ignition can still support it via ignition-generator, but Afterburn expects to find ec2 or aws.
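
To make the translation concrete, here is a minimal sketch of mapping the oem-release ID to the value used on the kernel command line. The function name `cmdline_oem_id` is hypothetical, and the actual change in 9df7e19 is not shown in this thread, so this only illustrates the idea that `ami` is the one ID that differs.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from the oem-release ID to the kernel cmdline OEM ID.
# Usage: cmdline_oem_id <oem-release-id>
cmdline_oem_id() {
    case "$1" in
        # Per the discussion, Afterburn expects ec2 (or aws), not ami.
        ami) echo "ec2" ;;
        # Assumption: every other ID is passed through unchanged.
        *)   echo "$1" ;;
    esac
}
```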

Comment on lines 1 to 2
- AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`. The OEM software is still not updated but this will be added soon.
- The AWS OEM ID kernel command line parameter changed to `ami`
Member

Suggested change
- AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`. The OEM software is still not updated but this will be added soon.
- The AWS OEM ID kernel command line parameter changed to `ami`
- AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`

@@ -0,0 +1,2 @@
[Unit]
Upholds=amazon-ssm-agent.service
Member

Suggested change
- Upholds=amazon-ssm-agent.service
+ Upholds=amazon-ssm-agent.service setup-oem.service


src_install() {
systemd_dounit "${FILESDIR}/setup-oem.service"
systemd_install_serviced "${FILESDIR}/10-oem-ami.conf" multi-user.target
Member

Is this line correct? I don't see the service running

Member

Even after starting manually I think it has some problems:

Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 INFO [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 ERROR [ssm-agent-worker] Agent failed to assume any identity
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 ERROR [ssm-agent-worker] failed to find identity, retrying: failed to find agent identity
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type OnPrem can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type EC2 can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory

Member

If the termination also happens on the latest Alpha it can be ignored

Contributor Author

> Is this line correct? I don't see the service running

The file is actually installed in /etc/systemd/system/multi-user.target.d/ so it's not packaged.

Member

@krnowak krnowak left a comment

Some nitpicks.

Member

@krnowak krnowak left a comment

Just remembered about revision bumping in overlay.

- drop the OEM mention
- install things under /usr/share/amazon/ssm
- add systemd unit from the upstream

Signed-off-by: Mathieu Tortuyaux <[email protected]>
while this ebuild will be dropped in the near future, we still need to
maintain openstack ebuild.

`flatcar-eks` was a runtime dependency of openstack/brightbox too. I
think it was a mistake?

Signed-off-by: Mathieu Tortuyaux <[email protected]>
Signed-off-by: Mathieu Tortuyaux <[email protected]>
found by booting stable on AWS: `find /usr/share/oem` + checking the
content of files created by base Ignition.

Signed-off-by: Mathieu Tortuyaux <[email protected]>
For this vendor, the OEM ID from the oem-release file is different from
the oem.id kernel commandline parameter.

Signed-off-by: Mathieu Tortuyaux <[email protected]>
Member

@krnowak krnowak left a comment

Cool, looks good from my side.

Member

@pothos pothos left a comment

Thanks!

@tormath1 tormath1 merged commit 6c61372 into main Sep 26, 2023
1 check failed
@tormath1 tormath1 deleted the tormath1/oem branch September 26, 2023 15:03