# rhcos/extensions: Support for CoreOS extensions #317
## Conversation
> OpenShift is already making use of this today for the [realtime kernel](https://github.com/openshift/enhancements/blob/master/enhancements/support-for-realtime-kernel.md).
>
> We propose continuing the trail blazed with the RT kernel by adding additional RPMs to `machine-os-content` that aren't part of the base OS "image", but can be installed later. This is how `kernel-rt` works today; we added a `kernel-rt.rpm` that is committed to the container. In the future though, we may instead ship a separate `machine-os-extensions` container, or something else. The only API stable piece here is the `MachineConfig` field (same as for the `kernelType: realtime`).
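For readers who haven't seen the existing mechanism this builds on, a minimal sketch of the realtime-kernel `MachineConfig` referred to above might look like the following; the object name and role label here are illustrative placeholders, only the `kernelType` field is the point.

```yaml
# Minimal sketch of the existing realtime-kernel MachineConfig that this
# proposal builds on. The metadata name and role label are illustrative.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-realtime
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelType: realtime
```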
If we're going this route, why not publish all OS content into a container as a repo and install the content from that repo? Then we could support RHEL using the same source of packages, published per version.
First: that's a much, much larger set of stuff, including a lot of packages that have no business ever being installed on RHCOS (e.g. `ruby`, `postgresql`, `gcc`, and `gnome-shell`). The RHEL 8 DVD is 6.8GB.
The scale of this proposal is probably (hopefully) at most 50-100MB of additional stuff, which is an order of magnitude or two smaller in size. And by package count it's probably 20 packages tops versus thousands.
Second...people who have reason to access all of that should have entitlements set up anyways.
In addition to usbguard and the real-time kernel, there are use cases for this regarding kata (see https://github.com/harche/enhancements-1/blob/66675b4dd1fde442504ac331bc6343d90cfe1f1f/enhancements/kata/kata-operator.md) and, presumably, KubeVirt. In both cases you need a `qemu` suitable for that version of RHCOS, and at least in the case of kata, a number of additional binaries.
KubeVirt runs qemu in a container; it's not needed on the host.
@cgwalters I didn't mean package all of RHEL. I meant all of OpenShift. So whatever we're packing into CoreOS today, install it via the same mechanism as the 'extensions'.
> @cgwalters I didn't mean package all of RHEL. I meant all of OpenShift. So whatever we're packing into CoreOS today, install it via the same mechanism as the 'extensions'.
Ah. Well...the thing is, I'd really like to keep RHCOS in "image mode" by default, where we're doing the depsolving, SELinux labeling, etc. all on the server and not per machine.
This of course gets into coreos/rpm-ostree#1237
I see what you're thinking here but the counter is that if you want your OS content lifecycled with the cluster...that's what RHCOS is.
> This of course gets into coreos/rpm-ostree#1237
This is an undecipherable rabbit hole for me. I'm sure it makes a lot of sense to people heavily involved with these things, but I find it quite unapproachable.
> that's what RHCOS is
I fail to see a distinction. We have some process that pulls in RPM content and delivers it to a host. The only thing that is different is the underlying implementation.
If we add these 'extensions' we're opening ourselves up to the exact same problems RHEL has when it comes to shipping content. Some package the user installs introduces a conflict. Most packages also have some sort of configuration. How are users going to handle configuration conflicts across versions? Not to say these problems are blockers, but they seem to diminish the utility of RHCOS vs RHEL.
There's another proposal about building RPM ostrees in-cluster, and then also uploading images to the cloud for each release. We might as well just build RHEL images at this point. We might as well run a repo pod and stream just the bits that are updated rather than a whole ostree to each host.
I think the OSTree model is good and has its place. Our software delivery model doesn't seem to necessarily be the best match, and it seems we're slowly rebuilding RHEL.
Sorry I meant to link coreos/rpm-ostree#1081
> Some package the user installs introduces a conflict.
Conflict with what? We own the set of extensions and the testing of them.
> How are users going to handle configuration conflicts across versions?
Do you have a more specific example? Something like usbguard changing its configuration file incompatibly? That seems like a "don't do that" situation. But...I can imagine eventually we will need some sort of special handling in the MCO for this.
> There's another proposal about building RPM ostrees in-cluster, and then also uploading images to the cloud for each release. We might as well just build RHEL images at this point. We might as well run a repo pod and stream just the bits that are updated rather than a whole ostree to each host.
I won't attach my name to an operating system that leaves the root filesystem in a corrupted state if the kernel freezes or you hit ENOSPC etc. during an update. There are many other benefits to an "image based" system by default, among them clear versioning: typing `rpm-ostree status` shows you very clearly any deltas from the base image, and you can always reset back to the base.
By default, you are running an exact set of bits that has been tested in CI rather than dynamically reassembled on each machine.
And notably with this extension proposal...we would probably start without any enabled extensions, so there's still no client-side depsolving.
Generic lifecycle of RPMs on top of RHCOS should be completely out of scope. We aren't shipping RHEL with OpenShift - we're shipping the package set that is needed to run containers, and everything else should be in a container.
Are we just using this to avoid the real issue of defining what goes on the host and what goes into a container (as hinted at by coreos/fedora-coreos-tracker#401 (comment))?
@cgwalters could we do OVS this way? Is that an appropriate way to install OVS as a core component required at cluster install time?
Yep, definitely. Among other things, with this we could also e.g. "ratchet" in a new OVS version by having two and letting the SDN layer control which one to use.
> This is how `kernel-rt` works today; we added a `kernel-rt.rpm` that is committed to the container.

How does this work with base OS versioning? I guess in the existing case you must be doing this in the `machine-os-content` build (or something similarly low-level) so the build process knows what RHEL/kernel version it is targeting? But what about something like the CNI binary problem? To be able to use this feature to install CNI binaries, we would need to build separate RHEL7 and RHEL8 RPMs and then be able to cause the correct one to be installed on a per-node basis.

Actually, I think the proposal as written is ambiguous about whether you're proposing "there will be a single container in the release image (`machine-os-content`?) that contains RPMs that can be installed as extensions" or "it will be possible to install RPMs that come from any container in the release image". I guess you probably meant the former, since there's no discussion of how the RPMs would be found... But that makes it not as good for solving the CNI / coreos/fedora-coreos-tracker#354 problem.

(Although, I guess, we could install the RPMs into "higher-level" images like `origin-multus-route-override-cni`, and then `machine-os-content` or some other "lower-level" image could extract the correct ones from those images and copy them into itself, like it does to get `hyperkube` from the origin image...)

Anyway, I feel like CNI / coreos/fedora-coreos-tracker#354 should be explicitly either a Goal or a Non-Goal...
I think there are really two levels to this proposal, and I guess I need to cleanly separate them.

The first part is really about the customer UX for adding things that are not conceptually on by default, such as `usbguard`.

But what this quickly bleeds into is using such a mechanism for default parts of OpenShift, such as `openvswitch`.

But were we to ship OVS that way, I think it wouldn't be something an administrator could choose in the MCO `extensions` field - it'd be an implementation detail of the clusterNetwork choice.

And the same would be true for the CNI binaries.

Does that make sense?
We don't want to expose CNI plugins / system-vs-containerized OVS as end-user options, but I don't think it makes sense to have the MCO try to figure out what networking RPMs to install by analyzing the network config itself. It seems like it would make more sense to have CNO tell MCO what RPMs it needs, somehow. That may or may not make sense as part of the same mechanism as for the end-user-selectable extensions.
OK yep, we're in sync on that then.
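To make the preceding exchange concrete, here is one purely hypothetical shape such a handoff could take: an operator (e.g. the CNO) rendering a MachineConfig that requests a host-level extension, keeping the choice out of the user-facing API. Nothing in this thread commits to this mechanism, and the names and the `openvswitch` extension below are assumptions.

```yaml
# Hypothetical only: a MachineConfig an operator might render to request a
# host-level extension without exposing the choice to end users.
# The metadata name, role label, and "openvswitch" entry are assumptions.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 97-worker-network-extensions
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  extensions:
    - openvswitch
```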
> ### Non-Goals
>
> - Direct support for installing RPMs *not* from the release image
Isn't this what customers actually want?
Some of them probably, for some things...
But trying to support it would be a huge increase in scope; for things we ship we know we've tested it in combination with the particular OS version. Doing that for random 3rd party RPMs (e.g. PAM hooks or hardware raid tooling) is a totally different thing.
The other big problem domain here is: who chooses when updates for those 3rd party extensions (RPMs) happen? The easiest would be to only trigger when the base OS updates, but...
Agreed. Some customers will want to install RPMs, but that doesn't make it supportable, opinionated, or the right move. I'd argue most customers are more interested in a stable, upgradable OCP (which RHCOS is an important part of).
enhancements/rhcos/extensions.md
> Do nothing: Probably RHCOS will continue to grow and we will have to look embarrassed when someone asks us why `usbguard` is installed in their AWS cluster where the VMs don't have USB.
>
> Force `usbguard` authors to containerize: Not a technical problem exactly but...it is *really* hard to have two ways to ship software; see above.
Isn't this the point of the container revolution? You move away from the old methods of packaging and delivering software to containers (so that it's the standard)? - insert xkcd meme for standards.

It seems to me that we're trying to solve a problem for people because they are obstinate (yes, change is hard, and I am empathetic to that, but how/why is that OpenShift's issue?).
Really what we're trying to do is not ship all the things in base to meet all the requirements. To do so would increase the size of the boot disks. Instead, we're trying to push off things which are needed in specific scenarios (example: `usbguard` in environments with exposed USB that care about security) rather than "put it in base".
> #### 3rd party RPMs
>
> This will blaze a trail that will make it easier to install 3rd party RPMs, which is much more of a risk in terms of compatibility and for upgrades.
Does this perpetuate bad/lazy behaviors? I touch on this in https://github.com/openshift/enhancements/pull/317/files#r425535006

In short, ISVs, integrators, and developers (in general) will take the path of least resistance to get work done and complete tasks (under the guise that they work).

It sounds like this proposal does not expose this to ISVs and third parties (which I am OK with), but if we do ever decide to open this up, do we run the risk of undermining the push to containerize components? Do we begin to show/set a bad example, simply for the sake of easing friction?
That's correct, this feature does not support 3rd parties.
> Do we begin to show/set a bad example, simply for the sake of easing friction?
The way I see things is we have a spectrum. We can't change the whole world at one time.
This addition would be really useful for security packages (as the example was, usbguard) and would really help us in the compliance space.
❤️
One comment; other than that, the proposal looks good to me!
> This enhancement proposes a MachineConfig fragment like:
>
> ```
> extensions:
>   - usbguard
> ```
As far as I understand, the list of extensions is going to grow in the future depending on requirements.

One concern I have in terms of implementation is how we are going to manage the additional dependencies that these extension packages may pull in. For example, installing usbguard also installs package foo as a dependency, but foo is not being shipped in BaseOS.
We currently include those dependencies in an extra `dependencies/` directory.
sweet, didn't know about that.
> This enhancement proposes a MachineConfig fragment like:
>
> ```
> extensions:
>   - usbguard
> ```
>
> This is the OpenShift version of the [Fedora CoreOS extension system tracker](coreos/fedora-coreos-tracker#401). That will add additional software onto the host, but this software will still be versioned with the host (included as part of the OpenShift release payload) and upgraded with the cluster.
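As a rough illustration, the quoted fragment embedded in a full MachineConfig might look like the sketch below, assuming `extensions` ends up as a spec-level field alongside `kernelType`; the object name and role label are placeholders, not part of the proposal.

```yaml
# Sketch of the proposed user-facing usage, under the assumption that
# `extensions` is a spec-level MachineConfig field. Name/label are placeholders.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-extensions
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  extensions:
    - usbguard
```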
I just remembered another future feature that will need this: IPsec support. Trying to run the IPsec daemons in containers would be incredibly precarious. (When the containers restarted, the node would lose all network connectivity.) So we'd want the "ipsec operator" to be able to cause libreswan to be installed at the RHCOS level on all nodes.
> - Direct support for installing RPMs *not* from the release image
> - Support for traditional RHEL systems (see below)
>
> ## Proposal
I don’t see it called out, but we will need telemetry from the MCO telling us what extensions are being used by customers.
> ## Proposal
>
> 1. RHCOS build system is updated to inject `usbguard` (and other software) into `machine-os-content` as an RPM
How do we decide what to include in the set of extensions over time? Will we ever remove extensions?
RFEs, more than likely. It's not likely we'd remove any extensions.
Merging based on positive feedback and work already done. If you're interested in the epic, see https://url.corp.redhat.com/rhcos-extension-system. Further questions and comments may be added here for discussion as needed.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED.

This pull-request has been approved by: ashcrow, cgwalters, JAORMX