bootimages: Downloading and updating bootimages via release image #201
Conversation
> ## Proposal phase 3: machine-api-bootimage-updater
>
> Add a new component to https://github.com/openshift/machine-api-operator which runs `machine-os-bootimage-generator`, uploads the resulting images to the IaaS layer (AMI, OpenStack Glance, etc.) and updates the machinesets for the cluster.
Are we moving away from public AMIs?
There was some talk about possibly changing machine-controller to an image reference rather than updating each machineset. Either of these cases has the question of mixed image support. Do we plan on doing that in the future, or do we do it now?
openshift/installer#2906 suggests moving away from AMIs, yep.
> The implementation of this is basically shipping a subset of [coreos-assembler](https://github.com/coreos/coreos-assembler) as part of the OpenShift release payload, and teaching `oc` how to invoke `podman` to run it.
>
> The `generate-bootimage` implementation would download the `machine-os-bootimage-generator` container image along with the existing `machine-os-content` container image (OSTree repository), and effectively run the [create_disk.sh](https://github.com/coreos/coreos-assembler/blob/30fbac4e176c7936362efbd647c8199d927e593c/src/create_disk.sh) process or [buildextend-installer](https://github.com/coreos/coreos-assembler/blob/30fbac4e176c7936362efbd647c8199d927e593c/src/cmd-buildextend-installer) for live media, etc.
Why do this at all? If we're already pulling all the binaries to build the image, why not just publish an image that contains an ISO or whatever format?
Good question. I think the basic answer is that if we e.g. ship a container image wrapping each pre-built bootimage, that's at least 700MB per image and there are 7-8 or more images. That's...a lot. So things like `oc adm release mirror` would have to learn how to filter them, and it'd make the OpenShift release process that pushes them much slower.
It just seems more tenable to me to make this dynamic.
> So things like oc adm release mirror would have to learn how to filter them

That seems like a better approach.
Can you elaborate a bit more on why? Just in terms of avoiding needing a new nontrivial container image? Needing to ship code that was formerly "behind the firewall" to run on premise? Something else?
On the bit level, we'll know the image they are booting is the one we built; we don't have to worry about a bug in the builder producing something incorrect in their environment. It's also one less moving piece we need to worry about in-cluster. My preference is to add nothing to the cluster that doesn't 100% need to be there.
> ## Proposal phase 3: machine-api-bootimage-updater
>
> Add a new component to https://github.com/openshift/machine-api-operator which runs `machine-os-bootimage-generator`, uploads the resulting images to the IaaS layer (AMI, OpenStack Glance, etc.) and updates the machinesets for the cluster.
This proposal seems to me an extension of the current upgrade workflow driven by the Machine Config Operator.
If I understand correctly, this is introducing a new upgrading step that runs the `machine-os-bootimage-generator` against the just-pulled `machine-os-content` to build and upload an artifact during an upgrade. So this belongs, together with all the upgrading business logic, in the Machine Config Operator. And only when this last step is complete should the MCO signal a cluster upgrade as completed. Then any component in any environment, e.g. UPI, has an available artifact to consume.
I can't see why we'd want to couple this with the machine API or the machine API operator.
I'd find it more reasonable for the MCO to make the new artifact ID available via e.g. a configMap as part of this new upgrading step and let any consumer, e.g. machineSets, reference it as they see fit, e.g. `spec.Ami: "ocp-cluster-latest"`.
Watching and updating all the machineSets en masse would again effectively be an additional step for an upgrade workflow to be considered complete. And I don't think we should split or couple our upgrading process to something orthogonal like the machine API; this belongs to the Machine Config Operator in my view.
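To make the suggestion concrete, a ConfigMap published by the MCO might look roughly like the following. This is purely illustrative: the name `coreos-bootimages`, the namespace, and the key layout are all assumptions, not part of the proposal.

```yaml
# Hypothetical shape of a ConfigMap the MCO could publish after the
# bootimage-generation step; any consumer (machineSets, CloudFormation
# templates, PXE tooling) would read the artifact IDs from here.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coreos-bootimages                       # assumed name
  namespace: openshift-machine-config-operator  # assumed namespace
data:
  releaseVersion: "4.5.0"
  aws: '{"us-east-1": {"image": "ami-0example1234567890"}}'
  openstack: '{"image": "rhcos-ocp-cluster-latest"}'
```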
> So this belongs together with all the upgrading business logic in the Machine Config Operator. And only when this last step is complete should the MCO signal a cluster upgrade as completed.

I don't think it's that clear cut. Changing the bootimage has no effect on any running system; it's not clearly coupled with the main CVO upgrade logic.
> I can't see why we'd want to couple this with the machine API or the machine API operator.

Because there's nothing to do if the machineAPI isn't in use.
> I'd find it more reasonable for the MCO to make the new artifact ID available via e.g configMap as part of this new upgrading step and let any consumer e.g machineSets to reference it as they see fit e.g spec.Ami: "ocp-cluster-latest".

Again, none of this applies unless machineAPI is in use. Sure, the code could technically live in the MCO but...
Another argument for having it live in machineAPI is that it requires IaaS credentials, usage of those APIs - the MCO isn't set up with that today, but machineAPI is.
Or to expand on this - in a non-machineAPI (i.e. UPI) scenario, it'd be up to the customer to integrate this tooling into whatever they use, such as updating their AWS CloudFormation template, etc. Similar for the "manual bare metal PXE" scenario.
Another thing to note, which is called out in this proposal - this is not replacing the default in-place updates. So even in a machineAPI/IPI scenario, if we edit the machinesets as part of the default CVO flow (whether the MCO edits or machineAPI edits), we would rely on the default machineAPI semantic of not reprovisioning machines when a machineset is changed.
> Again none of this applies unless machineAPI is in use. Sure, the code could technically live in the MCO but...

I see only a tangential point here to the machine API: new machines belonging to a machineSet should come with the latest OS dictated by the MCO. The lifecycle of that OS artifact is transparent to the machine API. The machine API should be a consumer of this, just like any other component/tooling creating a new instance (e.g. CloudFormation) would be. They'd only need to know where this is available.
> Another argument for having it live in machineAPI is that it requires IaaS credentials, usage of those APIs - the MCO isn't set up with that today, but machineAPI is.

The machine API actually has no particular knowledge about IaaS credentials. It just happens to consume whatever the cloud credentials operator makes available for any component in the cluster.
> Or to expand on this - in a non-machineAPI (i.e. UPI) scenario, it'd be up to the customer to integrate this tooling into whatever they use, such as updating their AWS CloudFormation template, etc. Similar for the "manual bare metal PXE" scenario.

I think it's up to the consumer to integrate as they see fit, but to consume a consistently formatted "artifact address" given by the component which is the authoritative source of truth for cluster upgrades, i.e. the MCO. This consumer could be anything: machine API, CloudFormation, etc. They are orthogonal building blocks.
> Another thing to note, which is called out in this proposal - this is not replacing the default in-place updates. So even in a machineAPI/IPI scenario, if we edit the machinesets as part of the default CVO flow (whether the MCO edits or machineAPI edits), we would rely on the default machineAPI semantic of not reprovisioning machines when a machineset is changed.

I'm not in favour of the MCO editing machineSets to complement a cluster upgrade. That would effectively scatter our upgrade process across different components and couple our upgrade process to the machine API.
I think this proposal describes a missing step in our current upgrade workflow driven by the authoritative source of truth for cluster upgrades, i.e. the MCO. Once this is supported, any component/tooling (e.g. machine API, anything) could transparently react as they see fit to consume a consistently formatted source of truth.
I think the proposal is to:
- let the machine config operator ecosystem generate the artifact that's suitable for upload
- the machine-api-operator ecosystem consumes that artifact and creates the cloud-specific image
- the machine-api-operator ecosystem then ensures all required machinesets have the latest image.
I think the separation makes sense because machine config + RHCOS knows best how to create bootimages, and the machine-api ecosystem knows best how to talk to the cloud to create cloud-specific images and interact with Machinesets.
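The three-step split above can be sketched as a small reconciliation function. Everything here is hypothetical (the data shapes, the in-memory "upload", the naming); it only illustrates separating "create the cloud image once" from "point every machineset at it":

```python
# Sketch of the machine-api side of the proposed split: given a new
# bootimage artifact, register it once as a cloud image, then update
# any machinesets still referencing an older image. All names are
# illustrative, not real machine-api-operator code.

def reconcile_bootimages(machinesets, latest_artifact, uploaded_images):
    """Return the names of machinesets whose image field was stale.

    machinesets: list of dicts like {"name": ..., "image": ...}
    latest_artifact: version string of the newest bootimage artifact
    uploaded_images: cache mapping artifact version -> cloud image ID
    """
    # Step 1: ensure the artifact exists as a cloud-specific image.
    if latest_artifact not in uploaded_images:
        # In reality this would call the IaaS API (e.g. register an AMI).
        uploaded_images[latest_artifact] = f"img-{latest_artifact}"
    image_id = uploaded_images[latest_artifact]

    # Step 2: find machinesets that still point at an older image.
    stale = [ms["name"] for ms in machinesets if ms["image"] != image_id]

    # Step 3: update them (here in place; really a PATCH per machineset).
    for ms in machinesets:
        ms["image"] = image_id
    return stale

machinesets = [
    {"name": "worker-a", "image": "img-45.81.202003131108-0"},
    {"name": "worker-b", "image": "img-45.82.202004201200-0"},
]
cache = {"45.81.202003131108-0": "img-45.81.202003131108-0"}
stale = reconcile_bootimages(machinesets, "45.82.202004201200-0", cache)
```

Note the idempotence: running the function again immediately would find no stale machinesets and perform no uploads, which is the property a controller loop like this needs.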
I don't understand why we're not publishing public images in all the clouds. Seems like we're trying to do a ton of stuff to avoid doing that. I don't really want to support an image pipeline that lives on each cluster. The whole point of RHEL CoreOS was to avoid doing that, I thought.
That doesn't cover disconnected environments (on-premise OpenStack), nor does it help bare metal users mirror everything with just the release image, instead of the release image + RHCOS downloads externally.
It's trivial to download an image and put it on OpenStack. If you want to run disconnected, that's the price you pay.
I'm not familiar with what a ground-up bare metal installation process looks like. How much effort does downloading a bootable image add to that process today? It also doesn't seem to solve the workflow of:
In any case, for UPI installs, I don't really have a preference. For IPI installs, we should be using public images IMO.
Not as easy as you'd think - people don't understand that e.g. we require a separate RHCOS per OpenShift minor version. And further, it doesn't address the on-going issue of keeping the images up to date.
Sorry, I disagree - at Red Hat we support a hybrid cloud.
It's not at all the biggest stumbling block; bare metal admins tend to understand this stuff. Ignition is much more new, and learning how to do static IP addressing etc. are bigger issues. But another way to look at this is: with OpenShift 4 (vs 3.x), you no longer need to know how to manage/mirror/update RHEL install media (or AMIs), RPMs and containers by default. You just need to know how to mirror RHCOS media and containers. With this proposal, you just need to know how to mirror containers!
The boot images need to be available somewhere to actually boot the machines. So you still need to have a way to mirror the images, whether you build them yourself or you download them. Once the cluster is up and running, you could automatically build new images, but there's no way to automate uploading the boot images to the provisioning network/storage because we don't own it (in the UPI case). In some cases (many cases?) the provisioning infrastructure is not going to be network accessible from your application network, so an admin will need to download the artifact from the cluster (but, they still need to know which artifact) to their workstation, then upload it to the provisioning storage.
This is explained in a single sentence. There should be a single page quick start guide for every release that tells you exactly what you need to mirror.
Seems like Ignition is the wrong technology for this kind of thing. OSTree works well, but the Ignition model seems to be a large hurdle. IMO that's the elephant in the room: we made more work for ourselves because we have a very narrowly supported installation model that moved the problem from hosts we control to infrastructure we don't.
> In fact, we could aim to switch to having workers use the `bootimage.json` from the release payload immediately after it lands. A downside is this would open up potential for drift between the bootstrap+controlplane and workers.
>
> ## Proposal phase 2: oc adm release generate-bootimage
Can we include details on an option that prints the contents of bootimage.json, like mentioned here https://github.com/openshift/enhancements/pull/201/files#diff-9b570624a774881c701bbfbc18b60109R64 - a way to get the prebuilt bootimages if the user sees fit, as an alternative to generating the whole thing?
This gets into a fundamental question of whether we ship pre-built bootimages going forward in general. Clearly in "phase 1" we continue to do so, as we can't break the way openshift-install works today.
The thing is though, in any disconnected environment it'd require another step for admins to mirror on an ongoing basis.
On the other hand, we absolutely are going to need to make it ergonomic at some point to "boot a single RHCOS node" for testing. There are many valid reasons for this, among them hardware validation before trying a cluster install, experimenting with configuration "live" that one then compiles into Ignition/MachineConfig, etc. And until such time as we have `openshift-install run-one-coreos-instance` or something...people are going to want to download the bootimages directly and boot them too.
So...I'm conflicted.
Maybe what happens in the end is we do both, but the cadence of "golden bootimages" would still be once per release (for the bootstrap). There's no point in us uploading e.g. new AMIs if in practice clusters are going to use in-place update mechanisms anyways.
IOW...I think people doing UPI installs would turn to automating `oc adm release generate-bootimage`.
For phase 1, I think it's necessary for migration.
We could add an `--info` flag, defaulting to true to begin with; when we have actual generation capability we flip `--info` to false, so users are generating the images.
I think `--info` will continue to provide value in the future, because it will allow customers to get information like: what's the RHCOS version, maybe what's included, and if there are any publicly available images they can try. And as long as the output is versioned, users have a way to differentiate and automate.
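For illustration only, a versioned `--info` output could look something like the following; the schema, field names, and values are invented here to show the idea of versioned, automatable output, not a proposed format:

```json
{
  "schemaVersion": "1.0.0",
  "rhcosVersion": "45.81.202003131108-0",
  "images": {
    "aws": {"us-east-1": "ami-0example1234567890"},
    "qemu": {"location": "rhcos-45.81.202003131108-0-qemu.x86_64.qcow2.gz"}
  }
}
```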
OK, I'm thinking about this again. I guess one thing here is that what you most strongly objected to was having `openshift-install` be the frontend for this information (clearly we want to do all the other stuff here too though).
But clearly an "MVP" for this is basically what you're saying with `--info`. The question is how to get there from today, where RHCOS is pinned in the installer. I really don't want two sources of this information (it's already bad enough that docs are updated manually). So...what if we still added a hidden/secret mechanism for `oc adm release new` to download `openshift-install` and extract this data from it somehow?
The simplest would perhaps be a hidden verb like `openshift-install hidden-only-for-oc-output-coreos-build` or so.
Or, we could stick it in a special ELF section (would require `oc adm` to vendor some ELF tools).
Or, we could do a variant of openshift/installer#1422 and basically scrape the binary; the lowest-tech approach...
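The "scrape the binary" variant is the easiest to sketch: embed the metadata behind a unique sentinel string and have tooling scan for it. The sentinel and length-prefixed layout below are made up for illustration; this is not what openshift/installer#1422 actually does:

```python
import json

# Made-up marker; in practice it only needs to be unlikely to occur
# elsewhere in the binary.
SENTINEL = b"__OPENSHIFT_COREOS_METADATA__"

def embed(binary: bytes, metadata: dict) -> bytes:
    """Append sentinel + 4-byte length prefix + JSON to a binary blob."""
    payload = json.dumps(metadata).encode()
    return binary + SENTINEL + len(payload).to_bytes(4, "big") + payload

def scrape(binary: bytes) -> dict:
    """Find the last sentinel and decode the JSON that follows it."""
    idx = binary.rindex(SENTINEL)
    start = idx + len(SENTINEL)
    size = int.from_bytes(binary[start:start + 4], "big")
    return json.loads(binary[start + 4:start + 4 + size])

# Simulate embedding metadata into an installer-like blob and reading
# it back without executing the binary.
blob = embed(b"\x7fELF...installer bytes...", {"rhcos": "45.81.202003131108-0"})
meta = scrape(blob)
```

The appeal of this approach is that the consumer (`oc`) needs no ELF tooling at all, only a byte search; the downside is that nothing validates the payload actually belongs to the binary it's attached to.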
I think we should include an object in the release image like we include machine-os-content. This object can then be used to provide the information as part of `oc adm release info --boot-images`.
Trying to get this data from the installer is not correct imo. The installer's embedded data is mostly for the bootstrap host, and it is currently being used for cluster hosts.
Yes, we are all agreed on the long-term goal of getting the data out of the installer. The problem is that in the short term, having two sources of truth for this data is going to be very problematic.
Now...hmm, we could get the data into a container in the release payload and change `oc adm release new` to patch the `openshift-install` binary, the same way it does for the release image itself?
The downside of doing this is that for anyone building `openshift-install` from git...well, it just won't work. I guess we could change the installer to fetch "the latest" from a URL if built from git, basically the same way we do with the release image.
For metal IPI CI, we stuff the data into the baremetal-installer container because we have to have it. See: openshift/installer#3330
The way users get this data to use metal IPI is really painful (`openshift-install version`, then fetching the rhcos.json from GitHub for that sha!). Basically every on-premise platform needs access to the image locations...
I dislike client-side patching of the installer binary. Storing it somehow next to machine-os-content makes sense to me.
> I dislike client-side patching of the installer binary.

I'm not talking about any client-side patching. This is something that would happen in the OpenShift build system (release controller).

> Storing it somehow next to machine-os-content makes sense to me.

Yes, but the problem with that is that it's the bootstrap node that pulls the release image, so that leaves open the problem of which image to use for the bootstrap.
Now I guess a first transitional step could be changing the installer to use the metadata from the release payload for the cluster, and leave the pinned version only for the bootstrap node.
> > I dislike client-side patching of the installer binary.
>
> I'm not talking about any client side patching. This is something that would happen in the OpenShift build system (release controller).

`oc adm release extract` patches the binary. Sure, the binaries published on mirror.openshift.com already had that run to get them, but users can do that themselves too, client side. It's the default for baremetal since we don't publish ours.

> > Storing it somehow next to machine-os-content makes sense to me.
>
> Yes but the problem with that is that it's the bootstrap node that pulls the release image, so that leaves open the problem of which image to use for the bootstrap.

A user can just use `oc adm release info`? That's at least a consistent place for any release-related information.
I'm in favor of the change. It'll make a single process for mirroring for an offline install. Plus, this goes nicely with the RHEL 8.2 support for installing a fully up-to-date machine by initially installing any updated packages, beyond just those in the GA compose.
One subtle case we will need to handle eventually is that in some clouds there are "properties" of an image (beyond just the raw content) that we need to preserve. For example:
And one that came up recently is around GCP - I want to flag {F,RH}CoreOS as supporting UEFI and secure boot, which will enable TPMs. But an even trickier one in GCP is the nested virt license - we don't want to enable this by default but we do want to enable users to turn it on. We're talking about patching the installer to do this, but if we do that, eventually it'd need to write an in-cluster object too so we know to do it for any images the cluster uploads. For most of the first chunk though I think we're going to need to do something like:
For example, say that we bump the minimum required VMware version in 4.X - that would show up as
I wanted to write something up here about manually updating bootimages. It's not easy, unfortunately. In the case of AWS, for example, openshift/installer copies the AMIs w/encryption. So if, for example, one just grabs the AMIs from the RHCOS metadata in the installer and edits the machinesets to use them, you'd end up with unencrypted disks. More recently, this installer PR switches things to use AWS KMS directly (should appear in 4.5 and above). So someone could also set that up manually. In the case of e.g. GCP and other IaaS clouds, we don't upload bootable images directly; the installer creates them from a serialized form. So anyone who wants to update things would have to replicate that bit too (not particularly difficult, but definitely cloud specific). See also openshift/installer#2906 For bare metal at least, updating bootimages manually should just be a matter of e.g. changing your PXE server or whatever to reference the new images, e.g. 4.3.
> ## Proposal phase 1: bootimage.json in release image
>
> First, the RHCOS build process is changed to inject the current coreos-assembler `meta.json` output for the build into `machine-os-content`. This aims to move the "source of truth" for cluster bootimages into the release image. Nothing would use this to start - however, as soon as we switch to a machine API provisioned control plane, having that process consume this data would be a natural next step.
Could you have some details about what this means?
I've read through what I can find and I don't understand how machine-os-content is produced; there's no source location in the release image.
As a user, how would I consume this? How would I find out which is the image URL for, say, the OpenStack image? Does the meta.json become an annotation on the image or something?

```json
{
  "name": "machine-os-content",
  "annotations": {
    "io.openshift.build.commit.id": "",
    "io.openshift.build.commit.ref": "",
    "io.openshift.build.source-location": "",
    "io.openshift.build.version-display-names": "machine-os=Red Hat Enterprise Linux CoreOS",
    "io.openshift.build.versions": "machine-os=45.81.202003131108-0"
  }
}
```
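To illustrate the question, one hypothetical answer would be surfacing selected `meta.json` fields as an additional annotation alongside the existing ones; the annotation key and payload below are invented, not part of the proposal:

```json
{
  "name": "machine-os-content",
  "annotations": {
    "io.openshift.build.versions": "machine-os=45.81.202003131108-0",
    "coreos.build.meta": "{\"amis\": {\"us-east-1\": \"ami-0example\"}, \"images\": {\"openstack\": \"rhcos-openstack.x86_64.qcow2.gz\"}}"
  }
}
```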
None of this exists yet - it's a proposal.
But this is the same question as #201 (comment) right?
> I don't understand how machine-os-content is produced

Via https://github.com/coreos/coreos-assembler and https://gitlab.cee.redhat.com/coreos/redhat-coreos/
Got it, thank you - I was looking for details about how this should be implemented, to see if we could already do part of it.
The problem for us is that baremetal (and OpenStack) IPI both need to know where the RHCOS images live. Today we do awful hacks with `openshift-install version` and GitHub to fetch the data, and they don't work in CI. There, `openshift-install version` reports a sha that doesn't exist due to the rebasing strategy CI uses. We ended up with a temporary workaround and just copy rhcos.json into the installer container.
I'd be interested to see if we could get a better approach to make the data accessible to end users (this `--info` flag @abhinavdahiya mentioned).
Implementation thought; we shipped osmet and are now making use of it in FCOS (and will for RHCOS 4.6). It'd be useful to try generalizing it to be able to work across other images. Chat summary:
@jlebon said
That said, it needs to be weighed against other implementation approaches, like having the Ignition platform ID be at an easily detectable offset so we can have tooling that "punches" it in without having to do anything like actually mount the disk (i.e. run qemu). That works as long as the only difference between the disk images is the platform ID (and not possibly other things like the bootloader timeout).
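The "punch the platform ID at a detectable offset" alternative can be illustrated in a few lines: if the field sits at a known offset with a fixed width, tooling can rewrite it without mounting the image or running qemu. The offset, width, and NUL-padded layout here are invented for illustration only:

```python
# Invented layout: a fixed-width, NUL-padded platform ID field at a
# known byte offset inside the disk image. Real images would need a
# discoverable marker instead of hardcoded numbers.
PLATFORM_OFFSET = 512
PLATFORM_WIDTH = 16

def punch_platform(image: bytearray, platform: str) -> bytearray:
    """Overwrite the platform ID field in place, NUL-padding to width."""
    encoded = platform.encode()
    if len(encoded) > PLATFORM_WIDTH:
        raise ValueError("platform ID too long for field")
    image[PLATFORM_OFFSET:PLATFORM_OFFSET + PLATFORM_WIDTH] = \
        encoded.ljust(PLATFORM_WIDTH, b"\x00")
    return image

# Build a fake disk image stamped for "qemu", then retarget it to "aws"
# without any filesystem access.
disk = bytearray(1024)
punch_platform(disk, "qemu")
punch_platform(disk, "aws")
field = bytes(disk[PLATFORM_OFFSET:PLATFORM_OFFSET + PLATFORM_WIDTH])
```

This captures why the approach only works if the platform ID is the *sole* per-platform difference: the rewrite touches exactly one field, so anything else that varies (e.g. bootloader timeout) would be left stale.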
This proposes a path towards integrating the bootimage into the release image, becoming managed by the cluster. This aids worker scaleup speed, helps in mirroring OpenShift into disconnected environments, and allows the CoreOS team to avoid supporting in-place updates from OpenShift 4.1 into the foreseeable future.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting. If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting. If this proposal is safe to close now please do so with /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@cgwalters: Reopened this PR.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: crawford. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@cgwalters: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
I want to highlight this since important bits got lost in the github comment hiding:

> This proposes a path towards integrating the bootimage into the release image, becoming managed by the cluster. This aids worker scaleup speed, helps in mirroring OpenShift into disconnected environments, and allows the CoreOS team to avoid supporting in-place updates from OpenShift 4.1 into the foreseeable future.