Design for osImageURL updates - integration with CVO/release payload #183
Comments
For completeness, one alternative is to make it a That said, using a separate |
Remember, this data will need to flow one way or another down into a |
Strawman:
Basically, we can include a subset of what coreos-assembler already outputs. Or maybe just a link to that |
I think it shouldn't be too hard to adapt the MCC either way so we shouldn't constrain ourselves too much on what's easier. |
Fair. Rephrased: what's doable and consumable within specific constraints 😃 |
Do we need anything other than the EDIT: Ah sorry I missed this:
Yeah, though...a tricky part about this is that the pkgdiff is against the previous version, which may not be the one they're updating to... Mmm. I'm OK stuffing the whole |
Yeah, I think to do this correctly, the pkg diff would have to be done from the pkg lists instead rather than precomputed. Anyway, those are things that could easily come later. We just have to make sure we leave the door open for it.
Yeah, that sounds fine to me. |
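To illustrate the point above about computing the package diff from the package lists rather than shipping a precomputed diff, here is a minimal sketch. It assumes each release's metadata carries a manifest mapping package name to version; the `package_diff` helper and the manifest shape are hypothetical and are not part of coreos-assembler or the MCO.

```python
def package_diff(old, new):
    """Compare two package manifests (name -> version string).

    Hypothetical helper: the diff is computed on demand between whatever
    two releases are actually involved in the update, instead of relying
    on a diff precomputed against the immediately preceding build.
    """
    added = {name: ver for name, ver in new.items() if name not in old}
    removed = {name: ver for name, ver in old.items() if name not in new}
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Because only the manifests are stored, the same function works for any pair of releases, including ones that are several builds apart.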
Be aware you’re going to eventually need to have a distinction between “the config says this is the latest” and “I’m ready to roll that out to the nodes”. Design with that in mind because new kubelets are going to happen ~ 1/week |
Pkgdiff is useless. There is no guarantee it has any relevance. Think package manifest instead of diffs. Our errata and higher level tools will calculate the diff |
Can you elaborate on that? Do you mean making sure we actually upgrade to the selected release instead of whatever happens to now be the latest at the time we're actually ready to upgrade? I think that should be covered yeah. The metadata includes the full sha256 of the oscontainer.
Yup, see coreos/coreos-assembler#226. |
We don't decide on version 4.0.6 until the last moment. So the goal of automation is to continuously have a set of 4.0.6 candidates, from which we then pick one and ship it. The "train" mindset, not the "artisanal release payload". |
Expectation is that you will build an image and push it to the openshift origin integration image stream, then reference the component “os” from this operator and have the dummy value substituted. In OCP, we do something similar where the OS content gets built via whatever process and shows up in the prerelease list, and is processed the same way. You can build anywhere - we just need you to push/be imported to the right place |
Note the dummy value can be real - but since we need to mirror the content you have to be referenced in the operator list which means you need to be sucked into the right place. Push vs scheduled import is also possible, but if we do scheduled import the source location has to be appropriately gated like a push would (only changes if you test against latest in an install) |
That breaks the scheduled-import model, doesn't it? How do you know the "latest" test used for the gate hadn't been surpassed by further release-payload work? Or will errors there be caught by post-testing? |
My notes from playing around with MCD state so far; I am still a bit confused as to the current flow of the Next, currently our oscontainers are uploaded to api.ci under the For some reason it's not working to |
I've never tried to edit the source and always edited the generated content. Only generated versions should ever be available to the MCD from the MCO. FWIW the resource version should get updated on edit. The MCO is in charge of updating the annotations. EG:
That I'm not sure about |
Looked at that PR more carefully and it's only about the bootstrap. The pull secret goes into the main What I don't understand yet is where that pull secret ends up on the nodes. |
@cgwalters Pull secret gets written into the controller config here: |
Hmm, I think you're right that that should trigger a regeneration of a new machineconfig. I'm trying that here, but my MCC now is hitting:
which is a flavour of some of the errors people were hitting in #199. Will try to dig deeper there.
I will admit the hacky way I've been testing upgrades so far includes prepending the |
Yep, I missed it somehow, I see it now, it goes to |
@jlebon those timeouts are related to the openshift apiserver: openshift/origin#21612 |
My understanding is that the source should be edited (which should result in a new generated config). We should not be in the habit of editing the generated config (and if that happened, the MCC should actually roll-out a non-edited config to all nodes -- as that is built from the canonical source(s)). It might help to add docs along the lines of, "As a user, how do I modify host configuration?" and it points to creating a new (source) machine config (as a layer that will be merged with config we control), and outlining that you should not be editing generated configs (that will ultimately be stomped on by MCC rolled-out source-generated configs anyway). cc @abhinavdahiya these assumptions are still correct. |
OK yeah, this does work for me. After sorting out the MCC issue (@kikisdeliveryservice thanks! I tried out the workaround there and it seems like it worked), and doing
|
Yeah, this is mostly for testing stuff out.
The issue is that when the MCC merges configs, it doesn't replace the base
00 generated from the baked in template currently, though the osImageURL part of it comes from:
Eventually, testing an OS update for hacking could be done by changing the configmap directly (or whatever we settle on in this ticket). (Or at an even higher level, pointing at a custom release payload). |
To add some context to the previous comment: this is strictly for testing changes to |
@cgwalters I just tried an update on Logs for ref: http://pastebin.test.redhat.com/696135 Add: this was run on AWS |
@cgwalters is there a specific order the MCs for the osUrlUpdates need to adhere to? Because applying a 2nd and 3rd config I'm running into errors. Must the newest config come first or last? |
Since #279 we take the first non-empty. |
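As a rough illustration of the "first non-empty" rule, the sketch below picks the `osImageURL` from a list of configs. It assumes the configs are plain dictionaries that have already been put in the order the MCC considers them; the `merge_os_image_url` name and the dictionary shape are hypothetical, not the MCC's actual code.

```python
def merge_os_image_url(configs):
    """Return the osImageURL of the first config that sets a non-empty one.

    `configs` is assumed to be already ordered; configs that omit the
    field or leave it empty are skipped.
    """
    for cfg in configs:
        url = cfg.get("osImageURL", "")
        if url:
            return url
    return ""
```

Under this rule, a test MC only takes effect if it sorts ahead of every other config that sets `osImageURL`.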
Gotcha will try again with proper order for 2nd test MC to see if that was the cause of errors. |
Ok tried the second config with name: |
Have the MCC take `osImageURL` as provided by the cluster update/release payload and generate a `00-{master,worker}-osimageurl` MC from it, which ensures the MCD will update the node to it. However, we need special handling for the *initial* case where we boot into a target config, but we may be using an old OS image. Change the MCC to write the target osImageURL from the MC it uses for bootstrapping to `/etc/rhcos-initial-pivot-target`. This will then be handled by the `rhcos-initial-pivot.service` systemd unit. Closes: openshift#183
Have the MCC take `osImageURL` as provided by the cluster update/release payload and generate a `00-{master,worker}-osimageurl` MC from it, which ensures the MCD will update the node to it. However, we need special handling for the *initial* case where we boot into a target config, but we may be using an old OS image. Currently the MCD would treat this as "config drift" and go degraded. Today we write the node annotations to a file in `/etc` as part of the rendered Ignition. Use that as a "bootstrap may be required" flag, and handle it specially - if we need to pivot, do *just* that and reboot. We also clean things up by unlinking that node annotation file; after that, if the `osImageURL` drifts from the expected config, we'll go degraded, just like if someone modified a file. Closes: openshift#183
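The special handling described in the commit message above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the action strings, and the boolean standing in for "the node annotation file still exists in `/etc`" are all hypothetical, and the real MCD logic may order these steps differently.

```python
def decide_action(bootstrap_flag_present, booted_url, target_url):
    """Return (action, remove_flag) for a node.

    bootstrap_flag_present: True while the node annotation file written
    by the rendered Ignition still exists (hypothetical stand-in).
    """
    if booted_url == target_url:
        # Nothing to do; the flag, if any, can simply be cleaned up.
        return ("none", bootstrap_flag_present)
    if bootstrap_flag_present:
        # Initial boot from an older bootimage: do just the pivot, then reboot.
        return ("pivot-and-reboot", True)
    # Flag already removed, so a mismatch is treated like any other drift.
    return ("degraded", False)
```

The flag makes the initial pivot a one-time allowance: once it is removed, an unexpected `osImageURL` is handled the same way as a modified file.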
For RHCOS we have two things:
- The "bootimage" (AMI, qcow2, PXE env)
- The "oscontainer", now represented as `machine-os-content` in the payload
For initial OpenShift releases (e.g. of the installer) ideally these are the same (i.e. we don't upgrade OS on boot). This PR aims to support injecting both data into the release payload.
More information on the "bootimage" and its consumption by the installer as well as the Machine API Operator: openshift/installer#987
More information on `machine-os-content`: openshift/machine-config-operator#183
If today one wants to test an os update, here's an object you can
|
This finally landed in #426 |
I wanted to elaborate here on the current status of this. We have a PR in #363 which will finally close the loop and inject `machine-os-content` from the release payload all the way into the MachineConfig objects, which will result in the MCD updating.

The final architecture will be:

- A new kernel erratum turns into an RPM, which is converted into an ostree and then an oscontainer. A bit more information on the build system side here. The oscontainer makes it into a new release payload published on quay.io.
- At some point the release payload is pulled down by the CVO. The payload includes an osimageurl ConfigMap that references that container (same thing as the `machine-os-content` ImageStream). The CVO updates the ConfigMap, which you can see via `oc -n openshift-machine-config-operator get configmap/machine-config-osimageurl`.
- The operator notices the change to the configmap and updates the "controllerconfig", which is an internal CRD that is used as the primary input to the MCC. See `oc get -o yaml controllerconfig`.
- The "template" sub-controller of the MCC then updates `machineconfigs/00-master` and `machineconfigs/00-worker`.
- The "render" sub-controller of the MCC generates new "rendered" MCs that look like `machineconfigs/master-<hash>` and `machineconfigs/worker-<hash>` and updates the MachineConfigPools to target them. For more information on this, see the MCC docs.
- On each node the MCD will get the new osimageurl, and if it's different from what's booted, it will pull down the container, rebase to it, and reboot. This is the same as for any other config change.
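The controller-side part of the flow above (ConfigMap, then controllerconfig, then `00-<role>` configs, then rendered configs) can be modelled in a few lines. This is a toy sketch, not the MCC's implementation: the `osImageURL` ConfigMap key, the function names, and the use of a truncated SHA-256 over JSON as the `<hash>` are all assumptions made for illustration.

```python
import hashlib
import json

ROLES = ("master", "worker")


def template_configs(controller_config):
    """'template' step: one base 00-<role> config per role carrying the osImageURL."""
    return {
        f"00-{role}": {"osImageURL": controller_config["osImageURL"]}
        for role in ROLES
    }


def render(role, base_config):
    """'render' step: derive a content-based name of the form '<role>-<hash>'.

    The hashing scheme here is illustrative only.
    """
    payload = json.dumps(base_config, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{role}-{digest}"


def propagate(configmap_data):
    """Follow an osImageURL from the ConfigMap down to rendered config names."""
    controller_config = {"osImageURL": configmap_data["osImageURL"]}
    base = template_configs(controller_config)
    return {role: render(role, base[f"00-{role}"]) for role in ROLES}
```

Since the rendered name depends on the content, a new `osImageURL` in the ConfigMap yields new rendered names, which is what causes the pools to target a new config and the MCD on each node to act.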