MCO-1443: Promote onclusterbuild to GA #2090
base: master
Skipping CI for Draft Pull Request.
Hello @yuqi-zhang! Some important instructions when contributing to openshift/api:
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: yuqi-zhang. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all
8465feb to 2ca4842
/test all
First step to GA'ing the currently v1alpha1 APIs. Don't add to payload manifests yet, and the featuregate is retained.
2ca4842 to 4c9f154
@yuqi-zhang: This pull request references MCO-1443 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Can we take this promotion opportunity to go through the API thoroughly and improve validations and godocs, please?
@@ -0,0 +1,64 @@
apiVersion: apiextensions.k8s.io/v1 # Hack because controller-gen complains if we don't have this
name: "[TechPreview] MachineOSBuild"
No longer tech preview? I know it kind of is, but we aren't likely to remember to update this when we update the gate, so the two will become disjoint.
@@ -0,0 +1,134 @@
apiVersion: apiextensions.k8s.io/v1 # Hack because controller-gen complains if we don't have this
name: "[TechPreview] MachineOSConfig"
No longer TP
// version tracks the newest MachineOSBuild for each MachineOSConfig
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Required
Version int64 `json:"version"`
I'm not following how this interoperates, in particular, with ConfigGeneration? Can you remind me?
I think the original intent was that they track a corresponding MachineOSConfig generation to do rebuilds, but in techpreview the implementation hasn't leveraged either field (hard set to 1). Let me check that and get back to you.
// host[:port][/namespace]/name:<tag> or svc_name.namespace.svc[:port]/repository/name:<tag>
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=447
// +kubebuilder:validation:XValidation:rule=`((self.split(':').size() == 2 && self.split(':')[1].matches('^([a-zA-Z0-9-./:])+$')) || self.matches('^[^.]+\\.[^.]+\\.svc:\\d+\\/[^\\/]+\\/[^\\/]+:[^\\/]+$'))`,message="the OCI Image reference must end with a valid :<tag>, where '<digest>' is 64 characters long and '<tag>' is any valid string Or it must be a valid .svc followed by a port, repository, image name, and tag."
The message here references digest, but it's not in the example?
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
// ImageBuilderType describes the image builder set in the MachineOSConfig
Godoc for this field is wrong, it's talking about a different field name?
// +kubebuilder:default:=noarch
// +optional
ContainerfileArch ContainerfileArch `json:"containerfileArch"`
// content is the custom content to be built
What is this? Can we expand this documentation?
type MachineOSImageBuilder struct {
// imageBuilderType specifies the backend to be used to build the image.
// +kubebuilder:default:=PodImageBuilder
Are we sure we want this defaulted? What if we need to change this in the future as we decide there's a better, default image build method that doesn't rely on today's pod based image builder?
Is it difficult to change the default here in the API if we wanted to change it in the future?
When you have a default inside the API, you have to consider that this value will be written out to disk, so any existing instance of this API will get this value and be stuck with it. If you later wanted to change the value across the board, including for instances that already exist, a persisted default makes that very difficult.
We typically have a pattern of "when omitted, this means no opinion, and the platform is left to choose a reasonable default, which is subject to change over time", which may be a better fit if you want a default here.
That said, I think the latest version of the API requires the user to specify this explicitly, which I think makes this a moot point.
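To illustrate the "when omitted, no opinion" pattern described above, here is a minimal Go sketch. It is not the MCO implementation: `ImageBuilderSpec`, `platformDefaultBuilder`, and `effectiveBuilder` are hypothetical names, and the choice of JobImageBuilder as the platform default is an assumption made only for the example.

```go
package main

import "fmt"

// ImageBuilderType mirrors the enum type discussed in this thread.
type ImageBuilderType string

const (
	PodImageBuilder ImageBuilderType = "PodImageBuilder"
	JobImageBuilder ImageBuilderType = "JobImageBuilder"
)

// platformDefaultBuilder is a hypothetical controller-side default.
// Because it lives in code rather than in stored objects, it can
// change between releases without rewriting existing resources.
const platformDefaultBuilder = JobImageBuilder

// ImageBuilderSpec is a hypothetical spec fragment with no API-level
// default: an empty value means "no opinion".
type ImageBuilderSpec struct {
	// +optional
	ImageBuilderType ImageBuilderType `json:"imageBuilderType,omitempty"`
}

// effectiveBuilder resolves the builder at reconcile time instead of
// persisting a default into the object.
func effectiveBuilder(spec ImageBuilderSpec) ImageBuilderType {
	if spec.ImageBuilderType == "" {
		return platformDefaultBuilder
	}
	return spec.ImageBuilderType
}

func main() {
	fmt.Println(effectiveBuilder(ImageBuilderSpec{}))
	fmt.Println(effectiveBuilder(ImageBuilderSpec{ImageBuilderType: PodImageBuilder}))
}
```

With a `+kubebuilder:default` marker the chosen value would instead be stored in every object at admission time, which is the lock-in the comment above warns about.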
// +kubebuilder:validation:XValidation:rule=`((self.split(':').size() == 2 && self.split(':')[0].matches('^([a-zA-Z0-9-]+\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?/([a-zA-Z0-9-_]{0,61}/)?[a-zA-Z0-9-_.]*?$')) || self.matches('^[^.]+\\.[^.]+\\.svc:\\d+\\/[^\\/]+\\/[^\\/]+:[^\\/]+$'))`,message="the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme. Or it must be a valid .svc followed by a port, repository, image name, and tag."
// +kubebuilder:validation:Required
RenderedImagePushspec string `json:"renderedImagePushspec"`
// releaseVersion is associated with the base OS Image. This is the version of Openshift that the Base Image is associated with.
This line almost says the same thing twice; perhaps it should be consolidated into one sentence?
60085e1 to 416f317
Updated based on comments: Also temporarily reverted using format.dns1123subdomain() library while figuring out how it works
a663367 to 8f72efb
Mostly fixups, with some minor changes to the v1alpha1 API:
- Removed Version and ConfigGeneration from MOSB as they were unused
- Updated relatedobjects list
- Changed all optional,omitempty structs to pointers
- Removed default for ImageBuilderType, but keeping default build arch to noarch as we don’t foresee changing that.
- Fixed RenderedImagePushspec validators to match description
8f72efb to c6619bf
/test e2e-aws-serial-techpreview
// describes that the machine-os-builder will use a custom pod builder that uses buildah
PodBuilder MachineOSImageBuilderType = "PodImageBuilder"
// describes that the machine-os-builder will use a Job to spin up a custom pod builder that uses buildah
PodBuilder MachineOSImageBuilderType = "JobImageBuilder"
Should this be renamed to JobBuilder as well?
// +unionMember
// +optional
PodImageBuilder *ObjectReference `json:"buildPod,omitempty"`
JobImageBuilderStatus *ObjectReference `json:"jobImageBuilderStatus,omitempty"`
I'm not sure that BuilderStatus makes sense here because this is mostly just a reference to the build executor which doesn't include any kind of status message (to my knowledge). Maybe something like JobImageBuilderRef or something like that would be more appropriate? Better yet, this could just be ImageBuilderRef. That, coupled with the value in ImageBuilderType, should be enough.
I think the comment above this field might need an update to reflect that it is a reference as well.
Stepping back a bit, I'm guessing the reason we are using a union here is that we could have different builders, and they could need different types (other than ObjectReference) to represent their reference? If we are always going to use ObjectReference for multiple kinds of builders, then maybe we don't even need a union discriminator? It could be just two fields within this struct as Zack suggested: type and ref.
Those values are arguably common to all image builders. There could be some additional information that may be desired specific to each type of image builder, but we can add that onto this later as a specific type.
And taking a step back from this, if the ObjectReference includes GVK (Group Version Kind) information, we could potentially eliminate the ImageBuilderType field here since I see that as being primarily user-facing.
Ok, so instead of having individual ImageBuilderRef's, I'll just make it one optional field here to encompass all potential object references.
7597202 to ddb8b4e
ddb8b4e to dd3b585
dd3b585 to 44656cf
// +kubebuilder:validation:Required
BaseImagePullSecret ImageSecretObjectReference `json:"baseImagePullSecret"`
// must live in the openshift-machine-config-operator namespace if provided.
// defaults to using the cluster-wide pull secret if not specified.
Where is the default cluster wide pull-secret specified? Is this something users need to be aware of or is this something that is always set up and available?
Are there any security implications of relying on the default pull secret that we need to be aware of?
The cluster pull secret is the one in the openshift-config namespace (just a secret called pull-secret). There shouldn't be any security implications - this is widely used, and is on-disk for every node as well as referenced in various MCO objects, and doesn't give any push access in most cases.
So maybe this should read something along the lines of the following, though I'm not sure what registries it actually has read access to. Does it have access to the in-cluster registry?
// defaults to using the cluster-wide pull secret if not specified.
// Defaults to using the cluster-wide pull secret, that was provided at install time.
// The default secret will provide read only access to registries <such as in cluster? RH owned? I don't know what this actually gives>.
I'll update the description in the v1 API. It does not, but we expect users to use the base shipped payload RHCOS image as the base image most of the time, so it should not be necessary.
// must live in the openshift-machine-config-operator namespace if provided.
// defaults to using the cluster-wide pull secret if not specified.
// +optional
BaseImagePullSecret ImageSecretObjectReference `json:"baseImagePullSecret,omitempty"`
ImageSecretObjectReference is a struct, right? To actually omit this, it needs to be a pointer.
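The point about pointers can be seen with plain encoding/json. The sketch below uses simplified stand-in types (`byValue` and `byPointer` are hypothetical wrappers, and the stand-in `ImageSecretObjectReference` carries only a name field) to show that `omitempty` does not drop a zero-valued struct, while a nil pointer is dropped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageSecretObjectReference is a simplified stand-in for the API type.
type ImageSecretObjectReference struct {
	Name string `json:"name"`
}

// byValue embeds the reference as a struct value, as in the diff above.
type byValue struct {
	BaseImagePullSecret ImageSecretObjectReference `json:"baseImagePullSecret,omitempty"`
}

// byPointer uses a pointer so that "not set" is representable as nil.
type byPointer struct {
	BaseImagePullSecret *ImageSecretObjectReference `json:"baseImagePullSecret,omitempty"`
}

// marshal returns the JSON encoding of v, or the error text on failure.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(byValue{}))   // {"baseImagePullSecret":{"name":""}}
	fmt.Println(marshal(byPointer{})) // {}
}
```

In the value form the serialized object carries an empty name, which would then be subject to any validation on that nested field.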
Ah, will update that as well. I originally updated the v1alpha1 API first via the first commit before updating it all to the correct version in v1. I don't think this is technically needed anymore?
Basically the v1 API has all the updates from comments, and we will remove the v1alpha1 API after we switch all references, hence why I haven't updated the v1alpha1 API. Is that generally the expected pattern?
Yeah generally you can leave the v1alpha1 API alone, I just checked it as you had made changes in it. You shouldn't need to add this here if it's being added strictly to the new v1 API
type MachineOSBuildStatus struct {
// conditions are state related conditions for the build. Valid types are:
// Prepared, Building, Failed, Interrupted, and Succeeded
// once a Build is marked as Failed, no future conditions can be set. This is enforced by the MCO.
You could enforce that using CEL, IIUC, a condition marking this as failed, makes the list immutable?
How is it enforced by MCO today?
// +kubebuilder:validation:XValidation:rule="self.exists(x, x.type == 'Failed') ? self == oldSelf : true",message="once a Failed condition is set, conditions are immutable"
The expectation is that only the MCO will be creating MachineOSBuild objects (so the user cannot directly create a build like that, they must create a config from which build objects are created by the MCO as it does the builds). In that sense, this status is monitored and updated by the MCO.
Do you think it's strictly necessary given the context?
Yeah I would still add the validation. Users have access to APIs, and may do unexpected things. Controllers also have bugs, that make them do weird things. The more we can put into the API validation to make sure it meets our expectations, the better, even in status
I agree. I'd also like to point out that Interrupted should also be considered an immutable state.
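As a rough model of the behaviour being asked for, the Go sketch below treats both Failed and Interrupted as terminal. It is only an illustration of the semantics, not the rule that was merged: `Condition`, `hasTerminal`, and `updateAllowed` are hypothetical names, and like the suggested rule it looks only at the condition type, not its status. One deliberate difference is that it inspects the previous list (the equivalent of oldSelf), since a check against the new list would also reject the update that first sets the terminal condition.

```go
package main

import (
	"fmt"
	"reflect"
)

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
}

// terminalTypes lists the condition types treated as final in this sketch.
var terminalTypes = map[string]bool{"Failed": true, "Interrupted": true}

// hasTerminal reports whether any condition in the list has a terminal type.
func hasTerminal(conds []Condition) bool {
	for _, c := range conds {
		if terminalTypes[c.Type] {
			return true
		}
	}
	return false
}

// updateAllowed returns true unless the old list already contained a
// terminal condition and the new list differs from it.
func updateAllowed(oldConds, newConds []Condition) bool {
	if !hasTerminal(oldConds) {
		return true
	}
	return reflect.DeepEqual(oldConds, newConds)
}

func main() {
	building := []Condition{{Type: "Building", Status: "True"}}
	failed := []Condition{{Type: "Building", Status: "False"}, {Type: "Failed", Status: "True"}}
	retried := append(append([]Condition{}, failed...), Condition{Type: "Succeeded", Status: "True"})

	fmt.Println(updateAllowed(building, failed)) // true: first transition into Failed
	fmt.Println(updateAllowed(failed, retried))  // false: list changed after Failed
	fmt.Println(updateAllowed(failed, failed))   // true: no-op update
}
```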
// +kubebuilder:validation:MaxLength=447
// +kubebuilder:validation:XValidation:rule=`self.matches('^([a-zA-Z0-9-]+\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?(/[a-zA-Z0-9-_]{1,61})*/[a-zA-Z0-9-_.]+:[a-zA-Z0-9._-]+$') || self.matches('^[^.]+\\.[^.]+\\.svc:\\d+\\/[^\\/]+\\/[^\\/]+:[^\\/]+$')`,message="the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme. Or it must be a valid .svc followed by a port, repository, image name, and tag."
// +kubebuilder:validation:Required
RenderedImagePushspec string `json:"renderedImagePushspec"`
Is a Pushspec a thing or should this be PushSpec?
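For anyone wanting to sanity-check the pattern in the XValidation rule above outside the API server, here is a small Go sketch. It assumes Go's regexp package is a fair proxy, since both it and CEL's matches() use RE2 syntax. Only the first alternative of the rule is covered, with the CEL string escaping (`\\.`) reduced to plain regex escaping; `validPushSpec` is a hypothetical helper name, and the length check mirrors the MaxLength=447 marker.

```go
package main

import (
	"fmt"
	"regexp"
)

// pushSpecPattern is the first alternative of the rule quoted above
// (host[:port][/namespace]/name:tag); the .svc alternative is omitted.
var pushSpecPattern = regexp.MustCompile(
	`^([a-zA-Z0-9-]+\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?(/[a-zA-Z0-9-_]{1,61})*/[a-zA-Z0-9-_.]+:[a-zA-Z0-9._-]+$`)

// validPushSpec applies the length bound and the pattern to a candidate pushspec.
func validPushSpec(s string) bool {
	return len(s) <= 447 && pushSpecPattern.MatchString(s)
}

func main() {
	for _, s := range []string{
		"quay.io/mco/renderedImage:latest",    // value used in the test YAML in this PR
		"registry.example.com:5000/ns/img:v1", // hypothetical host with a port
		"quay.io/mco/renderedImage",           // no tag
		"renderedImage:latest",                // no host
	} {
		fmt.Printf("%-40s %v\n", s, validPushSpec(s))
	}
}
```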
Thanks for the thorough review. Addressed most of the comments, left a few questions, and will follow up on the 3 points above. I see the o/k PR for adding the format library has merged. Will try to incorporate that into the next set of changes as well.
- Update from PodImageBuilder to JobImageBuilder, and add a MachineOSBuild reference to MachineOSConfig
- Failed and Interrupted will now cause MOSBuild conditions to be immutable
- Updated Arch enum to be PascalCase
- Updated relatedObject go doc based on suggestion
- Add validation for buildEnd > buildStart
- Removed conditions field from MOSConfig. The build object is supposed to reflect conditions instead, so this is not needed at this time
- Use dns1123 format check for all strings that match, and otherwise switch pattern checks to validation rules where appropriate
- Updated godocs a bit more for formatting
0741c21 to 7ff1e4d
Pushed another set of updates:
Tested on the latest 4.18 nightly and can confirm both CRDs are valid with the new format library
machineOSConfig:
name: worker
renderedImagePushspec: quay.io/mco/renderedImage:latest
onCreate:
You don't need the second onCreate here.
buildOutputs:
currentImagePullSecret:
name: foo
onCreate:
Remove additional onCreates, only expecting one in the file.
name: worker
buildInputs:
imageBuilder:
imageBuilderType: JobImageBuilder
Do we really want the repetition of imageBuilder and imageBuilderType? And in the enum value as well?
imageBuilder:
  type: Job
renderedImagePushSecret:
name: foo
renderedImagePushspec: quay.io/mco/renderedImg:latest
Push stuff feels more like an output to me?
The user should be specifying where to push to here, so it is user input
buildOutputs:
currentImagePullSecret:
name: foo
expectedError: "Invalid value: \"string\": the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme"
Could you include the earlier part of the error that shows which field this pertains to, that way it's easier to find the broken field when reviewing this later
}

// BuildInputs holds all of the information needed to trigger a build
type BuildInputs struct {
As a user of this API, how would I know what a BaseOSExtensions vs BaseOS vs Base image means?
I'm wondering if we can clean this API up a bit, there's a lot of, what appears to a naive eye at least, redundancy?
images:
  base: <>
  baseOS: <>
  baseOSExtensions: <>
pullSecret:
  name: <>
pushSecret:
  name: <>
builder:
  type: Job
  job:
    name: <>
pushSpec: <>
releaseVersion: <>
containerFiles: []
Or if we moved up to spec, some of this might look like
sourceImages:
  pullSecret: <> # For pulling these images
  base: <>
  baseOS: <>
  baseOSExtensions: <>
buildOutputs:
  pullSecret:
    name: <> # For pulling the pushed image
  pushSecret:
    name: <> # For pushing the built image
  imagePushSpec: <>
builder:
  type: Job
  job:
    name: <>
releaseVersion: <>
containerFiles: []
// containerfileArch describes the architecture this containerfile is to be built for.
// This arch is optional. If the user does not specify an architecture, it is assumed
// that the content can be applied to all architectures, or in a single arch cluster: the only architecture.
// +kubebuilder:validation:Enum:=ARM64;AMD64;PPC64LE;S390X;AArch64;x86_64;NoArch
Why do we have these duplicates? That means an end user can configure the same meaning in multiple ways, which is confusing; we should avoid that.
// +kubebuilder:validation:Enum:=ARM64;AMD64;PPC64LE;S390X;AArch64;x86_64;NoArch
// +kubebuilder:validation:Enum:=ARM64;AMD64;PPC64LE;S390X;NoArch
// containerFile describes the custom data the user has specified to build into the image.
// This is also commonly called a Dockerfile and you can treat it as such. The content is the content of your Dockerfile.
// See https://github.com/containers/common/blob/main/docs/Containerfile.5.md for the spec reference.
// you can specify up to 7 containerFiles
What does it mean to specify multiple images? Why would an end user want to do that?
CurrentImagePullSecret *ImageSecretObjectReference `json:"currentImagePullSecret,omitempty"`
}

type MachineOSImageBuilder struct {
When this struct is used, it's called something like imageBuilder, maybe we avoid the repetition here and go for just type for the field within here?
// the MachineOSBuilder pod validates that the user has provided a valid pool
type MachineConfigPoolReference struct {
// name of the MachineConfigPool object.
// Must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
Should update this to match the others I suggested, including the at most 253 characters part
Mostly fixes around validation and godocs. Added some additional test cases.
@yuqi-zhang: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Opened for testing. Based on guidance this is currently the first step: create new v1 API, gate remains off, v1 API is excluded from the image manifests
Also adds in: #2089