Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClusterNetworkOperator API: promote the additionalRoutingCapabilities gate #2087

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fedepaol
Copy link
Member

@fedepaol fedepaol commented Nov 6, 2024

Promoting the feature gate as the deployed daemonset is being used by metallb and covered by MetalLB tests.

This feature is baremetal specific. By setting this flag, CNO will deploy the frr-k8s daemonset which is used (among others) by core features of metallb.
Part of this work is to move the daemonset from being deployed by the MetalLB operator to being deployed by CNO.

For the time being, we have a CI lane (gating PRs) that is flipping the feature gate and it's testing MetalLB functionalities plus another one testing the FRR-K8s functionalities.

Given the feature is baremetal specific, we need an exception to the regular process, as described here https://github.com/openshift/api?tab=readme-ov-file#defining-featuregate-e2e-tests

We added a periodic informing job here plus we have both telco and openshift QE validating MetalLB releases (and this feature as a consequence).

Copy link
Contributor

openshift-ci bot commented Nov 6, 2024

Hello @fedepaol! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 6, 2024
@fedepaol
Copy link
Member Author

fedepaol commented Nov 6, 2024

Adding @asood-rh as Openshift QE and @evgenLevin and @gkopels as Telco QE

Copy link
Contributor

openshift-ci bot commented Nov 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fedepaol
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fedepaol fedepaol force-pushed the removeadvancedroutingfeaturegate branch 2 times, most recently from 8182d27 to 0f21499 Compare November 6, 2024 18:56
@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 6, 2024
@jcaamano
Copy link
Contributor

jcaamano commented Nov 7, 2024

@JoelSpeed For this error in verify-crd-schema:

error in operator/v1/zz_generated.crd-manifests/0000_70_network_01_networks-Default.crd.yaml: NoNewRequiredFields: crd/networks.operator.openshift.io version/v1 field/^.spec.additionalRoutingCapabilities.providers is new and may not be required

spec.additionalRoutingCapabilities itself is the new field and optional. Can't nested fields as spec.additionalRoutingCapabilities.providers be required?

@evgenLevin
Copy link

We have a running downstream Telco QE CI that verifies both MetalLB and frr-k8s.

@JoelSpeed
Copy link
Contributor

spec.additionalRoutingCapabilities itself is the new field and optional. Can't nested fields as spec.additionalRoutingCapabilities.providers be required?

Yes, that should be permitted, why is the schema checker thinking it isn't 🤔 Out of interest, what happens if you add // +nullable to the AdditionalRoutingCapabilities field? Does that change the output of the checker?

@JoelSpeed
Copy link
Contributor

I'm curious why the schema checker didn't pick this up before, I wonder if we have broken it somehow. Doing some investigations out of band

@fedepaol fedepaol force-pushed the removeadvancedroutingfeaturegate branch from 0f21499 to 76c7a6d Compare November 11, 2024 11:01
@fedepaol
Copy link
Member Author

@JoelSpeed just tried but still getting " ERROR: :1:59: undefined field 'additionalRoutingCapabilities' "

@jcaamano
Copy link
Contributor

@JoelSpeed just tried but still getting " ERROR: :1:59: undefined field 'additionalRoutingCapabilities' "

@fedepaol that's a different issue. Remember my suggestion of changing the validation to

// +openshift:validation:FeatureGateAwareXValidation:featureGate=RouteAdvertisements,requiredFeatureGate=RouteAdvertisements;AdditionalRoutingCapabilities,rule=...

@fedepaol fedepaol force-pushed the removeadvancedroutingfeaturegate branch 4 times, most recently from 5d782a3 to 55839fb Compare November 12, 2024 13:50
@jcaamano
Copy link
Contributor

jcaamano commented Nov 12, 2024

@JoelSpeed The nullable attribute helped but we need more help :/

Would you know why the promoted field additionalRoutingCapabilities is not added to the generated operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/RouteAdvertisements.yaml?
You can see how a test fails because of that here:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/2087/pull-ci-openshift-api-master-verify-crd-schema/1856288451670839296

error in operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/RouteAdvertisements.yaml: MustNotExceedCostBudget: ^.properties[spec]: Invalid value: apiextensions.ValidationRule{Rule:"(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'", Message:"Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available", MessageExpression:"", Reason:(*apiextensions.FieldValueErrorReason)(nil), FieldPath:"", OptionalOldSelf:(*bool)(nil)}: compilation failed: ERROR: <input>:1:5: undefined field 'additionalRoutingCapabilities'

Strangely enough this isn't failing in HEAD where the validation is in operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/AdditionalRoutingCapabilities.yaml also referencing routeAdvertisements when the field is not defined there. If it did fail, I wouldn't have known how to manage a field that needs to be enable when either of two features gates are enabled.

@jcaamano
Copy link
Contributor

@JoelSpeed The nullable attribute helped but we need more help :/

Would you know why the promoted field additionalRoutingCapabilities is not added to the generated operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/RouteAdvertisements.yaml? You can see how a test fails because of that here: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/2087/pull-ci-openshift-api-master-verify-crd-schema/1856288451670839296

error in operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/RouteAdvertisements.yaml: MustNotExceedCostBudget: ^.properties[spec]: Invalid value: apiextensions.ValidationRule{Rule:"(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'", Message:"Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available", MessageExpression:"", Reason:(*apiextensions.FieldValueErrorReason)(nil), FieldPath:"", OptionalOldSelf:(*bool)(nil)}: compilation failed: ERROR: <input>:1:5: undefined field 'additionalRoutingCapabilities'

Strangely enough this isn't failing in HEAD where the validation is in operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/AdditionalRoutingCapabilities.yaml also referencing routeAdvertisements when the field is not defined there. If it did fail, I wouldn't have known how to manage a field that needs to be enable when either of two features gates are enabled.

@deads2k you might know

@fedepaol fedepaol force-pushed the removeadvancedroutingfeaturegate branch from 4b0817f to cf06c80 Compare November 19, 2024 17:24
@fedepaol
Copy link
Member Author

/retest

@@ -137,6 +137,7 @@ type NetworkSpec struct {
// respective documentation and configuration options.
// +openshift:enable:FeatureGate=AdditionalRoutingCapabilities
// +optional
// +nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some more research and I think this is not the correct approach, please remove

I will override the no new required fields, as I think it is erroneous

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do!

@JoelSpeed
Copy link
Contributor

Can you please link a sippy link in the PR description that shows us the results of your periodic that you added?

Also, can we retitle to use the word promote, rather than remove. Removing the gate is a later step, and I'd like to remove the ambiguity.

@JoelSpeed The nullable attribute helped but we need more help :/

Would you know why the promoted field additionalRoutingCapabilities is not added to the generated operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/RouteAdvertisements.yaml? You can see how a test fails because of that here: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/2087/pull-ci-openshift-api-master-verify-crd-schema/1856288451670839296

error in operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/RouteAdvertisements.yaml: MustNotExceedCostBudget: ^.properties[spec]: Invalid value: apiextensions.ValidationRule{Rule:"(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'", Message:"Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available", MessageExpression:"", Reason:(*apiextensions.FieldValueErrorReason)(nil), FieldPath:"", OptionalOldSelf:(*bool)(nil)}: compilation failed: ERROR: <input>:1:5: undefined field 'additionalRoutingCapabilities'

Strangely enough this isn't failing in HEAD where the validation is in operator/v1/zz_generated.featuregated-crd-manifests/networks.operator.openshift.io/AdditionalRoutingCapabilities.yaml also referencing routeAdvertisements when the field is not defined there. If it did fail, I wouldn't have known how to manage a field that needs to be enable when either of two features gates are enabled.

Your rule needs both the routeAdvertisements and additionalRoutingCapabilities fields, and so the rule must be gated on both, rather than featureGate=<gate>, try requiredFeatureGate=<gate>,<gate>. I believe this should mean the validation only applies when both gates are enabled within a feature set, and should resolve the issue you're seeing

@fedepaol fedepaol force-pushed the removeadvancedroutingfeaturegate branch from cf06c80 to 38852c8 Compare November 21, 2024 11:39
@fedepaol fedepaol changed the title ClusterNetworkOperator API: remove additionalRoutingCapabilities gate ClusterNetworkOperator API: promote the additionalRoutingCapabilities gate Nov 21, 2024
@@ -54,7 +54,7 @@ type NetworkList struct {

// NetworkSpec is the top-level network configuration object.
// +kubebuilder:validation:XValidation:rule="!has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.gatewayConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding) || self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding == oldSelf.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding || self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding == 'Restricted' || self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding == 'Global'",message="invalid value for IPForwarding, valid values are 'Restricted' or 'Global'"
// +openshift:validation:FeatureGateAwareXValidation:featureGate=AdditionalRoutingCapabilities,rule="(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'",message="Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available"
// +openshift:validation:FeatureGateAwareXValidation:featureGate=RouteAdvertisements,requiredFeatureGate=RouteAdvertisements;AdditionalRoutingCapabilities,rule="(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'",message="Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK this isn't working, all this has done is remove the validation from everywhere, damn

@@ -54,7 +54,7 @@ type NetworkList struct {

// NetworkSpec is the top-level network configuration object.
// +kubebuilder:validation:XValidation:rule="!has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.gatewayConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding) || self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding == oldSelf.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding || self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding == 'Restricted' || self.defaultNetwork.ovnKubernetesConfig.gatewayConfig.ipForwarding == 'Global'",message="invalid value for IPForwarding, valid values are 'Restricted' or 'Global'"
// +openshift:validation:FeatureGateAwareXValidation:featureGate=AdditionalRoutingCapabilities,rule="(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'",message="Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available"
// +openshift:validation:FeatureGateAwareXValidation:featureGate=RouteAdvertisements,requiredFeatureGate=RouteAdvertisements;AdditionalRoutingCapabilities,rule="(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'",message="Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +openshift:validation:FeatureGateAwareXValidation:featureGate=RouteAdvertisements,requiredFeatureGate=RouteAdvertisements;AdditionalRoutingCapabilities,rule="(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'",message="Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available"
// +openshift:validation:FeatureGateAwareXValidation:featureGate=RouteAdvertisements,rule="(has(self.additionalRoutingCapabilities) && ('FRR' in self.additionalRoutingCapabilities.providers)) || !has(self.defaultNetwork) || !has(self.defaultNetwork.ovnKubernetesConfig) || !has(self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements) || self.defaultNetwork.ovnKubernetesConfig.routeAdvertisements != 'Enabled'",message="Route advertisements cannot be Enabled if 'FRR' routing capability provider is not available"

… gate

Promoting the feature gate as the deployed daemonset is being used by
metallb and covered by MetalLB tests.

Signed-off-by: Federico Paolinelli <[email protected]>
@fedepaol fedepaol force-pushed the removeadvancedroutingfeaturegate branch from 38852c8 to 7f92859 Compare November 22, 2024 11:47
Copy link
Contributor

openshift-ci bot commented Nov 22, 2024

@fedepaol: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify 7f92859 link true /test verify
ci/prow/e2e-aws-ovn-hypershift 7f92859 link true /test e2e-aws-ovn-hypershift
ci/prow/e2e-aws-serial-techpreview 7f92859 link true /test e2e-aws-serial-techpreview

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed
Copy link
Contributor

/override ci/prow/verify-crd-schema

The new required field identified is added inside a parent optional field, so I will allow this as it's not actually a new required field and is a compatible change.

The issue being reported in the gated CRDs is an unfortunate side effect of checking CEL validations in unmerged CRDs, since the issue doesn't present in any of the merged CRDs (the ones we actually install), I'm going to override this for now. Will look into excluding these checks for unmerged CRDs

@fedepaol Next up is the sippy results to show we historical pass rates of the E2Es taht you've got set up, once we have those, we should be good to override the verify too and get this merged

Copy link
Contributor

openshift-ci bot commented Nov 25, 2024

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-crd-schema

In response to this:

/override ci/prow/verify-crd-schema

The new required field identified is added inside a parent optional field, so I will allow this as it's not actually a new required field and is a compatible change.

The issue being reported in the gated CRDs is an unfortunate side effect of checking CEL validations in unmerged CRDs, since the issue doesn't present in any of the merged CRDs (the ones we actually install), I'm going to override this for now. Will look into excluding these checks for unmerged CRDs

@fedepaol Next up is the sippy results to show we historical pass rates of the E2Es taht you've got set up, once we have those, we should be good to override the verify too and get this merged

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants