Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for default Ingress Cert #1022

Merged

Conversation

VaishnaviHire
Copy link
Member

@VaishnaviHire VaishnaviHire commented May 22, 2024

Jira Issue: https://issues.redhat.com/browse/RHOAIENG-4221

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

// +kubebuilder:default=SelfSigned
// * DefaultIngress: Default ingress certificate configured for OpenShift
// +kubebuilder:validation:Enum=SelfSigned;Provided;DefaultIngress
// +kubebuilder:default=DefaultIngress
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we change the default value, it can cause problem for these users did not set it in old release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be okay. By default users will use default certs. I guess we need to document this change. @israel-hdez Can you confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes defaultIngress would be the default one for users. If we document it, it would be enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for users who already installed kserve, from my understanding, the dci is not updated automatically when they upgraded, right? If so, we can provide a way how to use DefaultIngress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, if there is an existing DSCI/DSC, it won't be updated. Currently this has been introduced in dsc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existent installs will have this field populated with the previous default (we are not using pointers). Thus, a new default here would not hurt.

(I suspect that's the reason why the sample has it?

)

pkg/feature/cert.go Outdated Show resolved Hide resolved
pkg/feature/cert.go Outdated Show resolved Hide resolved
pkg/feature/serverless/resources.go Show resolved Hide resolved
@VaishnaviHire VaishnaviHire changed the title [WIP] Add support for default Ingress Cert Add support for default Ingress Cert May 23, 2024
@VaishnaviHire VaishnaviHire force-pushed the add_cert_type branch 2 times, most recently from 766cd78 to b827fcc Compare May 23, 2024 18:26
@VaishnaviHire VaishnaviHire requested a review from Jooho May 28, 2024 16:45
@Jooho
Copy link
Contributor

Jooho commented May 29, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 29, 2024
@Jooho Jooho removed their assignment May 29, 2024
@VaishnaviHire
Copy link
Member Author

@Jooho @israel-hdez Is there any changes that we need in odh-model-controller?

@VaishnaviHire
Copy link
Member Author

/hold until Edgar's reviews

Copy link
Contributor

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shared a few minor comments in the code itself, but there's one big thing we should also address - I don't see a reason why new cert/secret logic should be coupled with Feature struct. I provided a patch you could apply, as this is certainly way too much to comment (even with suggestions).

PTAL at this patch.

Happy to explain the details. The original code (CreateSelfSignedCertificate) was clearly a missed opportunity in the original PR so I felt responsible for cleaning it up :)

Besides that, a couple of tests would really help in gaining confidence that all of it will work as intended. WDYT about adding a few e2e tests covering this as part of this PR?

Provided CertType = "Provided"
SelfSigned CertType = "SelfSigned"
Provided CertType = "Provided"
DefaultIngress CertType = "DefaultIngress"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this is specific for openshift ingress, so perhaps it's better to give a more accurate name such as OpenshiftDefaultIngress? WDYT?

Comment on lines +465 to 483
if a.GetObjectKind().GroupVersionKind().Kind == "CustomResourceDefinition" || a.GetName() == "ArgoWorkflowCRD" {
return []reconcile.Request{{
NamespacedName: types.NamespacedName{Name: requestName},
}}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I grok how this is related. Should it deserve separate commit/PR if it's fixing something for a reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also used in the predicate a bit further down with datasciencepipelines.ArgoWorkflowCRD const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I just wanted to reuse code for request name so added a function r.getRequestName()

Copy link
Contributor

@israel-hdez israel-hdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost worked.
The missing part I identify is to set the right secret name in the Knative gateway. As it is, if the user does not specify the SecretName, a default name will be set. The name is set here:

f.Spec.KnativeCertificateSecret = certificateSecretName

...related template:

I can think two ways of doing it:

  • Modify that ServingDefaultValues so that it will use the name of the copied secret..... Although, looks like ServingDefaultValues func runs before the secret is copied, so it is hard to know what's going to be its name. So... IDK.
  • When copying the secret, use f.Spec.KnativeCertificateSecret contents as the name of the copied secret.

Without that change, models are failing to start on my trials. But once the right secret is configured in the Gateway, it is possible to validate these changes by:

  • Installing the operator.
  • Installing KServe using the DSC.
  • Deploying any sample model.
  • Get/download the public certificate that the URL of the deployed model is exposing (hint: https://stackoverflow.com/a/7886248).
  • Get/download the public certificate of some other Route (e.g. the OpenShift Console URL).
  • Compare both downloaded certificates and check that they are the same.

// +kubebuilder:default=SelfSigned
// * DefaultIngress: Default ingress certificate configured for OpenShift
// +kubebuilder:validation:Enum=SelfSigned;Provided;DefaultIngress
// +kubebuilder:default=DefaultIngress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existent installs will have this field populated with the previous default (we are not using pointers). Thus, a new default here would not hurt.

(I suspect that's the reason why the sample has it?

)

@@ -436,26 +437,17 @@ func (r *DataScienceClusterReconciler) SetupWithManager(mgr ctrl.Manager) error
Watches(&source.Kind{Type: &corev1.ConfigMap{}}, handler.EnqueueRequestsFromMapFunc(r.watchDataScienceClusterResources), builder.WithPredicates(configMapPredicates)).
Watches(&source.Kind{Type: &apiextensionsv1.CustomResourceDefinition{}}, handler.EnqueueRequestsFromMapFunc(r.watchDataScienceClusterResources),
builder.WithPredicates(argoWorkflowCRDPredicates)).
Watches(&source.Kind{Type: &corev1.Secret{}}, handler.EnqueueRequestsFromMapFunc(r.watchDefaultIngressSecret), builder.WithPredicates(defaultIngressCertSecretPredicates)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fully sure how Watches work. But it is worth to check that you won't fall under the issue described here: https://issues.redhat.com/browse/RHOAIENG-1006

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this watch was added to reconcile operator when a certificate is expired and secret gets deleted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just need to be careful. We've been seeing OOMKills with clusters with many secrets.

Comment on lines 1072 to 1096
- delete
- get
- list
- patch
- watch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we omit delete and patch?

client.InNamespace("openshift-ingress-operator"),
}

err := cli.List(context.TODO(), defaultIngressCtrlList, listOpts...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to directly get the default one? (https://docs.openshift.com/container-platform/4.15/networking/ingress-operator.html#nw-ingress-view_configuring-ingress)

I think we want to always use the default one. Getting a list of the default ones may not be deterministic, and this may choose a different one if the user has created their own ingresses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was not sure if there would be an edge case where the default is named as something else

@VaishnaviHire
Copy link
Member Author

@zdtsw @israel-hdez @bartoszmajsak I have added e2e tests and also addressed all the comments.

@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

@zdtsw
Copy link
Member

zdtsw commented Jun 19, 2024

/test opendatahub-operator-e2e

@zdtsw
Copy link
Member

zdtsw commented Jun 19, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jun 19, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit ea81e86 into opendatahub-io:incubation Jun 19, 2024
8 checks passed
zdtsw pushed a commit to zdtsw-forking/rhods-operator that referenced this pull request Jun 19, 2024
* Add support for default ingress cert

* Refactor watches

* Add Watch for Ingress cert secret

* Update manifests

* Fix linters and api docs

* Update scheme and roles

* Address comments

* Update tests to match default values

* chore: refactors cert creation

    it does not have to be part of Feature struct/methods
Signed-off-by: bartoszmajsak <[email protected]>

* Change default value to OpenshiftDefaultIngress

* Update default secret

* Add e2e tests

* Update tests for Managed Serving

* Update serverless tests

* Sync manifests
bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Jun 20, 2024
PR opendatahub-io#1022 promoted knative default secret name as the default for entire platform. This can be confusing, therefore this change makes it local to kserve again.
bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Jun 20, 2024
PR opendatahub-io#1022 promoted knative default secret name as the default for entire platform. This can be confusing, therefore this change makes it local to kserve again.
openshift-merge-bot bot pushed a commit that referenced this pull request Jun 20, 2024
PR #1022 promoted knative default secret name as the default for entire platform. This can be confusing, therefore this change makes it local to kserve again.
zdtsw pushed a commit to zdtsw-forking/rhods-operator that referenced this pull request Jun 26, 2024
…#1067)

PR opendatahub-io#1022 promoted knative default secret name as the default for entire platform. This can be confusing, therefore this change makes it local to kserve again.
openshift-merge-bot bot pushed a commit to red-hat-data-services/rhods-operator that referenced this pull request Jun 26, 2024
* Add support for default ingress cert

* Refactor watches

* Add Watch for Ingress cert secret

* Update manifests

* Fix linters and api docs

* Update scheme and roles

* Address comments

* Update tests to match default values

* chore: refactors cert creation

    it does not have to be part of Feature struct/methods
Signed-off-by: bartoszmajsak <[email protected]>

* Change default value to OpenshiftDefaultIngress

* Update default secret

* Add e2e tests

* Update tests for Managed Serving

* Update serverless tests

* Sync manifests
openshift-merge-bot bot pushed a commit to red-hat-data-services/rhods-operator that referenced this pull request Jun 26, 2024
…#1067)

PR opendatahub-io#1022 promoted knative default secret name as the default for entire platform. This can be confusing, therefore this change makes it local to kserve again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants