Add support for default Ingress Cert #1022
04859e7 to 2386ca3
apis/infrastructure/v1/cert_types.go
Outdated
// +kubebuilder:default=SelfSigned
// * DefaultIngress: Default ingress certificate configured for OpenShift
// +kubebuilder:validation:Enum=SelfSigned;Provided;DefaultIngress
// +kubebuilder:default=DefaultIngress
If we change the default value, it can cause problems for users who did not set it in an older release.
I think this should be okay. By default users will use default certs. I guess we need to document this change. @israel-hdez Can you confirm?
Yes, DefaultIngress would be the default for users. Documenting it would be enough.
For users who already installed KServe, my understanding is that the DSCI is not updated automatically when they upgrade, right? If so, we can provide a way to use DefaultIngress.
Yep, if there is an existing DSCI/DSC, it won't be updated. Currently this has been introduced in dsc
Existing installs will have this field populated with the previous default (we are not using pointers). Thus, a new default here would not hurt.
(I suspect that's the reason why the sample has it?)
opendatahub-operator/config/samples/datasciencecluster_v1_datasciencecluster.yaml
Line 24 in df469bb
type: SelfSigned
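The argument above can be sketched in a few lines of Go. This is a simplified stand-in and not the operator's code: the `Certificate` struct and the `applyDefault` helper are hypothetical, and `applyDefault` only imitates the rule that a schema default fills in a value when the field is missing. The enum values are the ones from the diff under review.

```go
package main

import "fmt"

// CertType mirrors the enum discussed in this thread.
type CertType string

const (
	SelfSigned     CertType = "SelfSigned"
	Provided       CertType = "Provided"
	DefaultIngress CertType = "DefaultIngress"
)

// Certificate is a simplified, hypothetical stand-in for the API type.
// Type is a plain string, not a pointer, as noted in the review.
type Certificate struct {
	Type       CertType
	SecretName string
}

// applyDefault imitates schema defaulting: it fills Type only when it is empty
// and leaves an already populated value untouched.
func applyDefault(c Certificate, def CertType) Certificate {
	if c.Type == "" {
		c.Type = def
	}
	return c
}

func main() {
	existing := Certificate{Type: SelfSigned} // stored while SelfSigned was the default
	fresh := Certificate{}                    // created after the default changed

	fmt.Println(applyDefault(existing, DefaultIngress).Type)
	fmt.Println(applyDefault(fresh, DefaultIngress).Type)
}
```

Under these assumptions an object that already carries `SelfSigned` keeps it, and only an object created without the field picks up the new default.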
766cd78 to b827fcc
/lgtm
@Jooho @israel-hdez Are there any changes that we need in odh-model-controller?
/hold until Edgar's reviews
I shared a few minor comments in the code itself, but there's one big thing we should also address: I don't see a reason why the new cert/secret logic should be coupled with the Feature struct. I provided a patch you could apply, as this is certainly way too much to comment on (even with suggestions).
PTAL at this patch.
Happy to explain the details. The original code (CreateSelfSignedCertificate) was clearly a missed opportunity in the original PR, so I felt responsible for cleaning it up :)
Besides that, a couple of tests would really help in gaining confidence that all of it will work as intended. WDYT about adding a few e2e tests covering this as part of this PR?
apis/infrastructure/v1/cert_types.go
Outdated
Provided CertType = "Provided"
SelfSigned CertType = "SelfSigned"
Provided CertType = "Provided"
DefaultIngress CertType = "DefaultIngress"
IIRC this is specific to OpenShift ingress, so perhaps it's better to give it a more accurate name, such as OpenshiftDefaultIngress? WDYT?
if a.GetObjectKind().GroupVersionKind().Kind == "CustomResourceDefinition" || a.GetName() == "ArgoWorkflowCRD" {
	return []reconcile.Request{{
		NamespacedName: types.NamespacedName{Name: requestName},
	}}
}
I am not sure I grok how this is related. If it fixes something, doesn't it deserve a separate commit/PR?
It is also used in the predicate a bit further down, with the datasciencepipelines.ArgoWorkflowCRD const.
Yeah, I just wanted to reuse the code for the request name, so I added a function r.getRequestName().
This almost worked.
The missing part I identified is setting the right secret name in the Knative gateway. As it is, if the user does not specify the SecretName, a default name will be set. The name is set here:
f.Spec.KnativeCertificateSecret = certificateSecretName
...related template:
opendatahub-operator/components/kserve/resources/servicemesh/routing/istio-ingress-gateway.tmpl.yaml
Line 17 in df469bb
credentialName: {{ .KnativeCertificateSecret }}
I can think of two ways of doing it:
- Modify ServingDefaultValues so that it uses the name of the copied secret. However, it looks like the ServingDefaultValues func runs before the secret is copied, so it is hard to know what its name is going to be. So... IDK.
- When copying the secret, use the contents of f.Spec.KnativeCertificateSecret as the name of the copied secret.
Without that change, models fail to start in my trials. But once the right secret is configured in the Gateway, it is possible to validate these changes by:
- Installing the operator.
- Installing KServe using the DSC.
- Deploying any sample model.
- Get/download the public certificate that the URL of the deployed model is exposing (hint: https://stackoverflow.com/a/7886248).
- Get/download the public certificate of some other Route (e.g. the OpenShift Console URL).
- Compare both downloaded certificates and check that they are the same.
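The last three steps can also be scripted. The following is a minimal sketch, assuming both endpoints are reachable over TLS from where it runs; the `fingerprint` and `leafFingerprint` helpers and the two host:port arguments are illustrative and not part of the operator. Certificate verification is skipped on purpose, since the goal is only to inspect which certificate is served.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"os"
)

// fingerprint returns the hex-encoded SHA-256 digest of a DER-encoded certificate.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// leafFingerprint connects to addr (host:port) and fingerprints the leaf
// certificate presented by the server. Verification is disabled because we
// only want to look at the certificate, not trust it.
func leafFingerprint(addr string) (string, error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", fmt.Errorf("no certificate presented by %s", addr)
	}
	return fingerprint(certs[0].Raw), nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: certcmp <model-host:443> <other-route-host:443>")
		return
	}
	model, err := leafFingerprint(os.Args[1])
	if err != nil {
		fmt.Println("model endpoint:", err)
		return
	}
	other, err := leafFingerprint(os.Args[2])
	if err != nil {
		fmt.Println("other route:", err)
		return
	}
	fmt.Println("same certificate:", model == other)
}
```

Pass the host of the deployed model as the first argument and the host of any other Route (for example the OpenShift Console) as the second.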
@@ -436,26 +437,17 @@ func (r *DataScienceClusterReconciler) SetupWithManager(mgr ctrl.Manager) error
	Watches(&source.Kind{Type: &corev1.ConfigMap{}}, handler.EnqueueRequestsFromMapFunc(r.watchDataScienceClusterResources), builder.WithPredicates(configMapPredicates)).
	Watches(&source.Kind{Type: &apiextensionsv1.CustomResourceDefinition{}}, handler.EnqueueRequestsFromMapFunc(r.watchDataScienceClusterResources),
		builder.WithPredicates(argoWorkflowCRDPredicates)).
	Watches(&source.Kind{Type: &corev1.Secret{}}, handler.EnqueueRequestsFromMapFunc(r.watchDefaultIngressSecret), builder.WithPredicates(defaultIngressCertSecretPredicates)).
I'm not fully sure how Watches work, but it is worth checking that you won't run into the issue described here: https://issues.redhat.com/browse/RHOAIENG-1006
Yep, this watch was added to reconcile the operator when a certificate expires and the secret gets deleted.
You just need to be careful. We've been seeing OOMKills with clusters with many secrets.
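To make the concern concrete, here is a rough sketch of the kind of check a predicate such as defaultIngressCertSecretPredicates could perform. It uses plain Go types instead of controller-runtime so that it stays self-contained; `secretKey`, `matchOnly`, and the namespace and secret name in `main` are all assumptions for illustration, not values taken from this PR. Note that, as far as I understand controller-runtime, a predicate only filters which events trigger a reconcile; it does not by itself limit how many Secrets the informer caches, so the memory question may need to be addressed in the cache configuration as well.

```go
package main

import "fmt"

// secretKey identifies a Secret; a hypothetical stand-in for types.NamespacedName.
type secretKey struct {
	Namespace string
	Name      string
}

// matchOnly returns a filter that accepts exactly one Secret and rejects
// every other Secret event, which is what a narrow predicate would do.
func matchOnly(want secretKey) func(secretKey) bool {
	return func(got secretKey) bool {
		return got == want
	}
}

func main() {
	// Both values below are assumptions for illustration only.
	isIngressCert := matchOnly(secretKey{Namespace: "openshift-ingress", Name: "router-certs-default"})

	events := []secretKey{
		{Namespace: "openshift-ingress", Name: "router-certs-default"},
		{Namespace: "some-project", Name: "builder-token"},
	}
	for _, e := range events {
		fmt.Printf("%s/%s reconcile=%v\n", e.Namespace, e.Name, isIngressCert(e))
	}
}
```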
config/rbac/role.yaml
Outdated
- delete
- get
- list
- patch
- watch
Can't we omit delete and patch?
pkg/feature/cert.go
Outdated
	client.InNamespace("openshift-ingress-operator"),
}

err := cli.List(context.TODO(), defaultIngressCtrlList, listOpts...)
Isn't it better to directly get the default one? (https://docs.openshift.com/container-platform/4.15/networking/ingress-operator.html#nw-ingress-view_configuring-ingress)
I think we want to always use the default one. Getting a list of the default ones may not be deterministic, and this may choose a different one if the user has created their own ingresses.
Yeah, I was not sure if there would be an edge case where the default is named something else.
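The difference between the two approaches can be illustrated without any cluster access. In this sketch `ingressController`, `firstOf`, and `getByName` are hypothetical stand-ins for the real resource and client calls, and the name `default` is assumed from the linked OpenShift documentation; the point is only that a lookup by namespace and name gives the same answer regardless of list order.

```go
package main

import "fmt"

// ingressController is a hypothetical stand-in for the OpenShift
// IngressController resource; only the fields needed here are modelled.
type ingressController struct {
	Namespace string
	Name      string
}

// firstOf mimics "list, then take the first item": the result depends on
// whatever order the list happens to come back in.
func firstOf(items []ingressController) (ingressController, bool) {
	if len(items) == 0 {
		return ingressController{}, false
	}
	return items[0], true
}

// getByName mimics a direct Get by namespace and name: it either finds
// exactly the requested controller or reports that it is missing.
func getByName(items []ingressController, namespace, name string) (ingressController, bool) {
	for _, ic := range items {
		if ic.Namespace == namespace && ic.Name == name {
			return ic, true
		}
	}
	return ingressController{}, false
}

func main() {
	const ns = "openshift-ingress-operator"
	items := []ingressController{
		{Namespace: ns, Name: "custom"}, // hypothetical user-created controller
		{Namespace: ns, Name: "default"},
	}

	first, _ := firstOf(items)
	byName, ok := getByName(items, ns, "default")
	fmt.Println("first of list:", first.Name)
	fmt.Println("by name:", byName.Name, "found:", ok)
}
```

If the named controller is missing, the direct lookup fails explicitly instead of silently picking another one, which makes the edge case visible.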
b827fcc to 968a0ad
it does not have to be part of Feature struct/methods Signed-off-by: bartoszmajsak <[email protected]>
08ba557 to 6a6c7a7
@zdtsw @israel-hdez @bartoszmajsak I have added e2e tests and also addressed all the comments.
/test opendatahub-operator-e2e
/test opendatahub-operator-e2e
/lgtm
ea81e86 into opendatahub-io:incubation
* Add support for default ingress cert
* Refactor watches
* Add Watch for Ingress cert secret
* Update manifests
* Fix linters and api docs
* Update scheme and roles
* Address comments
* Update tests to match default values
* chore: refactors cert creation
  it does not have to be part of Feature struct/methods
  Signed-off-by: bartoszmajsak <[email protected]>
* Change default value to OpenshiftDefaultIngress
* Update default secret
* Add e2e tests
* Update tests for Managed Serving
* Update serverless tests
* Sync manifests
PR opendatahub-io#1022 promoted knative default secret name as the default for entire platform. This can be confusing, therefore this change makes it local to kserve again.
Jira Issue: https://issues.redhat.com/browse/RHOAIENG-4221