-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ODH-ADR-Operator-0005: Whitelist component fields for user customizations #32
Conversation
customization: | ||
- name: kserve-controller-manager | ||
resources: | ||
requests: | ||
cpu: 100m | ||
memory: 128Mi | ||
limits: | ||
cpu: 200m | ||
memory: 256Mi | ||
- name: odh-model-controller | ||
resources: | ||
requests: | ||
cpu: 100m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want this to be in the "default" config, e.g when create default-dsc from a clean installation or the template "Yaml Form" should set the default request values.
biggest concern is, user will need to know exactly the name of the deployment, i.e kserve-controller-manager in order to make this logic work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this will not be in default config. Default resources will be set to the one mentioned in manifests. This is an optional config to be added by users when required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading this ADR, the main question I have is:
Why do we reconcile everything back to the default state?
Shouldn't we treat the components' manifests as a sort of "reasonable" default and allow users to adjust them according to their needs? KfDef version (v1) might have been too liberal in this regard, as it merely checked if a given resource already existed and skipped applying if that was the case. I understand that we want to have control over certain aspects, but this proposal suggests we might have gone too far with this approach.
Perhaps we should start considering what changes users can make (or define what is not changeable) to the resources we create and allow them to do so.
- name: kserve-controller-manager | ||
resources: | ||
requests: | ||
cpu: 100m | ||
memory: 128Mi | ||
limits: | ||
cpu: 200m | ||
memory: 256Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the previous #23 ADR, I am against this type of API change. By doing that we are leaking implementation details of RHOAI components to the rather abstract concept of DSC and Components. Moreover, unless it's a free-form field (e.g. raw extension), we are locking API to the current state of component implementation, making it tricky with changes moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree with the deployment name being exposed. Alternatively we can just get resource configuration and apply it to all deployments for a component. However that may cause issues with components having multiple deployments/controllers
ODH component deployments lack a mechanism for customizing resource limits and requests. Deployment resources are hard-coded in the manifests with no means to adjust them according to user requirements. | ||
We need a mechanism for users to configure resources when available resources are limited. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add concrete use cases that need this functionality so that we can understand the rationale better?
I am aware of too many replicas of dashboard deployment (which in general I am wondering why we have so high in the first place). What are the others in e.g. KServe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah @israel-hdez Can you elaborate on the Kserve usecase for resource configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we have this issue in upstream KServe: kserve/kserve#3467.
We have applied a workaround by adding more resources in ODH manifests here to unblock some people on using KServe: opendatahub-io/kserve#261.
Despite a fix is already in progress, these kind of workarounds would not require users to wait for a release if we could configure resource limits/requests. And, well... I think users should be able to customize the resource limits, based on their expected number of resources.
I also barely remember there were other similar instances in other model serving components -- so, this is not KServe specific.
- name: kserve-controller-manager | ||
resources: | ||
requests: | ||
cpu: 100m | ||
memory: 128Mi | ||
limits: | ||
cpu: 200m | ||
memory: 256Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the previous #23 ADR, I am against this type of API change. By doing that we are leaking implementation details of RHOAI components to the rather abstract concept of DSC and Components. Moreover, unless it's a free-form field (e.g. raw extension), we are locking API to the current state of component implementation, making it tricky with changes moving forward.
I do agree with letting users update the resources, I am just not sure how we can limit the scope of the updates. So if we do not reconcile to the desired state, there can be usecase when users do multiple config updates and result in an error state. Maybe we can add a way for user to return to defaults if they are in such error state? Kfdef actually provided no way to customize, it used to reconcile everything back to the default state. |
Maybe we could find a simple way to define paths/fields of a particular resource that we will skip during reconciliation. We can for example remove such a field from the patch operation when applying the original, "desired" state. Or flipping this around - reconciliation will apply a patch with only the fields that we really want to make sure are not changed.
I am not sure about this one. My line of thinking is more of what kind of changes would result in "unsupported configuration". For example, we might want to make sure our images are always reconciled. But things like resources are very dependent on the target env and usage patterns so this could be skipped from reconciliation.
I have only worked on the plugin part of KfDef and in there when you were creating so-called "raw resources" it was skipping apply during reconciliation when the resource already existed. So I stand corrected :) OTOH it lets you provide your own overlays to customize the kustomize :) I am not saying this is the way to go, but it was possible without blowing up the |
I know we can go this way to be quite flexible. However, I'm more on the side that it is the operator CRDs that should expose the needed settings, rather than instructing the users to do it themselves. IMHO, the more we allow for such flexibility, the more the operator becomes less valuable -- i.e. why not just provide the manifests to the user? Personally, I'm quite sold to the explanation in operator framework website:
So, the general idea that this ADR is proposing is good to me. |
cpu: 200m | ||
memory: 256Mi | ||
- name: odh-model-controller | ||
resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The odh-model-controller
is weird, as this is shared between ModelMesh and KServe.
It starts to become more problematic.
Treating a component as a whole sounds like the right thing to do, I am not opposing that, nor do I suggest going back to "using primitives". I fear that CRD can become bloated with different levels of abstraction. That's why I started thinking that perhaps we can open up certain aspects of our building blocks for the users to fine-tune instead of exposing those in our CRDs. Is the suggested approach based on the assumption that the "user" is someone not knowledgable in the k8s domain, so instead of letting them fine-tune certain "low-level" settings, we are going to expose those low-level settings in some other place? I wonder if by doing so aren't we exactly exposing knobs? |
But, then, aren't these building blocks the primitives? I think this confuses me.
On this point I agree with you that the different high/low level settings look weird. However, I do think resource management is still within the scope of an operator. IMO, it depends on how much an operator is planned to evolve on its capability level. A level-1 operator would require users to tune these low level settings, while a level-5 operator would relief the user from resource tuning. AFAIK, we don't have the data to automate the resource tuning, so IMO we need the user to do it for now. I think this is quite clear, and the discussion seems to be around on how the user is supposed to do it. Maybe, a middle ground is to expose in the CRD a high level setting like |
Will this still require us to expose component specific details?
@israel-hdez However I think this will require us to have data for minimal resources required by a component. |
My rationale tells me yes.
I'm not sure overlays are suitable, because resources are something can need fine-tuning, and we already had got this feedback from the field. This ADR is proposing to use the In model serving we just need a mechanism to configure the resources. Whether this ADR is refined, or dropped in favor of whitelisting fields, or something else... that's more up to you :-) |
I will update the ADR to reflect the discussion to provide list of fields to whitelist across all components. See PR for Kserve specific changes. |
46bb171
to
66a8f97
Compare
I have updated the adr to reflect implementation to whitelist fields across components |
@VaishnaviHire is this ready for merging? |
Hi Yes, Just need lgtm from component owners |
Description
This ADR is a subset of feature requests defined in #23 .
How Has This Been Tested?
Merge criteria: