Using the same flavors in different resources might lead to unschedulable pods #167
This is going to be very tricky to solve. I think the best solution is to validate that a label key is used by one resource only.
The example in the description doesn't make sense in practice and we should make sure that users can't create such setups.
One issue here is if the ResourceFlavor gets updated. But I think we need to make ResourceFlavor immutable anyway (to ensure that admitted workloads reference the exact same flavor version assumed by the scheduler), and also prevent deleting a ResourceFlavor if there is any ClusterQueue referencing it. Updating a ResourceFlavor should be done by creating a new object with a different name, then changing the ClusterQueues to reference the new ResourceFlavor, and then the old ResourceFlavor can be deleted. The idea is to make it super hard to place the system in a nonsensical state.
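With immutable flavors, an "update" becomes a three-step migration. A hypothetical sketch (object names invented, field names approximate to kueue's early API):

```yaml
# 1. Create the replacement flavor under a new name.
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ResourceFlavor
metadata:
  name: spot-v2          # hypothetical name
labels:
  instance-type: spot    # hypothetical node label
# 2. Update every ClusterQueue that references "spot-v1" to reference "spot-v2".
# 3. Delete the now-unreferenced "spot-v1".
```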
That's not enough to prevent a flavor from being used for different resources. A potential solution is to include the resource name in the ResourceFlavor object and validate that ClusterQueues only reference the flavor for the corresponding resource.
Why not? Your nodes might have a certain cpu/memory ratio for the flavor and you might want to set the limits accordingly.
Preventing a flavor from being used by different resources doesn't address the issue reported here because different flavors may use the same label and so cause the same conflict. Enforcing a label key to be used by a single resource ensures that it never gets assigned different values because of conflicting assignment across resources.
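The check proposed above can be sketched in a few lines. This is a minimal Python sketch with hypothetical data shapes (kueue's real objects are Go structs), not the project's actual implementation:

```python
def find_label_key_conflicts(cq_resources, flavor_labels):
    """Return label keys claimed by more than one resource in a ClusterQueue.

    cq_resources: dict mapping resource name -> list of referenced flavor names,
        e.g. {"cpu": ["spot"], "memory": ["ondemand"]}
    flavor_labels: dict mapping flavor name -> dict of node labels,
        e.g. {"spot": {"instance-type": "spot"}}
    (Hypothetical shapes for illustration only.)
    """
    key_to_resources = {}
    for resource, flavors in cq_resources.items():
        for flavor in flavors:
            for key in flavor_labels.get(flavor, {}):
                key_to_resources.setdefault(key, set()).add(resource)
    # A key is a conflict if two different resources could assign it
    # (possibly different) values via their flavors.
    return {k: sorted(v) for k, v in key_to_resources.items() if len(v) > 1}

# "spot" and "ondemand" both set "instance-type"; cpu and memory reference
# different flavors, so the same key is claimed by two resources.
conflicts = find_label_key_conflicts(
    {"cpu": ["spot"], "memory": ["ondemand"]},
    {"spot": {"instance-type": "spot"}, "ondemand": {"instance-type": "ondemand"}},
)
print(conflicts)  # {'instance-type': ['cpu', 'memory']}
```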
That is fair, but to address this case I think it is better to have an explicit API where the user can express that there is a dependency between two resources; in that case they should have the exact same set of flavors, and the scheduler would then check those dependent resources together at the same time.
How do you plan to implement this? When do you validate? ResourceFlavor creation or ClusterQueue creation?
I like this. Something like:

```yaml
memory:
  flavorsFrom: cpu
```
If we make ResourceFlavor immutable and disallow deleting it until no CQ references it, then we can validate on CQ creation/update.
Currently, the appropriate flavor is matched against the CQ labels / node selector / affinity. Different resources can use different flavors, but the selected flavors all satisfy the filtering criteria. So is it within expectations even if the pod cannot be scheduled?
I think we need to determine whether it is referenced or not during ResourceFlavor updates and deletions, and if so, disallow the operation.
I can help with that.
Can you please hold? I don't think we have agreed on the high-level solution for this problem.
OK, I will hold the PR until agreement. |
For this one, perhaps the ultimate solution is to have an explicit API to express dependencies. For the time being, I think we can validate that a CQ shouldn't have more than one resource using a label key; for this to work we need to:
I'm worried this might cause usability problems. It might be hard for an administrator to identify which CQs are using a resource, so they can remove them. And then they have to wait for running workloads to finish. But even if that wasn't a concern, I'm not convinced that validating that a label key is not used across resources is a necessary step to fix the issue reported.
Probably, but how would that look? I'm not sure my suggestion above,

```yaml
memory:
  flavorsFrom: cpu
```

is enough, because we still need to specify the quota for each flavor. I think what we need to express is that certain resources have the same flavors and must be verified together. What if this is just enforced through validation? Two resources can either have completely different flavors or they must share all flavors, and we can just look at flavor names for this. Then, when the scheduler identifies that some resources are grouped due to having the same resource flavors, it iterates flavors->resources to verify whether a workload fits, instead of the current resources->flavors.
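The "all or nothing" flavor-sharing rule reduces to a set comparison over flavor names. A hedged Python sketch (hypothetical data shapes, not kueue's actual validation code), which also rejects same-set-different-order since order matters for grouping:

```python
def validate_flavor_grouping(cq_resources):
    """Check that any two resources either share all flavors (same names,
    same order) or share none. Returns the list of offending resource pairs.

    cq_resources: dict mapping resource name -> ordered list of flavor names.
    (Hypothetical shape for illustration only.)
    """
    bad_pairs = []
    names = list(cq_resources)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            fa, fb = cq_resources[a], cq_resources[b]
            # Partial overlap, or same flavors listed in a different order.
            if set(fa) & set(fb) and fa != fb:
                bad_pairs.append((a, b))
    return bad_pairs

# cpu and memory share all flavors in the same order: accepted.
print(validate_flavor_grouping(
    {"cpu": ["spot", "ondemand"], "memory": ["spot", "ondemand"]}))  # []
# cpu and memory overlap only partially: rejected.
print(validate_flavor_grouping(
    {"cpu": ["spot", "ondemand"], "memory": ["spot"]}))  # [('cpu', 'memory')]
```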
I don't think there are usability concerns; with an API like kueue's, where we have multiple CRDs with dependencies, I would lean towards being more strict to avoid placing the system in a nonsensical state.
I am OK with an implicit API; perhaps the order should also be verified to be the same, just so it is easier to do the flavors->resources iteration. If you think that the scheduling loop can be reversed, then we can proceed with this solution.
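The reversed loop discussed here could look roughly like this. A Python sketch under assumed data shapes (the real scheduler works on kueue's Go snapshot types; names here are invented):

```python
def fit_flavor_first(group_resources, flavors, request, used, quota):
    """For a group of resources sharing the same ordered flavor list, iterate
    flavors -> resources and return the first flavor in which the workload's
    request fits for *every* resource in the group, else None.

    request: dict resource -> requested amount.
    used/quota: dicts flavor -> resource -> current usage / max quota.
    (Hypothetical shapes for illustration only.)
    """
    for flavor in flavors:  # outer loop over flavors, not resources
        if all(used[flavor][r] + request[r] <= quota[flavor][r]
               for r in group_resources):
            return flavor  # all grouped resources land on the same flavor
    return None

flavors = ["spot", "ondemand"]
request = {"cpu": 2, "memory": 4}
used = {"spot": {"cpu": 9, "memory": 0}, "ondemand": {"cpu": 0, "memory": 0}}
quota = {"spot": {"cpu": 10, "memory": 100}, "ondemand": {"cpu": 10, "memory": 100}}
# cpu does not fit in "spot" (9 + 2 > 10), so both resources move on to
# "ondemand" together instead of splitting across flavors.
print(fit_flavor_first(["cpu", "memory"], flavors, request, used, quota))  # ondemand
```

Iterating flavors in the outer loop is what guarantees the grouped resources are always assigned a single common flavor, which is exactly the invariant the current resources->flavors loop fails to enforce.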
Although checking only the flavor names isn't enough, because one could create flavors of different names but use the same label key...
I'm willing to accept that as a user error that we can document. This is in the hands of the administrator, who should be a power user.
I will tinker with the code for a bit.
We do have enough context and information to prevent it, and I don't think there is a use case where we would want to allow it.
We do... but it requires a lot of dealing with finalizers, which might be risky. If anything, I would leave it as a follow-up.
Leaving it as a follow-up is fine; it may not require more work with finalizers than what we do now if we make flavors immutable (then the question is whether that is too restrictive). In any case, we can proceed without it for now.
/assign
What happened:
When using the same flavors in multiple resources, a workload could get admitted for cpu and memory with different flavors.
What you expected to happen:
Workloads to get admitted to the same flavor for different resources.
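As a concrete illustration of such a setup, a ClusterQueue where cpu and memory each list the same two flavors. This is a hypothetical sketch with approximate field names, not a verified kueue manifest:

```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: cluster-queue        # hypothetical name
spec:
  requestableResources:      # approximate field name
  - name: cpu
    flavors:
    - resourceFlavor: spot   # both flavors set the same node label key
      quota:
        guaranteed: 4
    - resourceFlavor: ondemand
      quota:
        guaranteed: 4
  - name: memory
    flavors:
    - resourceFlavor: spot
      quota:
        guaranteed: 16Gi
    - resourceFlavor: ondemand
      quota:
        guaranteed: 16Gi
# If cpu fits only in "ondemand" while memory is assigned "spot", the workload
# ends up with conflicting node selectors and its pods become unschedulable.
```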