Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[workload] Get default resource values from LimitRanges #600

Merged

Conversation

trasc
Copy link
Contributor

@trasc trasc commented Mar 1, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add the possibility to use default workload resource values based on LimitRanges if present in the target namespace.

The order by which Requested resource values are set becomes:

  1. Job spec
  2. LimitRanges in the target namespace
  3. Limits set for the same container in the Job Spec

In addition, this provides a more generic alternative approach to fix #590, by moving the limits to request default in the workload controller without an impact to the workload's spec .

Which issue(s) this PR fixes:

Fixes #541, #590

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 1, 2023
@netlify
Copy link

netlify bot commented Mar 1, 2023

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 453f4f2
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6414d603257dd80008059d63
😎 Deploy Preview https://deploy-preview-600--kubernetes-sigs-kueue.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 1, 2023
@k8s-ci-robot
Copy link
Contributor

Welcome @trasc!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 1, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @trasc. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 1, 2023
@kerthcet
Copy link
Contributor

kerthcet commented Mar 1, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 1, 2023
@alculquicondor
Copy link
Contributor

@kerthcet since this is related to #590, can you take a look?

@kerthcet
Copy link
Contributor

kerthcet commented Mar 1, 2023

@kerthcet since this is related to #590, can you take a look?

Sure, but this will defer to tomorrow.
/assign

@trasc trasc force-pushed the workload_default_from_namespace_limits branch from 392c99d to b37ee3c Compare March 1, 2023 14:45
Copy link
Contributor

@kerthcet kerthcet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just take an overview of this PR, I think we have something to come to an agreement at first:

  • About the defaulting orders:
    • Respect container resources first
    • Respect LimitRange, we should not defaulting the requests to limits as we want before applying the policy, it's a hard constraint required by the administrators.
    • Defaulting the requests to limits if not set
  • About the validation
    • We should also reject the creation if violating the LimitRange, in validation I think.
    • If no request set in final, reject the creation also.

cc @alculquicondor for advices also.

apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
ret := make([]corev1.LimitRangeItem, 0, len(list.Items))

for _, lr := range list.Items {
for _, lri := range lr.Spec.Limits {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only consider defaults here, if no defaults, then we'll set defaults to limits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider all the fields. min and max would also affect the containers requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what @alculquicondor said is what I mean. Sorry for the confusion.
In my option, the approach should be consider the default and defaultRequest in webhook defaulting, consider the min and max in webhook validating.

When comparing jobAndWorkloadEqual:

  • if job is suspend, we'll query the limitRange again
  • if job is unsuspend, we can ignore the resources in comparation because pod will not be affected by LimitRange after creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we could also validate the boundaries, however I propose to do this in another Issue/PR otherwise it will get to hard to track. Also we should decide what should happen if the validation fails, as of now, the job controller will just endlessly to to create the workload and fail.

In my opinion, tracking the defaults used in order to recreate the workload add too much coupling between the job-controllers and the workloads, with very little added value (in practice I don't expect the LimitRanges to change very often).

pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
@trasc
Copy link
Contributor Author

trasc commented Mar 2, 2023

Just take an overview of this PR, I think we have something to come to an agreement at first:

  • About the defaulting orders:

    • Respect container resources first
    • Respect LimitRange, we should not defaulting the requests to limits as we want before applying the policy, it's a hard constraint required by the administrators.
    • Defaulting the requests to limits if not set
  • About the validation

    • We should also reject the creation if violating the LimitRange, in validation I think.
    • If no request set in final, reject the creation also.

cc @alculquicondor for advices also.

For the defaulting order, in my opinion, the value set in the limits of the job has better chances to be closer to the actual need of the job that a value set for the entire namespace. The same happens for pods.

Regarding the validation, sure, it could be added.

@trasc trasc changed the title Workload default from namespace limits [workload] Get default resource values from LimitRanges Mar 2, 2023
@trasc trasc force-pushed the workload_default_from_namespace_limits branch from b37ee3c to 7a97422 Compare March 2, 2023 13:57
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
ret := make([]corev1.LimitRangeItem, 0, len(list.Items))

for _, lr := range list.Items {
for _, lri := range lr.Spec.Limits {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider all the fields. min and max would also affect the containers requests.

apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook_test.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
@kerthcet
Copy link
Contributor

kerthcet commented Mar 3, 2023

For the defaulting order, in my opinion, the value set in the limits of the job has better chances to be closer to the actual need of the job that a value set for the entire namespace. The same happens for pods.

Defaulting Limits to Requests is not transparent to users, if we have LimitRange with defaultRequest set, workload can be constructed successfully, why we get involved with the limits -> requests? We used to do this just because workloads requires the requests.

@trasc trasc force-pushed the workload_default_from_namespace_limits branch from 7a97422 to a8d7f70 Compare March 3, 2023 07:13
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 3, 2023
@trasc trasc force-pushed the workload_default_from_namespace_limits branch from a8d7f70 to 595a5a1 Compare March 3, 2023 07:30
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook.go Outdated Show resolved Hide resolved
apis/kueue/webhooks/workload_webhook_test.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 3, 2023
@trasc trasc marked this pull request as draft March 3, 2023 15:51
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 16, 2023
@trasc trasc force-pushed the workload_default_from_namespace_limits branch from 19cb368 to 4758735 Compare March 16, 2023 09:13
@trasc trasc requested review from alculquicondor and kerthcet and removed request for alculquicondor and kerthcet March 16, 2023 09:21
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, if a LimitRange changes, then the usage of admitted Workloads would change.

I think this is fine as an intermediate step given that we do have the same problem with RuntimeClass.

pkg/util/limitrange/limitrange.go Outdated Show resolved Hide resolved
pkg/controller/core/limitrange_controller.go Outdated Show resolved Hide resolved
@trasc trasc force-pushed the workload_default_from_namespace_limits branch 2 times, most recently from 67709c3 to afa1a94 Compare March 17, 2023 16:17
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2023
pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
pkg/util/resource/resource_test.go Outdated Show resolved Hide resolved
pkg/util/resource/resource_test.go Outdated Show resolved Hide resolved
pkg/util/resource/resource_test.go Outdated Show resolved Hide resolved
pkg/util/testing/wrappers.go Outdated Show resolved Hide resolved
pkg/util/testing/wrappers.go Outdated Show resolved Hide resolved
pkg/util/testing/wrappers.go Outdated Show resolved Hide resolved
@trasc trasc force-pushed the workload_default_from_namespace_limits branch from afa1a94 to 453f4f2 Compare March 17, 2023 21:05
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments that can be done in follow ups

/lgtm
/approve

/label tide/merge-method-squash

@@ -58,6 +60,20 @@ func IndexWorkloadClusterQueue(obj client.Object) []string {
return []string{string(wl.Status.Admission.ClusterQueue)}
}

func IndexLimitRangeHasContainerType(obj client.Object) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to also check type Pod when we do validation (for min/max).

So an alternative implementation would be IndexLimitRangeByType. Then you can return all the types that the LimitRange has.

But we can leave this for later.


"github.com/google/go-cmp/cmp"
corev1 "k8s.io/api/core/v1"
apiresource "k8s.io/apimachinery/pkg/api/resource"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we avoid aliases when there are no conflicts.

Comment on lines +528 to +532
Max: corev1.ResourceList{},
Min: corev1.ResourceList{},
Default: corev1.ResourceList{},
DefaultRequest: corev1.ResourceList{},
MaxLimitRequestRatio: corev1.ResourceList{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could remove these lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but then you need to check / alocate the empty map for every value set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, sorry, I forgot these were maps :)

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 17, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2023
@k8s-ci-robot k8s-ci-robot merged commit 619b06c into kubernetes-sigs:main Mar 17, 2023
@trasc trasc deleted the workload_default_from_namespace_limits branch April 20, 2023 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
4 participants