[workload] Get default resource values from LimitRanges #600

trasc · 2023-03-01T13:44:32Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add the possibility to use default workload resource values based on LimitRanges if present in the target namespace.

The order by which Requested resource values are set becomes:

Job spec
LimitRanges in the target namespace
Limits set for the same container in the Job Spec

In addition, this provides a more generic alternative approach to fix #590, by moving the limits to request default in the workload controller without an impact to the workload's spec .

Which issue(s) this PR fixes:

Fixes #541, #590

Special notes for your reviewer:

netlify · 2023-03-01T13:44:37Z

✅ Deploy Preview for kubernetes-sigs-kueue ready!

Name	Link
🔨 Latest commit	`453f4f2`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6414d603257dd80008059d63
😎 Deploy Preview	https://deploy-preview-600--kubernetes-sigs-kueue.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

k8s-ci-robot · 2023-03-01T13:44:40Z

Welcome @trasc!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2023-03-01T13:44:42Z

Hi @trasc. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kerthcet · 2023-03-01T14:33:57Z

/ok-to-test

alculquicondor · 2023-03-01T14:38:27Z

@kerthcet since this is related to #590, can you take a look?

kerthcet · 2023-03-01T14:40:43Z

@kerthcet since this is related to #590, can you take a look?

Sure, but this will defer to tomorrow.
/assign

kerthcet

Just take an overview of this PR, I think we have something to come to an agreement at first:

About the defaulting orders:
- Respect container resources first
- Respect LimitRange, we should not defaulting the requests to limits as we want before applying the policy, it's a hard constraint required by the administrators.
- Defaulting the requests to limits if not set
About the validation
- We should also reject the creation if violating the LimitRange, in validation I think.
- If no request set in final, reject the creation also.

cc @alculquicondor for advices also.

apis/kueue/webhooks/workload_webhook.go

kerthcet · 2023-03-02T10:23:04Z

apis/kueue/webhooks/workload_webhook.go

+	ret := make([]corev1.LimitRangeItem, 0, len(list.Items))
+
+	for _, lr := range list.Items {
+		for _, lri := range lr.Spec.Limits {


I think we should only consider defaults here, if no defaults, then we'll set defaults to limits.

I'm afraid I don't understand.

We need to consider all the fields. min and max would also affect the containers requests.

Yes, what @alculquicondor said is what I mean. Sorry for the confusion.
In my option, the approach should be consider the default and defaultRequest in webhook defaulting, consider the min and max in webhook validating.

When comparing jobAndWorkloadEqual:

if job is suspend, we'll query the limitRange again

if job is unsuspend, we can ignore the resources in comparation because pod will not be affected by LimitRange after creation.

Indeed we could also validate the boundaries, however I propose to do this in another Issue/PR otherwise it will get to hard to track. Also we should decide what should happen if the validation fails, as of now, the job controller will just endlessly to to create the workload and fail.

In my opinion, tracking the defaults used in order to recreate the workload add too much coupling between the job-controllers and the workloads, with very little added value (in practice I don't expect the LimitRanges to change very often).

pkg/controller/workload/job/job_controller.go

trasc · 2023-03-02T12:53:47Z

Just take an overview of this PR, I think we have something to come to an agreement at first:

About the defaulting orders:

Respect container resources first

Respect LimitRange, we should not defaulting the requests to limits as we want before applying the policy, it's a hard constraint required by the administrators.

Defaulting the requests to limits if not set

About the validation

We should also reject the creation if violating the LimitRange, in validation I think.

If no request set in final, reject the creation also.

cc @alculquicondor for advices also.

For the defaulting order, in my opinion, the value set in the limits of the job has better chances to be closer to the actual need of the job that a value set for the entire namespace. The same happens for pods.

Regarding the validation, sure, it could be added.

apis/kueue/webhooks/workload_webhook.go

alculquicondor · 2023-03-02T17:08:53Z

apis/kueue/webhooks/workload_webhook.go

+	ret := make([]corev1.LimitRangeItem, 0, len(list.Items))
+
+	for _, lr := range list.Items {
+		for _, lri := range lr.Spec.Limits {


We need to consider all the fields. min and max would also affect the containers requests.

apis/kueue/webhooks/workload_webhook.go

apis/kueue/webhooks/workload_webhook_test.go

pkg/controller/workload/job/job_controller.go

kerthcet · 2023-03-03T04:30:49Z

For the defaulting order, in my opinion, the value set in the limits of the job has better chances to be closer to the actual need of the job that a value set for the entire namespace. The same happens for pods.

Defaulting Limits to Requests is not transparent to users, if we have LimitRange with defaultRequest set, workload can be constructed successfully, why we get involved with the limits -> requests? We used to do this just because workloads requires the requests.

apis/kueue/webhooks/workload_webhook.go

apis/kueue/webhooks/workload_webhook_test.go

pkg/controller/workload/job/job_controller.go

alculquicondor

With this PR, if a LimitRange changes, then the usage of admitted Workloads would change.

I think this is fine as an intermediate step given that we do have the same problem with RuntimeClass.

pkg/util/limitrange/limitrange.go

pkg/controller/core/limitrange_controller.go

pkg/util/resource/resource.go

pkg/controller/core/workload_controller.go

pkg/util/resource/resource_test.go

pkg/util/testing/wrappers.go

test/integration/controller/core/workload_controller_test.go

Use the LimitRange within the same namespace to adjust the podSet's resource requirements before adding the to cache and queues.

alculquicondor

Comments that can be done in follow ups

/lgtm
/approve

/label tide/merge-method-squash

alculquicondor · 2023-03-17T21:09:22Z

pkg/controller/core/indexer/indexer.go

@@ -58,6 +60,20 @@ func IndexWorkloadClusterQueue(obj client.Object) []string {
 	return []string{string(wl.Status.Admission.ClusterQueue)}
 }

+func IndexLimitRangeHasContainerType(obj client.Object) []string {


We might need to also check type Pod when we do validation (for min/max).

So an alternative implementation would be IndexLimitRangeByType. Then you can return all the types that the LimitRange has.

But we can leave this for later.

alculquicondor · 2023-03-17T21:10:47Z

pkg/util/resource/resource_test.go

+
+	"github.com/google/go-cmp/cmp"
+	corev1 "k8s.io/api/core/v1"
+	apiresource "k8s.io/apimachinery/pkg/api/resource"


we avoid aliases when there are no conflicts.

alculquicondor · 2023-03-17T21:11:13Z

pkg/util/testing/wrappers.go

+						Max:                  corev1.ResourceList{},
+						Min:                  corev1.ResourceList{},
+						Default:              corev1.ResourceList{},
+						DefaultRequest:       corev1.ResourceList{},
+						MaxLimitRequestRatio: corev1.ResourceList{},


you could remove these lines.

yes, but then you need to check / alocate the empty map for every value set

ahhh, sorry, I forgot these were maps :)

k8s-ci-robot · 2023-03-17T21:11:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [alculquicondor]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 1, 2023

k8s-ci-robot requested review from ahg-g and ArangoGutierrez March 1, 2023 13:44

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 1, 2023

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 1, 2023

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 1, 2023

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 1, 2023

k8s-ci-robot assigned kerthcet Mar 1, 2023

trasc force-pushed the workload_default_from_namespace_limits branch from 392c99d to b37ee3c Compare March 1, 2023 14:45

kerthcet reviewed Mar 2, 2023

View reviewed changes

trasc changed the title ~~Workload default from namespace limits~~ [workload] Get default resource values from LimitRanges Mar 2, 2023

trasc force-pushed the workload_default_from_namespace_limits branch from b37ee3c to 7a97422 Compare March 2, 2023 13:57

alculquicondor reviewed Mar 2, 2023

View reviewed changes

trasc force-pushed the workload_default_from_namespace_limits branch from 7a97422 to a8d7f70 Compare March 3, 2023 07:13

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 3, 2023

trasc force-pushed the workload_default_from_namespace_limits branch from a8d7f70 to 595a5a1 Compare March 3, 2023 07:30

alculquicondor reviewed Mar 3, 2023

View reviewed changes

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 3, 2023

trasc marked this pull request as draft March 3, 2023 15:51

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 16, 2023

trasc force-pushed the workload_default_from_namespace_limits branch from 19cb368 to 4758735 Compare March 16, 2023 09:13

trasc requested review from alculquicondor and kerthcet and removed request for alculquicondor and kerthcet March 16, 2023 09:21

alculquicondor reviewed Mar 16, 2023

View reviewed changes

pkg/util/limitrange/limitrange.go Outdated Show resolved Hide resolved

pkg/controller/core/limitrange_controller.go Outdated Show resolved Hide resolved

alculquicondor reviewed Mar 16, 2023

View reviewed changes

pkg/util/resource/resource.go Outdated Show resolved Hide resolved

trasc force-pushed the workload_default_from_namespace_limits branch 2 times, most recently from 67709c3 to afa1a94 Compare March 17, 2023 16:17

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2023

alculquicondor reviewed Mar 17, 2023

View reviewed changes

trasc added 5 commits March 17, 2023 23:04

[pkg/util] Add resource

1b606a3

[pkg/util] Add limitrange

758b230

[controller/workload] Use LimitRange to adjust workload resources

27d0aad

Use the LimitRange within the same namespace to adjust the podSet's resource requirements before adding the to cache and queues.

[workload] Move limit to range resource copying to controller

09182f7

[workload][indexer] Add spec.hasContainerType index for limitRange

453f4f2

trasc force-pushed the workload_default_from_namespace_limits branch from afa1a94 to 453f4f2 Compare March 17, 2023 21:05

alculquicondor reviewed Mar 17, 2023

View reviewed changes

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 17, 2023

k8s-ci-robot assigned alculquicondor Mar 17, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2023

k8s-ci-robot merged commit 619b06c into kubernetes-sigs:main Mar 17, 2023

trasc mentioned this pull request Apr 19, 2023

REQUEST: New membership for trasc kubernetes/org#4168

Closed

9 tasks

trasc deleted the workload_default_from_namespace_limits branch April 20, 2023 06:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[workload] Get default resource values from LimitRanges #600

[workload] Get default resource values from LimitRanges #600

trasc commented Mar 1, 2023 •

edited

Loading

netlify bot commented Mar 1, 2023 •

edited

Loading

k8s-ci-robot commented Mar 1, 2023

k8s-ci-robot commented Mar 1, 2023

kerthcet commented Mar 1, 2023

alculquicondor commented Mar 1, 2023

kerthcet commented Mar 1, 2023

kerthcet left a comment

kerthcet Mar 2, 2023

trasc Mar 2, 2023

alculquicondor Mar 2, 2023

kerthcet Mar 3, 2023

trasc Mar 6, 2023

trasc commented Mar 2, 2023 •

edited

Loading

alculquicondor Mar 2, 2023

kerthcet commented Mar 3, 2023

alculquicondor left a comment

alculquicondor left a comment

alculquicondor Mar 17, 2023

alculquicondor Mar 17, 2023

alculquicondor Mar 17, 2023

trasc Mar 17, 2023

alculquicondor Mar 17, 2023

k8s-ci-robot commented Mar 17, 2023

[workload] Get default resource values from LimitRanges #600

[workload] Get default resource values from LimitRanges #600

Conversation

trasc commented Mar 1, 2023 • edited Loading

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

netlify bot commented Mar 1, 2023 • edited Loading

✅ Deploy Preview for kubernetes-sigs-kueue ready!

k8s-ci-robot commented Mar 1, 2023

k8s-ci-robot commented Mar 1, 2023

kerthcet commented Mar 1, 2023

alculquicondor commented Mar 1, 2023

kerthcet commented Mar 1, 2023

kerthcet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trasc commented Mar 2, 2023 • edited Loading

Choose a reason for hiding this comment

kerthcet commented Mar 3, 2023

alculquicondor left a comment

Choose a reason for hiding this comment

alculquicondor left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

k8s-ci-robot commented Mar 17, 2023

trasc commented Mar 1, 2023 •

edited

Loading

netlify bot commented Mar 1, 2023 •

edited

Loading

trasc commented Mar 2, 2023 •

edited

Loading