-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect snapshot usage when lendingLimit enabled #1770
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
52e0e46
to
770cba4
Compare
/cc |
@@ -1043,6 +1042,27 @@ func TestPreemption(t *testing.T) { | |||
wantPreempted: sets.New("/lend1-low", "/lend2-low"), | |||
enableLendingLimit: true, | |||
}, | |||
"preempt from one ClusterQueue in cohort-lend": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"preempt from one ClusterQueue in cohort-lend": { | |
"cannot preempt from other ClusterQueue if past lending limit": { |
Or another message that describes what the input and output is like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -43,18 +43,57 @@ func (s *Snapshot) RemoveWorkload(wl *workload.Info) { | |||
delete(cq.Workloads, workload.Key(wl.Obj)) | |||
updateUsage(wl, cq.Usage, -1) | |||
if cq.Cohort != nil { | |||
updateUsage(wl, cq.Cohort.Usage, -1) | |||
if features.Enabled(features.LendingLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have unit tests just for these functions. We should test for the transitions from "usage below guaranteed" and "usage above guaranteed" and the reverse, but also to the same states.
I'm not convinced the current implementation works.
However, maybe a simpler approach is:
- remove the existing "clusterqueue usage from the cohort"
updateUsage(wl, cq.Usage, +-1)
- Add the new "clusterqueue usage from cohort"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some tests that can prove the current implementation works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, maybe a simpler approach is:
+1, more readable to me. And I think the code is reuseable for Addworkload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/cache/snapshot.go
Outdated
flv, flvExist := cq.Cohort.Usage[wlResFlv] | ||
if flvExist && wlResExist { | ||
if _, exists := flv[wlRes]; exists { | ||
cohortUsage := cq.Usage[wlResFlv][wlRes] - cq.guaranteedQuota(wlResFlv, wlRes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cohortUsage := cq.Usage[wlResFlv][wlRes] - cq.guaranteedQuota(wlResFlv, wlRes) | |
nonGuaranteedUsage := cq.Usage[wlResFlv][wlRes] - cq.guaranteedQuota(wlResFlv, wlRes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
c79936a
to
56fcfb1
Compare
pkg/cache/snapshot.go
Outdated
for _, ps := range wl.TotalRequests { | ||
for wlRes, wlResFlv := range ps.Flavors { | ||
v, wlResExist := ps.Requests[wlRes] | ||
// update cq.Cohort.Usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this comment. Meaningless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -43,18 +43,57 @@ func (s *Snapshot) RemoveWorkload(wl *workload.Info) { | |||
delete(cq.Workloads, workload.Key(wl.Obj)) | |||
updateUsage(wl, cq.Usage, -1) | |||
if cq.Cohort != nil { | |||
updateUsage(wl, cq.Cohort.Usage, -1) | |||
if features.Enabled(features.LendingLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, maybe a simpler approach is:
+1, more readable to me. And I think the code is reuseable for Addworkload.
pkg/cache/snapshot_test.go
Outdated
}(), | ||
enableLendingLimit: true, | ||
}, | ||
"remove lend-a-2": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz make the test name more readable, like the intention you want to test here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1043,6 +1042,27 @@ func TestPreemption(t *testing.T) { | |||
wantPreempted: sets.New("/lend1-low", "/lend2-low"), | |||
enableLendingLimit: true, | |||
}, | |||
"cannot preempt from other ClusterQueues if past lending limit": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"cannot preempt from other ClusterQueues if past lending limit": { | |
"cannot preempt from other ClusterQueues if exceeds requestable qutota including lending limit": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General LGTM
pkg/cache/snapshot.go
Outdated
@@ -43,18 +43,26 @@ func (s *Snapshot) RemoveWorkload(wl *workload.Info) { | |||
delete(cq.Workloads, workload.Key(wl.Obj)) | |||
updateUsage(wl, cq.Usage, -1) | |||
if cq.Cohort != nil { | |||
updateUsage(wl, cq.Cohort.Usage, -1) | |||
if features.Enabled(features.LendingLimit) { | |||
updateCohortUsage(wl, cq, cq.Cohort.Usage, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we pass cq as parameters only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/cache/snapshot_test.go
Outdated
remove []string | ||
add []string | ||
want Snapshot | ||
enableLendingLimit bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need of this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
322dcb2
to
65468ac
Compare
/assign @alculquicondor |
pkg/cache/snapshot_test.go
Outdated
add: []string{"/lend-a-1", "/lend-a-2", "/lend-a-3", "/lend-b-1"}, | ||
want: initialSnapshot, | ||
}, | ||
"remove all with lending limit": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"remove all with lending limit": { | |
"remove all": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/cache/snapshot_test.go
Outdated
workloads := []kueue.Workload{ | ||
*utiltesting.MakeWorkload("lend-a-1", ""). | ||
Request(corev1.ResourceCPU, "1"). | ||
ReserveQuota(utiltesting.MakeAdmission("lend-a").Assignment(corev1.ResourceCPU, "default", "1000m").Obj()). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReserveQuota(utiltesting.MakeAdmission("lend-a").Assignment(corev1.ResourceCPU, "default", "1000m").Obj()). | |
ReserveQuota(utiltesting.MakeAdmission("lend-a").Assignment(corev1.ResourceCPU, "default", "1").Obj()). |
For readability. Do the same for the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/cache/snapshot_test.go
Outdated
} | ||
cmpOpts := append(snapCmpOpts, | ||
cmpopts.IgnoreFields(ClusterQueue{}, "NamespaceSelector", "Preemption", "Status"), | ||
cmpopts.IgnoreFields(Cohort{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something looks off here. Not sure what this is ignoring. I think we should ignore ClusterQueues
within Cohort
to avoid recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ignoring the whole Cohort
struct. In this test function, all cqs belong to a same cohort, so there will be no difference. I think it's ok not to change this. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @alculquicondor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we actually want to check that the Usage
field matches what is set in tc.want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds resonable, modified.
@@ -1043,6 +1041,27 @@ func TestPreemption(t *testing.T) { | |||
wantPreempted: sets.New("/lend1-low", "/lend2-low"), | |||
enableLendingLimit: true, | |||
}, | |||
"cannot preempt from other ClusterQueues if exceeds requestable qutota including lending limit": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"cannot preempt from other ClusterQueues if exceeds requestable qutota including lending limit": { | |
"cannot preempt from other ClusterQueues if exceeds requestable quota including lending limit": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thx. Done.
65468ac
to
502e289
Compare
*utiltesting.MakeWorkload("lend2-low", ""). | ||
Priority(-1). | ||
Request(corev1.ResourceCPU, "10"). | ||
ReserveQuota(utiltesting.MakeAdmission("lend2").Assignment(corev1.ResourceCPU, "default", "10000m").Obj()). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReserveQuota(utiltesting.MakeAdmission("lend2").Assignment(corev1.ResourceCPU, "default", "10000m").Obj()). | |
ReserveQuota(utiltesting.MakeAdmission("lend2").Assignment(corev1.ResourceCPU, "default", "10").Obj()). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Co-authored-by: kerthcet <[email protected]> Signed-off-by: B1F030 <[email protected]>
bb53767
to
f6369a1
Compare
/lgtm |
LGTM label has been added. Git tree hash: 83aaa484b027d52b380c0d645f2093f6ab5ea720
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, kerthcet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-0.6 |
@alculquicondor: once the present PR merges, I will cherry-pick it on top of release-0.6 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@alculquicondor: new pull request created: #1826 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
kubernetes-sigs#1770) Signed-off-by: B1F030 <[email protected]> Co-authored-by: B1F030 <[email protected]>
kubernetes-sigs#1770) Signed-off-by: B1F030 <[email protected]> Co-authored-by: B1F030 <[email protected]>
What type of PR is this?
/kind bug
What this PR does / why we need it:
cc @B1F030 I reproduced the bug with a preemption test, PTAL.
Which issue(s) this PR fixes:
Fixes #1385 (comment)
Special notes for your reviewer:
Does this PR introduce a user-facing change?