Add PodsReady condition if WaitForPodsReady is enabled #460
Skipping CI for Draft Pull Request.
acce86b to 1f53a9a
2f78a66 to 195d055
f2f11c4 to d09fce4
	for name, tc := range testcases {
		t.Run(name, func(t *testing.T) {
			got := jobPodsReady(tc.job)
We already have all the coverage in integration.
Not everything: for example, in integration tests we only test parallelism=completions. Further, the integration tests are much slower, so I would rather decrease the number of scenarios in integration tests than delete unit tests.
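A minimal table-driven sketch of the kind of unit coverage argued for here, including cases where parallelism and completions differ. The `fakeJob` type is a hypothetical stand-in for `batchv1.Job`, and the readiness rule (ready plus succeeded pods compared against the smaller of parallelism and completions) is an assumption for illustration, not necessarily what the PR's `jobPodsReady` does.

```go
package main

// fakeJob is a simplified, hypothetical stand-in for batchv1.Job.
type fakeJob struct {
	parallelism int32  // spec.parallelism
	completions *int32 // spec.completions, nil when unset
	ready       int32  // status.ready
	succeeded   int32  // status.succeeded
}

// jobPodsReady reports whether enough pods are ready or already succeeded.
// Assumed rule: the required count is min(parallelism, completions).
func jobPodsReady(j fakeJob) bool {
	required := j.parallelism
	if j.completions != nil && *j.completions < required {
		required = *j.completions
	}
	return j.ready+j.succeeded >= required
}

func int32Ptr(v int32) *int32 { return &v }

// testcases includes scenarios beyond parallelism=completions.
var testcases = map[string]struct {
	job  fakeJob
	want bool
}{
	"parallelism = completions; all pods ready": {
		job:  fakeJob{parallelism: 3, completions: int32Ptr(3), ready: 3},
		want: true,
	},
	"parallelism = completions; one pod not ready": {
		job:  fakeJob{parallelism: 3, completions: int32Ptr(3), ready: 2},
		want: false,
	},
	"parallelism > completions; ready and succeeded pods": {
		job:  fakeJob{parallelism: 5, completions: int32Ptr(2), ready: 1, succeeded: 1},
		want: true,
	},
	"parallelism < completions; one pod not ready": {
		job:  fakeJob{parallelism: 2, completions: int32Ptr(5), ready: 1},
		want: false,
	},
	"completions unset; all pods ready": {
		job:  fakeJob{parallelism: 2, ready: 2},
		want: true,
	},
}

// failedCases returns the names of the test cases whose result differs
// from the expectation; an empty result means every case passed.
func failedCases() []string {
	var failed []string
	for name, tc := range testcases {
		if got := jobPodsReady(tc.job); got != tc.want {
			failed = append(failed, name)
		}
	}
	return failed
}
```

In a real `_test.go` file the comparison would sit inside `t.Run(name, ...)`, as in the snippet under review; the `failedCases` helper only keeps this sketch free of the `testing` package.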
1854f1d to c18da68
/lgtm
/hold I would like to take a look
@@ -45,6 +45,18 @@ type Configuration struct {

	// InternalCertManagement is configuration for internalCertManagement
	InternalCertManagement *InternalCertManagement `json:"internalCertManagement,omitempty"`

	// WaitForPodsReady is configuration for waitForPodsReady
not useful docs :)
Extended
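For readers unfamiliar with how such a field would be consumed, here is a rough sketch of an optional configuration block and a nil-safe check for it. The `Enable` field, the `waitForPodsReady` JSON key (chosen by analogy with `internalCertManagement`), and the helper `waitForPodsReadyEnabled` are assumptions for illustration; the actual type added in this PR is not shown in the conversation.

```go
package main

import "encoding/json"

// WaitForPodsReady is a hypothetical shape for the new configuration block.
// The Enable field name is an assumption, not taken from the PR.
type WaitForPodsReady struct {
	Enable bool `json:"enable,omitempty"`
}

// Configuration mirrors only the part of the struct relevant to this sketch.
type Configuration struct {
	// WaitForPodsReady stays nil when the block is absent from the config.
	WaitForPodsReady *WaitForPodsReady `json:"waitForPodsReady,omitempty"`
}

// waitForPodsReadyEnabled parses raw JSON configuration and reports whether
// the feature is switched on, treating a missing block as disabled.
func waitForPodsReadyEnabled(raw []byte) (bool, error) {
	var cfg Configuration
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.WaitForPodsReady != nil && cfg.WaitForPodsReady.Enable, nil
}
```

Using a pointer for the block keeps "not configured" distinguishable from "configured but disabled", which is the same pattern the existing `InternalCertManagement` field follows.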
	// handle a job when waitForPodsReady is enabled
	if r.waitForPodsReady {
		log.V(5).Info("Handling a job when waitForPodsReady is enabled")
		condition := generatePodsReadyCondition(&job)
Don't we need to update the condition when workload.admitted is set to nil? Or do we want to wait until the job is actually suspended and we get the job update event?
I don't see a practical scenario where it matters. Maybe if the request to set admission=nil succeeds but the request to set suspend=true fails (and is not executed as the job quickly gets re-admitted); but then it seems to make sense to react to the change of the suspend field.
The only scenario where it could matter is when the workload is immediately admitted again and we didn't have time to update the PodsReady condition to false.
But that could still happen in any case, because we can't update the status and spec at the same time. Unless we can in a mutating webhook?
We can't reset the condition in a mutating webhook either. To be on the safer side, maybe we can set the condition to false as soon as we see admission=nil.
I have updated the code now to check for workload admission to be on the safer side.
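The outcome of this thread can be sketched roughly as follows: the condition is True only while the workload is admitted and the job's pods are ready, so it flips to False as soon as admission is unset. The `condition` type is a simplified stand-in for `metav1.Condition`, and `podsReadyCondition` is a hypothetical helper; the PR's actual `generatePodsReadyCondition` may differ in signature and detail.

```go
package main

// condition is a simplified, hypothetical stand-in for metav1.Condition.
type condition struct {
	Type   string
	Status string // "True" or "False"
}

// podsReadyCondition sketches the check discussed in the review: the
// PodsReady condition is True only when the workload is admitted AND the
// job's pods are ready. Once admission is nil (admitted == false), the
// result is False even if the pods have not been torn down yet.
func podsReadyCondition(admitted, podsReady bool) condition {
	c := condition{Type: "PodsReady", Status: "False"}
	if admitted && podsReady {
		c.Status = "True"
	}
	return c
}
```

For example, `podsReadyCondition(false, true)` yields a False status, covering the window where admission has been unset but the job has not yet been suspended.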
@@ -265,6 +268,217 @@ var _ = ginkgo.Describe("Job controller for workloads with no queue set", func()
	})
})

var _ = ginkgo.Describe("Job controller when waitForPodsReady enabled", func() {
Can we add a test to verify that the condition gets updated for a workload that initially has spec.Admission set and then unset (to simulate preemption)?
Modified the tests. I reuse the previously added test cases (with suspended=true), in which the job was never unsuspended. Now I unconditionally admit the workload at the beginning; if a test specifies "suspended=true", Admission is then unset, the job is suspended, and we verify the condition.
sg, we can have a test in the scheduler once preemption is implemented.
f1f77b3 to af1117f
bd4a29a to 2b19996
543d096 to 843b00d
843b00d to b6f096e
pkg/workload/workload.go
Outdated
	return InConditionWithStatus(w, condition, metav1.ConditionTrue)
}

func InConditionWithStatus(w *kueue.Workload, condition string, status metav1.ConditionStatus) bool {
Don't add this function.
As a follow up: remove InCondition
Cleaned up the unnecessary changes; I will create a follow-up PR to remove the function.
/lgtm Leaving the hold to @ahg-g
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ahg-g, alculquicondor, mimowo
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This is the first part of implementing the solution designed in: #433.
Admission of queued workloads will be blocked until the currently starting workload has the PodsReady condition (when WaitForPodsReady is configured at the Kueue level).
Which issue(s) this PR fixes:
Part of: #349
Special notes for your reviewer:
The WaitForPodsReady field is not fully implemented; there is going to be a follow-up PR to implement blocking of admission.