Pod groups: e2e tests for diverse pods and preemption #1638
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Ok, I see why a re-admitted workload is deleted after completion. I find it inconsistent, because the workload is not deleted if it finishes in the first run. Here is the scenario:
So, this feels inconsistent, because the workload would stay if it succeeded in the first run. It took me a while to understand the interactions, but maybe it is acceptable, because the workload is finished anyway, and workloads aren't really user-facing APIs. WDYT @alculquicondor @tenzen-y? I guess in the e2e I can just assume it is deleted.
As discussed in #1557, this is a bug.
Force-pushed from 4722065 to a1976ad
Thank you for the clarifications! As Aldo mentioned in #1557, I also think this is a bug.
/retest
Force-pushed from a1976ad to 0899f55
/assign @tenzen-y @alculquicondor
/approve
/hold
for test retries.
// For replacement pods use args that let it complete fast.
rep.Name = "replacement-for-" + rep.Name
rep.Spec.Containers[0].Args = []string{"1ms"}
gomega.Expect(k8sClient.Create(ctx, rep)).To(gomega.Succeed())
I wonder if there is potential for flakiness here, as Kueue might not have observed the Pod as Failed yet.
I'm not sure whether events within a kind cluster are ordered. If they are, Kueue would only see the replacement Pod after it has seen the other Pod as failed, in which case there wouldn't be flakiness.
Let's run the tests a few times.
Otherwise, we might have to implement the logic in which, instead of deleting excess Pods, they are left gated until there is space.
I assume there is no issue with flakiness, because I ran this in a loop locally for over an hour and all attempts passed. Also, all attempts on the GitHub CI passed (6).
/test pull-kueue-test-e2e-main-1-27
/test pull-kueue-test-e2e-main-1-27
LGTM!
/lgtm
/approve
/hold cancel
LGTM label has been added. Git tree hash: b4f0ccecf7923d07707dca0762e4a55813a2d4f6
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, mimowo, tenzen-y. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
…s#1638)
* WIP: Add more e2e tests for pod groups
* cleanup
* review comment
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
To verify the feature works as expected and prevent regressions.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
In order to verify the tests aren't flaky, I let them run in a loop for 1h (until timeout), and there were no errors.
Does this PR introduce a user-facing change?