WaitForPodsReady: Reset the requeueState while reconciling #1838
Conversation
/cherry-pick release-0.6
@tenzen-y: once the present PR merges, I will cherry-pick it on top of release-0.6 in a new PR and assign it to you.
Force-pushed: ffba87c → 975b5f2
@@ -248,7 +248,7 @@ var _ = ginkgo.Describe("SchedulerWithWaitForPodsReady", func() {
 		// To avoid flakiness, we don't verify if the workload has a QuotaReserved=false with pending reason here.
 	})

-	ginkgo.It("Should re-admit a timed out workload and deactivate a workload exceeded the re-queue count limit", func() {
+	ginkgo.It("Should re-admit a timed out workload and deactivate a workload exceeded the re-queue count limit. After that re-activating a workload", func() {
nit: s/activating/activate
may we consider a separate test for this, assuming that setting up the state isn't too difficult?
It is possible to divide this test into two tests, but we would need to start another manager to configure backoffLimitCount=1 or 0, like this:
requeuingBackoffLimitCount = ptr.To[int32](2)
That would increase the integration test time, so to avoid it, I implemented this test as the current single one.
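For context, a split would look roughly like the sketch below: a separate Describe block whose setup configures a lower backoff limit and starts its own manager. The Describe/It texts and the elided setup are illustrative assumptions, not code from this PR; the point is only that each distinct BackoffLimitCount needs its own manager start-up.

```go
// Illustrative sketch only: a dedicated context with its own manager configuration.
// The extra manager start/stop per Describe is the cost that this PR avoids by
// keeping both scenarios in a single It block.
var _ = ginkgo.Describe("SchedulerWithWaitForPodsReady, BackoffLimitCount=1", func() {
	ginkgo.BeforeEach(func() {
		requeuingBackoffLimitCount = ptr.To[int32](1)
		// ... start the manager, scheduler, and queues exactly as the existing
		// suite's BeforeEach does, then create the ClusterQueue and LocalQueue.
	})

	ginkgo.It("Should deactivate the workload after a single requeue", func() {
		// ... reuse the timeout/eviction steps from the existing test.
	})
})
```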
}, util.Timeout, util.Interval).Should(gomega.Succeed(), "Reactivate inactive Workload")
gomega.Eventually(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl), prodWl)).Should(gomega.Succeed())
	g.Expect(prodWl.Status.RequeueState).Should(gomega.BeNil())
should we verify at the start of this sequence that the RequeueState is in a certain configuration (non-nil, or something more specific), in case the previous part of the test changes?
Not relevant if this test is split out and this state is set explicitly.
We already verify that here:
util.ExpectWorkloadToHaveRequeueCount(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), ptr.To[int32](2))
Does it make sense?
apimeta.SetStatusCondition(&prodWl.Status.Conditions, metav1.Condition{
	Type:    kueue.WorkloadEvicted,
	Status:  metav1.ConditionTrue,
	Reason:  kueue.WorkloadEvictedByDeactivation,
	Message: "evicted by Test",
})
g.Expect(k8sClient.Status().Update(ctx, prodWl)).Should(gomega.Succeed())
wouldn't the controller add this already?
Although the job controller adds this condition here:
workload.SetEvictedCondition(wl, kueue.WorkloadEvictedByDeactivation, "The workload is deactivated")
kueue/test/integration/scheduler/podsready/suite_test.go
Lines 68 to 104 in 975b5f2
cfg := &config.Configuration{
	WaitForPodsReady: &config.WaitForPodsReady{
		Enable:         true,
		BlockAdmission: &blockAdmission,
		Timeout:        &metav1.Duration{Duration: value},
		RequeuingStrategy: &config.RequeuingStrategy{
			Timestamp:         ptr.To(requeuingTimestamp),
			BackoffLimitCount: requeuingBackoffLimitCount,
		},
	},
}
mgr.GetScheme().Default(cfg)
err := indexer.Setup(ctx, mgr.GetFieldIndexer())
gomega.Expect(err).NotTo(gomega.HaveOccurred())
cCache := cache.New(mgr.GetClient(), cache.WithPodsReadyTracking(cfg.WaitForPodsReady.Enable && cfg.WaitForPodsReady.BlockAdmission != nil && *cfg.WaitForPodsReady.BlockAdmission))
queues := queue.NewManager(
	mgr.GetClient(), cCache,
	queue.WithPodsReadyRequeuingTimestamp(requeuingTimestamp),
)
// Only the core controllers, webhooks, and scheduler are started in this suite;
// the job reconciler is not set up here.
failedCtrl, err := core.SetupControllers(mgr, queues, cCache, cfg)
gomega.Expect(err).ToNot(gomega.HaveOccurred(), "controller", failedCtrl)
failedWebhook, err := webhooks.Setup(mgr)
gomega.Expect(err).ToNot(gomega.HaveOccurred(), "webhook", failedWebhook)
// The job integration only contributes its field indexes.
err = workloadjob.SetupIndexes(ctx, mgr.GetFieldIndexer())
gomega.Expect(err).NotTo(gomega.HaveOccurred())
sched := scheduler.New(
	queues, cCache, mgr.GetClient(), mgr.GetEventRecorderFor(constants.AdmissionName),
	scheduler.WithPodsReadyRequeuingTimestamp(requeuingTimestamp),
)
err = sched.Start(ctx)
gomega.Expect(err).NotTo(gomega.HaveOccurred())
We just register the indexers:
kueue/test/integration/scheduler/podsready/suite_test.go
Lines 96 to 97 in 975b5f2
err = workloadjob.SetupIndexes(ctx, mgr.GetFieldIndexer())
gomega.Expect(err).NotTo(gomega.HaveOccurred())
Interesting...
We should probably move that piece of code to the Workload controller in a follow-up. It doesn't belong in the job reconciler, as it is completely independent of the job. Otherwise, anyone who wants to write a custom integration would need to implement this themselves.
It makes sense, but we need to remember why we put this here.
Let me open an issue for this.
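For reference, a very rough sketch of what hosting that logic in the Workload controller might look like (illustrative only; the surrounding Reconcile, r.client, and the status update call are assumptions, not the actual kueue code):

```go
// Sketch: let the Workload reconciler set the Evicted condition for
// deactivated workloads, so job integrations don't have to re-implement it.
if !ptr.Deref(wl.Spec.Active, true) &&
	!apimeta.IsStatusConditionTrue(wl.Status.Conditions, kueue.WorkloadEvicted) {
	workload.SetEvictedCondition(wl, kueue.WorkloadEvictedByDeactivation, "The workload is deactivated")
	if err := r.client.Status().Update(ctx, wl); err != nil {
		return ctrl.Result{}, err
	}
}
```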
Can you add a TODO to remove this status update and link to #1841?
Sure. Thank you for raising it!
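The resulting TODO next to the manual status update could read roughly like this (wording is a suggestion, not the final diff):

```go
// TODO: Remove this manual Evicted-condition update once the condition is set
// outside the job reconciler, so this suite no longer needs to fake it.
// See #1841.
```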
/lgtm
/approve
LGTM label has been added. Git tree hash: 255b9d366fb6ad738bc6ce44f50e054bf1017507
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor, tenzen-y
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
Force-pushed: 975b5f2 → 105c016
Force-pushed: 105c016 → 70ac2d3
/lgtm
LGTM label has been added. Git tree hash: 1d5ed08e2f7d9626e4d9f6cc1bcae6b51e5bad99
/hold
Force-pushed: 70ac2d3 → 62bb01f
… webhook Signed-off-by: Yuki Iwai <[email protected]>
Force-pushed: 62bb01f → 688f223
@alculquicondor Could you give |
/hold cancel
/lgtm
LGTM label has been added. Git tree hash: 0d8aaed7c14f0efb389c215977f5b98d47beb0fd
/test pull-kueue-test-integration-main
@tenzen-y: #1838 failed to apply on top of branch "release-0.6":
I'm preparing the cherry-pick PR.
… webhook (kubernetes-sigs#1838) Signed-off-by: Yuki Iwai <[email protected]>
… webhook (kubernetes-sigs#1838) Signed-off-by: Yuki Iwai <[email protected]>
What type of PR is this?
/kind bug
What this PR does / why we need it:
As @alculquicondor mentioned in #1821 (comment), it seems that mutating webhooks cannot update `spec` and `status` in a single webhook call. So, we must reset the requeueState while reconciling instead of in the webhook.
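At a high level, the reconcile-side reset is along the lines of the sketch below (illustrative only, not the exact diff in this PR): once a previously deactivated and evicted workload is active again, the accumulated RequeueState is cleared during reconciliation rather than in the mutating webhook.

```go
// Sketch of the idea: clear RequeueState in the reconcile loop when a workload
// that was evicted by deactivation has been re-activated.
if ptr.Deref(wl.Spec.Active, true) && wl.Status.RequeueState != nil &&
	apimeta.IsStatusConditionTrue(wl.Status.Conditions, kueue.WorkloadEvicted) {
	wl.Status.RequeueState = nil
}
```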
Which issue(s) this PR fixes:
Fixes #1821
Special notes for your reviewer:
Does this PR introduce a user-facing change?