-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support MPIJob managedBy feature for the MultiKueue #3289
Support MPIJob managedBy feature for the MultiKueue #3289
Conversation
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
/ok-to-test |
/retest |
5d873dd
to
7fa0893
Compare
/retest |
7fa0893
to
89cc222
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also implement defaulting webhoooks?
Good question, I guess it's true... |
Yes, we want the webhook defaulting too. |
As I mentioned in the issue, let's implement the dedicated webhook instead of base:
|
89cc222
to
8615a3f
Compare
/retest |
8615a3f
to
d87e5e4
Compare
/retest |
541f2f1
to
eb0238c
Compare
/retest |
/lgtm Thanks! |
LGTM label has been added. Git tree hash: e940cc9c7145f77bc65e779f0ebfcfff5874dc32
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mszadkow Thank you for driving this feature across kubeflow and Kueue!
I left a few nits comments.
@@ -116,16 +116,53 @@ func TestMultikueueAdapter(t *testing.T) { | |||
return adapter.DeleteRemoteObject(ctx, workerClient, types.NamespacedName{Name: "mpijob1", Namespace: TestNamespace}) | |||
}, | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the same case as
"missing jobset is not considered managed": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
if canDefaultManagedBy(mpiJob.Spec.RunPolicy.ManagedBy) { | ||
localQueueName, found := mpiJob.Labels[constants.QueueLabel] | ||
if !found { | ||
return nil | ||
} | ||
clusterQueueName, ok := w.queues.ClusterQueueFromLocalQueue(queue.QueueKey(mpiJob.ObjectMeta.Namespace, localQueueName)) | ||
if !ok { | ||
log.V(5).Info("Cluster queue for local queue not found", "mpijob", klog.KObj(mpiJob), "localQueue", localQueueName) | ||
return nil | ||
} | ||
for _, admissionCheck := range w.cache.AdmissionChecksForClusterQueue(clusterQueueName) { | ||
if admissionCheck.Controller == kueue.MultiKueueControllerName { | ||
log.V(5).Info("Defaulting ManagedBy", "mpijob", klog.KObj(mpiJob), "oldManagedBy", mpiJob.Spec.RunPolicy.ManagedBy, "managedBy", kueue.MultiKueueControllerName) | ||
mpiJob.Spec.RunPolicy.ManagedBy = ptr.To(kueue.MultiKueueControllerName) | ||
return nil | ||
} | ||
} | ||
} | ||
|
||
return nil | ||
} | ||
|
||
func canDefaultManagedBy(mpiJobSpecManagedBy *string) bool { | ||
return features.Enabled(features.MultiKueue) && | ||
(mpiJobSpecManagedBy == nil || *mpiJobSpecManagedBy == v2beta1.KubeflowJobController) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to commonize in the jobframework package, but we can consider it as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow up, fair deal ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one nit
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@mszadkow Could you open PR to update the documentation?
/lgtm
/approve
LGTM label has been added. Git tree hash: aedfc0e34b32347d5a300404ed345d672c2ee0d5
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mszadkow, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thank you! |
#3316 - docs PR |
…3289) * Add managedBy field impl and unit tests * Update MpiJob multikueue integration test * Update e2e tests and start to use mpi-operator on managment cluster * Implement webhook defaulting * Update after code review
…3289) * Add managedBy field impl and unit tests * Update MpiJob multikueue integration test * Update e2e tests and start to use mpi-operator on managment cluster * Implement webhook defaulting * Update after code review
What type of PR is this?
/kind feature
What this PR does / why we need it:
Support MPIJob managedBy feature for the MultiKueue.
Allow for installing
mpi-operator
in management cluster instead of only crds.Which issue(s) this PR fixes:
Fixes #3257
Special notes for your reviewer:
Does this PR introduce a user-facing change?