-
Notifications
You must be signed in to change notification settings - Fork 702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2170: Add manifests for Kubeflow Training V2 #2289
KEP-2170: Add manifests for Kubeflow Training V2 #2289
Conversation
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Pull Request Test Coverage Report for Build 11413026138Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding manifests!
//+kubebuilder:rbac:groups=jobset.x-k8s.io,resources=jobsets,verbs=get;list;watch;create | ||
//+kubebuilder:rbac:groups=scheduling.x-k8s.io,resources=podgroups,verbs=get;list;watch;create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move these markers to the each job pipeline frame work plugins?
https://github.com/kubeflow/training-operator/tree/master/pkg/runtime.v2/framework/plugins
//+kubebuilder:rbac:groups=kubeflow.org,resources=trainingruntimes,verbs=get;list;watch | ||
//+kubebuilder:rbac:groups=kubeflow.org,resources=clustertrainingruntimes,verbs=get;list;watch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move these markers to each job pipeline framework series?
https://github.com/kubeflow/training-operator/tree/master/pkg/runtime.v2/core
@@ -52,6 +52,8 @@ import ( | |||
const ( | |||
// EnvKubeflowNamespace is an environment variable for namespace when deployed on kubernetes | |||
EnvKubeflowNamespace = "KUBEFLOW_NAMESPACE" | |||
|
|||
webhookConfigurationName = "validator.training-operator.kubeflow.org" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we need to refine the webhook controller-gen marker as well:
// +kubebuilder:webhook:path=/validate-kubeflow-org-v1-pytorchjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=kubeflow.org,resources=pytorchjobs,verbs=create;update,versions=v1,name=validator.pytorchjob.training-operator.kubeflow.org,admissionReviewVersions=v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should change it, since those parameters defines the particular webhook configuration, not the name of validation webhook configuration object. For example:
name: validator.trainingruntime.kubeflow.org |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right.
This will be used in
- op: replace
path: /metadata/name
value: validator.training-operator-v2.kubeflow.org
@@ -44,6 +44,10 @@ import ( | |||
webhookv2 "github.com/kubeflow/training-operator/pkg/webhook.v2" | |||
) | |||
|
|||
const ( | |||
webhookConfigurationName = "validator.training-operator-v2.kubeflow.org" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto:
// +kubebuilder:webhook:path=/validate-kubeflow-org-v2alpha1-trainjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=kubeflow.org,resources=trainjobs,verbs=create;update,versions=v2alpha1,name=validator.trainjob.kubeflow.org,admissionReviewVersions=v1 |
metadata: | ||
name: training-operator-v2 | ||
labels: | ||
training.kubeflow.org/component: manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of these, could we use the kubernetes recommended labels?
https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
Signed-off-by: Andrey Velichkevich <[email protected]>
97663fe
to
cc72871
Compare
Signed-off-by: Andrey Velichkevich <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
* KEP-2170: Add manifests for Kubeflow Training V2 Signed-off-by: Andrey Velichkevich <[email protected]> * Fix invalid name for webhook config in cert Signed-off-by: Andrey Velichkevich <[email protected]> * Fix integration tests Signed-off-by: Andrey Velichkevich <[email protected]> * Move kubebuilder markers to runtime framework Signed-off-by: Andrey Velichkevich <[email protected]> * Use Kubernetes recommended labels Signed-off-by: Andrey Velichkevich <[email protected]> --------- Signed-off-by: Andrey Velichkevich <[email protected]>
* KEP-2170: Add manifests for Kubeflow Training V2 Signed-off-by: Andrey Velichkevich <[email protected]> * Fix invalid name for webhook config in cert Signed-off-by: Andrey Velichkevich <[email protected]> * Fix integration tests Signed-off-by: Andrey Velichkevich <[email protected]> * Move kubebuilder markers to runtime framework Signed-off-by: Andrey Velichkevich <[email protected]> * Use Kubernetes recommended labels Signed-off-by: Andrey Velichkevich <[email protected]> --------- Signed-off-by: Andrey Velichkevich <[email protected]> Signed-off-by: sailesh duddupudi <[email protected]>
Fixes: #2208
I added manifests for Training Operator V2. I renamed the manager image to:
That will allow users to use
latest
version for both V1 and V2 version of Training Operator.In the future, we can deprecate the old version of Training Operator.
For now, I install JobSet using the release manifests in the Kustomize overlay. Let's discuss with @kubeflow/wg-manifests-leads what is the better approach long-term.
Additionally, I fixed the invalid validating webhook configuration name in our cert generator.
/assign @kubeflow/wg-training-leads
/hold for review