[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -6,19 +6,19 @@ To use the reconciler, following methods must be overridden according to the API

```go
 // GetJob returns the job that matches the request
-func (r *KubeflowJobReconciler) GetJob(ctx context.Context, req ctrl.Request) (client.Object, error)
+func (r *JobReconciler) GetJob(ctx context.Context, req ctrl.Request) (client.Object, error)
```
What about something like Training or TrainingJob since we are always dealing with model trainings here. Job seems very generic.
`TrainingJob` sounds good to me. As the common library is designed for all kinds of controllers, I would suggest renaming the `GenericJob` to `TrainingJob` in the pull request I mentioned above for tf-operator (kubeflow/training-operator#1398). What do you think?
SGTM
/cc @kubeflow/wg-training-leads
Sorry for the delay. Let me merge #163 first before accepting any new changes. This is really annoying.
@Jeffwan could we move this PR forward, as the revert string issue is resolved now?
LGTM
/assign @Jeffwan
-// GetReconcilerName returns the name of this reconciler, which is "Kubeflow Reconciler"
+// GetReconcilerName returns the name of this reconciler, which is "common-reconciler"
 func (r *ReconcilerUtil) GetReconcilerName() string {
"Kubeflow Reconciler" won't work because it's not a word?
Yes. When generating the default labels, the returned string from `GetReconcilerName` will be used as the value for key `commonv1.OperatorNameLable` (`"training.kubeflow.org/operator-name"`).
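To make the point about label values concrete, here is a minimal sketch of the documented Kubernetes rule for label values (at most 63 characters; empty, or starting and ending with an alphanumeric character, with `-`, `_`, `.` and alphanumerics in between). The helper `isValidLabelValue` is a hypothetical name written with the standard library only; it is not the validation code used by this library or by Kubernetes itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValuePattern encodes the documented rule for Kubernetes label values:
// empty, or alphanumeric at both ends with '-', '_', '.' allowed in between.
var labelValuePattern = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// maxLabelValueLength is the documented upper bound for a label value.
const maxLabelValueLength = 63

// isValidLabelValue is a hypothetical helper (an assumption for illustration),
// reporting whether v could be used as a label value.
func isValidLabelValue(v string) bool {
	return len(v) <= maxLabelValueLength && labelValuePattern.MatchString(v)
}

func main() {
	// The two reconciler names discussed in this thread.
	for _, name := range []string{"Kubeflow Reconciler", "common-reconciler"} {
		fmt.Printf("%q valid as label value: %v\n", name, isValidLabelValue(name))
	}
}
```

The space in "Kubeflow Reconciler" is what makes it unusable as the value of the operator-name label, while "common-reconciler" passes.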
@@ -239,21 +239,3 @@ type JobInterface interface {
 // PastActiveDeadline CAN be overridden to customize how to determine if this job has past activate deadline.
 PastActiveDeadline(runPolicy *commonv1.RunPolicy, jobStatus *commonv1.JobStatus) bool
 }
-// KubeflowReconcilerInterface defines the abstract interface for a base reconciler for kubeflow jobs.
-type KubeflowReconcilerInterface interface {
What's the motivation to move `KubeflowReconcilerInterface` to `training-operator`?
When writing the test code, `test_reconciler`, which could be considered one example of how the `common.reconciler` will be used, I found that:
- Such an interface will not be used for any abstraction when implementing a real reconciler
- Embedding modularized interfaces in `KubeflowReconcilerInterface` conflicts with embedding them in the real reconciler struct.
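As a rough illustration of the second point, the sketch below composes a reconciler by embedding small interfaces directly in the struct, with no umbrella interface on top. Every name here (`JobGetter`, `LabelGenerator`, `ExampleReconciler`, the `example.org/overridden` key) is hypothetical and only stands in for the modular interfaces of the library; only the operator-name label key comes from this thread.

```go
package main

import "fmt"

// JobGetter and LabelGenerator are hypothetical stand-ins for the
// modularized interfaces discussed above.
type JobGetter interface {
	GetJob(name string) (string, error)
}

type LabelGenerator interface {
	GenLabels(jobName string) map[string]string
}

// memoryJobs is a toy JobGetter backed by a map.
type memoryJobs struct{ jobs map[string]string }

func (m memoryJobs) GetJob(name string) (string, error) {
	job, ok := m.jobs[name]
	if !ok {
		return "", fmt.Errorf("job %q not found", name)
	}
	return job, nil
}

// defaultLabels is a toy LabelGenerator that sets the operator-name label.
type defaultLabels struct{ operatorName string }

func (d defaultLabels) GenLabels(jobName string) map[string]string {
	return map[string]string{"training.kubeflow.org/operator-name": d.operatorName}
}

// ExampleReconciler embeds the modular interfaces directly; their methods
// are promoted, so no umbrella interface is needed to call them.
type ExampleReconciler struct {
	JobGetter
	LabelGenerator
}

// GenLabels overrides the promoted method and delegates to the embedded one.
func (r *ExampleReconciler) GenLabels(jobName string) map[string]string {
	labels := r.LabelGenerator.GenLabels(jobName)
	labels["example.org/overridden"] = "true" // hypothetical extra label
	return labels
}

func main() {
	r := &ExampleReconciler{
		JobGetter:      memoryJobs{jobs: map[string]string{"mnist": "mnist-spec"}},
		LabelGenerator: defaultLabels{operatorName: "common-reconciler"},
	}
	job, err := r.GetJob("mnist") // promoted from the embedded JobGetter
	fmt.Println(job, err, r.GenLabels("mnist"))
}
```

A method defined on the outer struct shadows the promoted one, which is how a concrete reconciler can override a single piece of behavior while reusing the rest.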
I think it makes sense to me. The current testing integration between common and training-operator is a headache.
svcList := &corev1.ServiceList{}
err := r.List(ctx, svcList, client.MatchingLabels(r.GenLabels(job.GetName())))
if err != nil {
	return nil, err
}

var svcs []*corev1.Service = nil
for _, svc := range svcList.Items {
Hmm, what's the problem with the original way?
I see, yeah. The loop variable is a single shared memory location that each of the slice items is copied into, so all the pointers created using `&svc` point to the same memory location. The current way looks good to me.
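The aliasing described here can be reproduced in a small standalone sketch. `Service` is a hypothetical stand-in for `corev1.Service`, and the first function reuses one variable explicitly, which is how the range variable behaved before Go 1.22 (newer Go versions create a fresh variable per iteration). Taking the address by index is one possible way to avoid the problem; it is not necessarily the approach taken in this PR.

```go
package main

import "fmt"

// Service is a hypothetical stand-in for corev1.Service.
type Service struct{ Name string }

// pointersViaSharedVar reproduces the aliasing: a single variable is reused
// for every item, so every stored pointer refers to that one variable.
func pointersViaSharedVar(items []Service) []*Service {
	var out []*Service
	var svc Service // one shared variable, like the pre-Go 1.22 range variable
	for i := range items {
		svc = items[i]
		out = append(out, &svc)
	}
	return out
}

// pointersViaIndex takes the address of each slice element instead, so every
// pointer refers to a distinct item.
func pointersViaIndex(items []Service) []*Service {
	out := make([]*Service, 0, len(items))
	for i := range items {
		out = append(out, &items[i])
	}
	return out
}

func main() {
	items := []Service{{Name: "worker-0"}, {Name: "worker-1"}, {Name: "worker-2"}}
	for _, p := range pointersViaSharedVar(items) {
		fmt.Println("shared: ", p.Name) // every line shows the last item
	}
	for _, p := range pointersViaIndex(items) {
		fmt.Println("indexed:", p.Name) // each line shows its own item
	}
}
```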
@zw0610 Sorry for the late response. Please check the comments above.
remove nil
/lgtm This change looks good to me. Let's see if anyone else has further feedback. /cc @kubeflow/wg-training-leads
Any feedback before we can merge this PR? @kubeflow/wg-training-leads
/hold cancel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jeffwan, terrytangyuan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
To support `GenericJob` in tf-operator, a kind of Job independent from any deep learning frameworks, this pull request introduces the following modifications to the `reconciler.v1` package:
- (`Kubeflow`)
- `PodList`, `ServiceList` converting issue
- `GetReconcilerName` returned value
- `reconciler.v1` package by moving the `KubeflowReconciler` to tf-operator ([WIP]: add a GenericJob type and controller training-operator#1398)
- `TestReconciler` construction function