[discussion] Refactor pytorch operator APIs #84
Comments
How about placing all APIs and clients in one repository named kubeflow/clients or kubeflow/clientsets? |
Pytorch-operator currently imports a lot of reusable code from tf-operator. The idea is to make tf-operator the "canonical" operator, from which other training components can extend. |
@richardsliu Thanks for taking this up. I was thinking along the same lines while I was restructuring the operator code. We planned to keep it this way so that the CRDs of each operator are completely independent of each other (which gives more flexibility to each operator). This came up in one of the discussions in the kubeflow-discuss group. For example, the CleanPodPolicyRunning policy is supported in TF but not in PyTorch. However, it is just a design choice whether we want to share the status of the Job (JobStatus) across all operators. We would then have a consistent status field for every operator. |
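To make the trade-off in the comment above concrete, here is a minimal Go sketch in which the two operators keep independent specs but share one status type. Every type, field, and function name below (`JobStatus`, `JobCondition`, `LatestCondition`, the simplified `CleanPodPolicy` string field) is an assumption made for illustration and is not taken from the tf-operator or pytorch-operator sources.

```go
package main

import "fmt"

// JobConditionType is a hypothetical condition name shared by all operators.
type JobConditionType string

const (
	JobCreated   JobConditionType = "Created"
	JobRunning   JobConditionType = "Running"
	JobSucceeded JobConditionType = "Succeeded"
	JobFailed    JobConditionType = "Failed"
)

// JobCondition is one entry in a job's condition history (illustrative only).
type JobCondition struct {
	Type   JobConditionType
	Reason string
}

// JobStatus is the only piece shared across operators in this sketch.
type JobStatus struct {
	Conditions []JobCondition
}

// TFJobSpec stays operator-specific; here it carries a clean-pod policy,
// which the discussion says TF supports and PyTorch does not.
type TFJobSpec struct {
	CleanPodPolicy string
}

// PyTorchJobSpec is independent of TFJobSpec and has no clean-pod policy.
type PyTorchJobSpec struct{}

type TFJob struct {
	Spec   TFJobSpec
	Status JobStatus
}

type PyTorchJob struct {
	Spec   PyTorchJobSpec
	Status JobStatus
}

// LatestCondition returns the type of the most recently appended condition,
// or the empty string if no condition has been recorded yet.
func LatestCondition(s JobStatus) JobConditionType {
	if len(s.Conditions) == 0 {
		return ""
	}
	return s.Conditions[len(s.Conditions)-1].Type
}

func main() {
	tf := TFJob{
		Spec:   TFJobSpec{CleanPodPolicy: "Running"},
		Status: JobStatus{Conditions: []JobCondition{{Type: JobCreated}, {Type: JobRunning}}},
	}
	pt := PyTorchJob{
		Status: JobStatus{Conditions: []JobCondition{{Type: JobSucceeded}}},
	}
	// The same helper works for both job kinds because the status is shared.
	fmt.Println(LatestCondition(tf.Status), LatestCondition(pt.Status))
}
```

In this arrangement, tooling that only reads status (dashboards, CLI summaries) could be written once, while each operator remains free to add or omit spec fields.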
I think we do not have [...] Personally, I think sharing could help us maintain consistency, which is helpful for users. |
Implemented. TF: kubeflow/training-operator#859; PyTorch: #93 |
Closing this issue |
Most of the types defined in https://github.com/kubeflow/pytorch-operator/blob/master/pkg/apis/pytorch/v1alpha2/types.go overlap with those of TFJob. The API structures in tf-operator and pytorch-operator are similar enough that they should extend a single common API.
I propose something like:
Common:
TFJob:
Pytorch:
The common types can reside in the tf-operator repository for now. This will allow us to:
Thoughts?
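As a rough illustration of the Common / TFJob / Pytorch split proposed above, the sketch below shows one way a common replica type could be reused by both job specs. It is not the snippet from the original proposal: `ReplicaSpec`, its fields, the replica type constants, and the `TotalReplicas` helper are all hypothetical names chosen for this example, and the pod template is reduced to a plain image string so the code stays self-contained. The generic helper assumes Go 1.18 or later.

```go
package main

import "fmt"

// ReplicaSpec is a hypothetical common type that both operators would import.
type ReplicaSpec struct {
	Replicas      int32
	Image         string // stand-in for a full pod template
	RestartPolicy string
}

// Each operator keeps its own replica type names.
type TFReplicaType string
type PyTorchReplicaType string

const (
	TFReplicaTypePS     TFReplicaType = "PS"
	TFReplicaTypeWorker TFReplicaType = "Worker"

	PyTorchReplicaTypeMaster PyTorchReplicaType = "Master"
	PyTorchReplicaTypeWorker PyTorchReplicaType = "Worker"
)

// TFJobSpec reuses the common ReplicaSpec, keyed by TF replica type.
type TFJobSpec struct {
	TFReplicaSpecs map[TFReplicaType]*ReplicaSpec
}

// PyTorchJobSpec reuses the same ReplicaSpec, keyed by PyTorch replica type.
type PyTorchJobSpec struct {
	PyTorchReplicaSpecs map[PyTorchReplicaType]*ReplicaSpec
}

// TotalReplicas sums the replica counts of any map of common ReplicaSpecs,
// skipping nil entries. It is shared code that needs no operator-specific logic.
func TotalReplicas[K comparable](specs map[K]*ReplicaSpec) int32 {
	var total int32
	for _, s := range specs {
		if s != nil {
			total += s.Replicas
		}
	}
	return total
}

func main() {
	tf := TFJobSpec{TFReplicaSpecs: map[TFReplicaType]*ReplicaSpec{
		TFReplicaTypePS:     {Replicas: 1, Image: "example/tf"},
		TFReplicaTypeWorker: {Replicas: 3, Image: "example/tf"},
	}}
	pt := PyTorchJobSpec{PyTorchReplicaSpecs: map[PyTorchReplicaType]*ReplicaSpec{
		PyTorchReplicaTypeMaster: {Replicas: 1, Image: "example/pytorch"},
		PyTorchReplicaTypeWorker: {Replicas: 2, Image: "example/pytorch"},
	}}
	fmt.Println(TotalReplicas(tf.TFReplicaSpecs), TotalReplicas(pt.PyTorchReplicaSpecs))
}
```

Keeping the map keys operator-specific while sharing the value type is one possible middle ground between full independence and a single merged API.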