Make Katib generic for operator support #341
StudyJobController currently implements different logic for each worker type to check whether the worker has completed, see https://github.com/kubeflow/katib/blob/master/pkg/controller/studyjob/studyjob_controller.go#L413. BatchJob also supports the "last condition" format, so in theory we should be able to make this code generic: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/batch/v1/generated.proto#L158
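To illustrate the "last condition" idea, here is a minimal sketch, not Katib's code, of an operator-agnostic completion check. It assumes the worker object has been decoded into a generic map (the same shape `unstructured.Unstructured.Object` holds) and that the most recent condition is the last entry in `status.conditions`. The helper names and the `terminal` parameter are hypothetical; the caller supplies the terminal condition types, for example `Complete`/`Failed` for a batch/v1 Job.

```go
package main

import "fmt"

// lastCondition returns the type and status of the last entry in
// status.conditions of an object held as a generic map. It assumes new
// conditions are appended at the end of the slice.
func lastCondition(obj map[string]interface{}) (condType, condStatus string, err error) {
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return "", "", fmt.Errorf("object has no status map")
	}
	conds, ok := status["conditions"].([]interface{})
	if !ok || len(conds) == 0 {
		return "", "", fmt.Errorf("object has no status.conditions")
	}
	last, ok := conds[len(conds)-1].(map[string]interface{})
	if !ok {
		return "", "", fmt.Errorf("last condition is not an object")
	}
	condType, _ = last["type"].(string)
	condStatus, _ = last["status"].(string)
	return condType, condStatus, nil
}

// isCompleted (hypothetical) reports whether the last condition has one of the
// given terminal types and its status is "True". No per-kind switch is needed.
func isCompleted(obj map[string]interface{}, terminal ...string) (bool, error) {
	t, s, err := lastCondition(obj)
	if err != nil {
		return false, err
	}
	if s != "True" {
		return false, nil
	}
	for _, want := range terminal {
		if t == want {
			return true, nil
		}
	}
	return false, nil
}
```

The only per-operator knowledge left is the list of terminal condition types, which could be passed in as configuration instead of being hard-coded per kind.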
/assign
/area 0.5.0
I looked into this a bit. It seems we depend on the other operators' binaries in a few ways:
Instead, can we do something like:
It seems these calls only need a pointer to a runtime.Object and do not require linking against any specific operator.
We just need to define an interface around Status.Conditions, and it should work across operators. The contract is that each resource must support Status.Conditions.
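The proposed contract can be sketched as a small Go interface. This is a hypothetical illustration, not an API from Katib or any operator: `Condition`, `ConditionedJob`, `latestCondition`, and `exampleJob` are invented names, and the sketch assumes conditions are appended in chronological order.

```go
package main

// Condition is a hypothetical, operator-agnostic view of one entry in a
// resource's Status.Conditions.
type Condition struct {
	Type   string
	Status string
}

// ConditionedJob expresses the contract: every worker resource exposes its
// Status.Conditions, whatever operator owns it.
type ConditionedJob interface {
	Conditions() []Condition
}

// latestCondition returns the most recent condition, or false if there is none.
func latestCondition(j ConditionedJob) (Condition, bool) {
	conds := j.Conditions()
	if len(conds) == 0 {
		return Condition{}, false
	}
	return conds[len(conds)-1], true
}

// exampleJob stands in for any operator's job type.
type exampleJob struct {
	status struct{ conditions []Condition }
}

// Conditions satisfies ConditionedJob by exposing the job's status conditions.
func (e *exampleJob) Conditions() []Condition { return e.status.conditions }
```

With this shape, the controller code depends only on the interface, so adding a new operator means adapting its status type rather than adding another branch to the controller.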
Do we really need this? Can we just let each operator handle this part? @YujiOshima @johnugeorge @gaocegege @hougangliu What do you think?
I had a look at this earlier. Can you explain your solution for 4? Unstructured might work for 4, but I'm not sure about it. I will raise a PR this week for the version upgrade in Katib, which is a prerequisite, and then take this up.
For 1, do we need to watch each job type with unstructured?
then
and for Job and CronJob? For 4, I agree with @johnugeorge.
For 4, we are trying to decode the YAML into a specific job instance and raise an error if the decoding fails. Another place where we do this is https://github.com/kubeflow/katib/blob/master/pkg/controller/studyjob/utils.go#L83. What if Katib skips this altogether and just uses unstructured to create the job instance? Then the specific operator can handle marshalling errors. Does that work?
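A minimal sketch of the "just use unstructured" idea, assuming Katib only needs `apiVersion`, `kind`, and `metadata.name` and leaves schema validation to the owning operator. Because the Go standard library has no YAML parser, this sketch decodes a JSON manifest; in Katib itself the YAML would more likely go through apimachinery's YAML-or-JSON decoder into an `unstructured.Unstructured`. The `genericJob` type and `decodeGeneric` function are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// genericJob is a hypothetical minimal view of a worker manifest: only the
// fields Katib needs are extracted; everything else stays in the raw map.
type genericJob struct {
	APIVersion string
	Kind       string
	Name       string
	Object     map[string]interface{}
}

// decodeGeneric parses a manifest without knowing its concrete type. It only
// checks the fields every Kubernetes object must carry; spec errors are left
// for the operator (or API server) to report when the object is created.
func decodeGeneric(manifest []byte) (*genericJob, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(manifest, &obj); err != nil {
		return nil, fmt.Errorf("manifest is not valid JSON: %w", err)
	}
	apiVersion, _ := obj["apiVersion"].(string)
	kind, _ := obj["kind"].(string)
	if apiVersion == "" || kind == "" {
		return nil, fmt.Errorf("manifest must set apiVersion and kind")
	}
	// Indexing a nil map is safe in Go and yields the zero value.
	meta, _ := obj["metadata"].(map[string]interface{})
	name, _ := meta["name"].(string)
	return &genericJob{APIVersion: apiVersion, Kind: kind, Name: name, Object: obj}, nil
}
```

The design choice here is that an unknown `kind` is not an error for Katib; only structurally missing identity fields are.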
@richardsliu Thanks. Though I haven't checked, we may be able to use
Agree.
/close
@johnugeorge: Closing this issue. In response to this:
Currently, to support a custom K8s resource in Katib, explicit support has to be added, because the current code contains explicit checks for each resource type. Remove these checks so that Katib is generic enough to support new operators.
For example, in getJobWorkerStatus: https://github.com/kubeflow/katib/blob/master/pkg/controller/studyjob/studyjob_controller.go#L421
Related PRs: