Currently, the tf-operator repository has only one example of a distributed TFJob: https://github.com/kubeflow/tf-operator/tree/master/examples/tf_sample. I think adding more examples of distributed learning tasks would be helpful, because the existing example is artificial and its configuration isn't typical of a distributed TF learning task (it has more parameter servers (PSes) than workers). More examples would be a good resource for learning how a TFJob object should be defined for typical learning tasks.
I'd be glad to hear the community's opinions on this idea.
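To make the idea concrete, here is a rough sketch of the kind of layout I have in mind, with more workers than PSes. It builds the manifest as a plain Python dict and prints it as JSON. Assumptions: the field names follow the `kubeflow.org/v1` TFJob schema (`tfReplicaSpecs` with `PS` and `Worker` entries), which may not match the API version in the repository today; `make_tfjob`, the job name, the image name, and the replica counts are all hypothetical and chosen only for illustration.

```python
import json


def make_tfjob(name, image, workers=4, ps=1):
    """Build a TFJob manifest as a plain dict (hypothetical helper).

    Field names assume the kubeflow.org/v1 TFJob schema; adjust them
    for whichever API version your cluster actually serves.
    """

    def replica(count):
        # Each replica type uses the same pod template in this sketch.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "PS": replica(ps),
                "Worker": replica(workers),
            }
        },
    }


# Hypothetical job and image names, for illustration only.
manifest = make_tfjob("dist-example", "example.com/my-training-image:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`, assuming the TFJob CRD is installed on the cluster.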