In a TFJob, when the workers are Completed, I want the PS to be Completed too. How can I do that? #657
Comments
I do not think you can do that, because the PS is designed to be long-running in TF.
But the PS belongs to one TFJob, and when I create a new TFJob we create a new PS. As I understand it, when a TFJob completes we delete all the resources that TFJob used, right?
In v1alpha2 we do not implement the logic to delete all pods and services after the TFJob completes. Even if we did implement it, we would not set the PS to Completed; we would just delete it.
Which component deletes the PS server when the TFJob completes? The master?
Thank you for your reply.
The controller for v1alpha1 will do it.
What about v1alpha2?
We will investigate whether there is another way to do the same thing without deletions. Ref #661
You could use the features in #691 to control the behaviour.
And for the PS:
server = tf.train.Server(cluster, job_name=local_type, task_index=local_index)
...
if local_type == "ps":
    print("local server is ps")
    # join() blocks indefinitely, so the PS process never exits on its own
    server.join()
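Because `server.join()` never returns, the PS pod cannot reach Completed by itself. One workaround in user code, independent of the operator, is to have the PS wait for a "done" token from each worker and then return instead of joining forever. The sketch below only simulates that idea with Python threads and a standard-library queue; `run_worker`, `run_ps`, and `simulate` are hypothetical names, and a real TFJob would need a queue shared across the cluster rather than an in-process one.

```python
import queue
import threading


def run_worker(index, done_queue):
    # Stand-in for a worker's training loop (hypothetical).
    # When training finishes, the worker posts one token.
    done_queue.put(index)


def run_ps(num_workers, done_queue, timeout=5.0):
    # Instead of blocking forever, wait for one token per worker,
    # then return so the PS process can exit normally.
    finished = []
    for _ in range(num_workers):
        finished.append(done_queue.get(timeout=timeout))
    return sorted(finished)


def simulate(num_workers=3):
    # In-process simulation: threads play the role of worker pods.
    done_queue = queue.Queue()
    workers = [
        threading.Thread(target=run_worker, args=(i, done_queue))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    finished = run_ps(num_workers, done_queue)
    for w in workers:
        w.join()
    return finished
```

The point of the design is that the PS exits only after every worker has reported, so no worker loses access to the variables mid-training.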