[v1alpha2] Pods not deleted when job finishes #671

Closed
jlewi opened this issue Jun 15, 2018 · 4 comments

Comments

@jlewi
Contributor

jlewi commented Jun 15, 2018

Doesn't look like the pods are being deleted when the job finishes.

This means the pods will still be running and continue to consume resources.

For v1alpha1 we decided that was undesirable and that pods should be deleted when the job finishes.

I think we should preserve that behavior, even though it means access to logs depends on cluster-level logging.

If a user wants to leave pods running until the job finishes, they can do so by keeping the processes alive in their code.
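
To make the proposed behavior concrete, here is a minimal sketch of the cleanup decision. It is not the operator's actual code: `Pod`, `Job`, and `pods_to_delete` are hypothetical names, and treating `Succeeded` and `Failed` as the terminal conditions is an assumption. The idea is that once the job reaches a terminal condition, every pod it owns is selected for deletion, even if a process inside is still running.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: these are the job conditions treated as "finished".
TERMINAL_CONDITIONS = {"Succeeded", "Failed"}


@dataclass
class Pod:
    # Hypothetical stand-in for a pod owned by a job.
    name: str
    phase: str  # e.g. "Running", "Succeeded", "Failed"


@dataclass
class Job:
    # Hypothetical stand-in for a TFJob and the pods it owns.
    name: str
    condition: str  # e.g. "Running", "Succeeded", "Failed"
    pods: List[Pod] = field(default_factory=list)


def pods_to_delete(job: Job) -> List[str]:
    """Return the names of pods to delete; empty while the job is still active."""
    if job.condition not in TERMINAL_CONDITIONS:
        return []
    # The job is finished: select all of its pods, regardless of pod phase.
    return [pod.name for pod in job.pods]
```

With this rule, a parameter server that never exits on its own would still be cleaned up once the job itself is marked finished.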

@jlewi
Contributor Author

jlewi commented Jun 15, 2018

/assign @gaocegege

@gaocegege
Member

Is this for v1alpha1 or v1alpha2?

@jlewi
Contributor Author

jlewi commented Jun 15, 2018

v1alpha2.

Here's the issue about changing the v1alpha1 behavior to delete pods when the job finishes: #128

Per that issue, I think this is a real problem for anyone using TFJob. If you launch a job that uses GPUs, it will continue to hold those GPUs even after it finishes.

I think this is a major barrier to doing actual work.
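
As a rough illustration of the cost, the sketch below counts GPUs still held by running pods whose job has already finished. The function name and the input shapes are hypothetical, chosen only for this example; a real check would read this information from the cluster.

```python
from typing import Dict, List, Tuple


def gpus_held_after_finish(
    job_finished: Dict[str, bool],
    running_pods: List[Tuple[str, int]],
) -> int:
    """Sum the GPUs requested by still-running pods whose job has finished.

    job_finished maps a job name to whether that job has finished.
    running_pods lists (job name, GPUs requested) for each running pod.
    """
    return sum(
        gpus
        for job_name, gpus in running_pods
        if job_finished.get(job_name, False)
    )
```

Any nonzero result is capacity that no other job can schedule onto until those pods are deleted.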

Bumping this to P0 for this reason and to match the priority we assigned it in v1alpha1.

@jlewi
Contributor Author

jlewi commented Jun 19, 2018

@yph152 and @gaocegege Any update on when the PR for this issue will be ready? Do you need help?
