[bug] TFJob launcher pipeline task fails when delete_finished_tfjob flag is True #7984

Closed
eundonglee opened this issue Jul 6, 2022 · 3 comments · Fixed by #7985

eundonglee commented Jul 6, 2022

image: nikenano/launchernew:latest

When the delete_finished_tfjob flag is True, the TFJob launcher task fails with the error below.

Traceback (most recent call last):
  File "/ml/launch_tfjob.py", line 136, in <module>
    main()
  File "/ml/launch_tfjob.py", line 133, in main
    tfjob.delete(args.name, args.namespace)
  File "/ml/launch_crd.py", line 115, in delete
    body)
TypeError: delete_namespaced_custom_object() takes exactly 6 arguments (7 given)

I think it's a Kubernetes client SDK version issue in the nikenano/launchernew:latest container image.
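
One way to check the version hypothesis is to inspect the signature of the installed client method. The sketch below assumes Python 3 (for `inspect.signature`). `old_style` and `new_style` are hypothetical stand-ins for the two signature shapes the error suggests, not the real client code.

```python
import inspect


def body_is_positional(func):
    """Return True if `func` declares `body` as a required positional parameter."""
    param = inspect.signature(func).parameters.get("body")
    return (
        param is not None
        and param.default is inspect.Parameter.empty
        and param.kind in (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
    )


# Hypothetical stand-ins: one signature takes `body` positionally,
# the other only accepts it through **kwargs.
def old_style(group, version, namespace, plural, name, body, **kwargs):
    return None


def new_style(group, version, namespace, plural, name, **kwargs):
    return None


print(body_is_positional(old_style))
print(body_is_positional(new_style))
```

Inside the container, the same helper could be pointed at `kubernetes.client.CustomObjectsApi.delete_namespaced_custom_object`. If it reports that `body` is not positional, the seven-argument call in the traceback cannot succeed against that client.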

@eundonglee eundonglee changed the title TFJob launcher task fails when delete_finished_tfjob flag is True [bug] TFJob launcher task fails when delete_finished_tfjob flag is True Jul 6, 2022
@eundonglee eundonglee changed the title [bug] TFJob launcher task fails when delete_finished_tfjob flag is True [bug] TFJob launcher pipeline task fails when delete_finished_tfjob flag is True Jul 6, 2022
@eundonglee
Contributor Author

I found a PR in the kubeflow/training-operator repo that addresses a similar problem:
kubeflow/training-operator#1281

I think the changes from that PR should be applied here and also to the nikenano/launchernew container image.

api_response = self.client.delete_namespaced_custom_object(
    self.group,
    self.version,
    namespace,
    self.plural,
    name,
    body)
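
A minimal sketch of the kind of change that would avoid the TypeError, assuming the installed client accepts `body` only as a keyword argument (which is what the error message suggests). `FakeCustomObjectsApi` and `CRDLauncher` are hypothetical stand-ins so the example runs without a cluster, and the `body` contents are an assumption rather than the launcher's actual delete options.

```python
class FakeCustomObjectsApi:
    """Hypothetical stand-in: five positional arguments, `body` via keyword only."""

    def delete_namespaced_custom_object(
        self, group, version, namespace, plural, name, **kwargs
    ):
        return {"deleted": name, "namespace": namespace, "body": kwargs.get("body")}


class CRDLauncher:
    """Hypothetical wrapper mirroring the shape of the delete() call quoted above."""

    def __init__(self, client, group, version, plural):
        self.client = client
        self.group = group
        self.version = version
        self.plural = plural

    def delete(self, name, namespace):
        # Assumed delete options; the real launcher may use different ones.
        body = {"propagationPolicy": "Foreground"}
        # Pass `body` by keyword so the call matches a signature that
        # no longer takes it as the sixth positional argument.
        return self.client.delete_namespaced_custom_object(
            self.group,
            self.version,
            namespace,
            self.plural,
            name,
            body=body,
        )


launcher = CRDLauncher(FakeCustomObjectsApi(), "kubeflow.org", "v1", "tfjobs")
print(launcher.delete("mnist", "kubeflow"))
```

Passing `body=body` also works when `body` is a named positional parameter, so the keyword form should work against either signature shape.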

@streamnsight

The code might have been fixed, but the image referenced in component.yaml has not been updated, so the changes have not propagated.

@sourabh-raja-murali

We have a use case where we want to delete the TFJob because it holds on to a PVC that also needs to be deleted when the TFJob completes. I could not find an updated image with the fix. Could the image with the fix be shared, please?
