Helm test tf-job does not pass validation #351
Comments
There are two issues here: one is the issue you mentioned, and the other is the fact that the tests are reported as passing even though the test is failing. To fix the issue mentioned above, we need to specify the program to invoke for the parameter servers, because #343 removed the default program. So here we need to add a template for the PS replica, just like the template we have for the master and workers. I'll open a separate issue about why the helm test failure isn't properly reported to gubernator.
I tried adding a spec to the PS section of https://github.com/tensorflow/k8s/blob/master/examples/tf_job.yaml#L24. That fixed the invalid spec issue when creating the TFjob.
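For anyone following along, here is a minimal Python sketch of the rule being discussed: every replica spec, including the PS one, must carry a template. It is not the operator's actual validation code. The function names, the error wording, and the image name are hypothetical, and the field names (`replicas`, `tfReplicaType`, `template`) are assumed to mirror the layout of the example `tf_job.yaml`.

```python
# Hypothetical sketch of the "every replica needs a template" rule.
# This is NOT the operator's real validation code.

def validate_replica_specs(replica_specs):
    """Return a list of error strings; an empty list means the specs are valid."""
    errors = []
    for index, spec in enumerate(replica_specs):
        replica_type = spec.get("tfReplicaType", "<unknown>")
        if spec.get("template") is None:
            errors.append(
                f"replicaSpecs[{index}] ({replica_type}) is missing a template"
            )
    return errors


def pod_template(image):
    """Build a minimal pod template dict for a single container."""
    return {
        "spec": {
            "containers": [{"name": "tensorflow", "image": image}],
            "restartPolicy": "OnFailure",
        }
    }


# Placeholder image name, chosen only for illustration.
IMAGE = "example.invalid/tf_sample:latest"

# The PS entry has no template, as in the example before the fix.
before = [
    {"replicas": 1, "tfReplicaType": "MASTER", "template": pod_template(IMAGE)},
    {"replicas": 1, "tfReplicaType": "WORKER", "template": pod_template(IMAGE)},
    {"replicas": 2, "tfReplicaType": "PS"},
]

# Same specs with a template added to the PS entry.
after = [dict(spec) for spec in before]
after[2]["template"] = pod_template(IMAGE)

print(validate_replica_specs(before))
print(validate_replica_specs(after))
```

With these assumptions, the first list reports the PS entry and the second list is empty, which matches the behavior described above: adding a template to the PS section makes the spec pass validation.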
@karthikvadla Can you take a look at #356? It looks like the tests are passing with that PR.
* Helm test was failing because validation for a tfjob required that replicaSpecs for a Parameter server specify a template. Helm test failure also was not reported.

Changes made:
* Updated e2e tests and examples to include a template for the PS replicaSpec
* Check for None before concatenating the error.
* Fixes #351 and #355
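The "check for None before concatenating the error" change can be illustrated with a small Python sketch. The PR's actual code is not shown in this thread, so the function name and message format below are hypothetical. The underlying point is that `str + None` raises a `TypeError` in Python, which can mask the original failure.

```python
# Hypothetical helper illustrating a None check before string concatenation.
# It does not reproduce the code changed in the PR.

def build_failure_message(summary, detail=None):
    """Join a summary with an optional detail string, tolerating detail=None."""
    if detail is None:
        return summary
    return summary + ": " + detail


# Without the guard, concatenating None raises a TypeError.
try:
    "helm test failed: " + None
except TypeError as exc:
    print("unguarded concatenation fails:", type(exc).__name__)

print(build_failure_message("helm test failed"))
print(build_failure_message("helm test failed", "invalid spec"))
```

Returning the summary alone when there is no detail keeps the failure visible to the caller instead of replacing it with an unrelated exception.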
I'm on GKE with Kubernetes 1.8.6.
I have found that the helm test e2e test does not pass validation.
Here are the steps I took to reproduce it:
It also appears to fail validation for the example:
It looks like this might have been introduced in https://github.com/tensorflow/k8s/pull/343/files.
I would like to fix this, but I'm not sure whether the fix involves specifying a template for the parameter server in the example (where I would probably use the previous default), or whether the validation is supposed to allow the parameter server replica to omit the template.