GPU test marked as succeeded but Airflow step is failing #240
Here's the value of

When I manually ran the step to create the cluster, I noticed that it completed before the controller pod had started.

So I'm guessing we try to create the TfJob before the CRD is actually created, and that is why we get a 404. Potential fixes:
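The comment above points to a race: the cluster-setup step returns before the TfJob controller is running, so the TfJob is submitted before its CRD exists and the API server answers 404. One possible mitigation, not necessarily the one the maintainers chose, is to poll until the CRD reports the `Established` condition before creating the TfJob. The sketch below shows that idea under stated assumptions: `wait_for` and `crd_is_established` are hypothetical helpers, and fetching the CRD object itself (for example `tfjobs.kubeflow.org` through the Kubernetes API) is left to the caller.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 60.0,
             interval: float = 2.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical helper. Raises TimeoutError if the condition never
    becomes true. `sleep` and `clock` are injectable so the helper can be
    tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)


def crd_is_established(crd: dict) -> bool:
    """Return True if a CRD object (given as a dict) has Established=True.

    Assumes the caller has already fetched the CRD, e.g. the TfJob CRD.
    """
    conditions = crd.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Established" and c.get("status") == "True"
               for c in conditions)
```

In the test, `condition` would be something like `lambda: crd_is_established(fetch_crd())`, where `fetch_crd` is a hypothetical function returning the CRD as a dict; only after `wait_for` returns would the TfJob be created.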
jlewi added a commit to jlewi/k8s that referenced this issue on Feb 28, 2018:

* We need to stop pinning GKE version to 1.8.5 because that is no longer a valid version.
* We should no longer need to pin because 1.8 is now the default.
* Fix some lint issues that seem to have crept in.

Fix kubeflow#240
The pull request was merged, and jlewi added a commit with the same message that referenced this issue on Feb 28, 2018 (ending in "Fix #240").
In our Airflow graph, the run_gpu_test step isn't succeeding, but the test is still reported as succeeded in Gubernator. The junit file
https://storage.googleapis.com/kubernetes-jenkins/logs/tf-k8s-periodic/174/artifacts/junit_gpu-tests.xml
exists and reports success.

So there are two issues here:
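For the symptom described above (the Airflow step fails while the junit file reports success), one general safeguard is to write the junit file only after the step's real outcome is known, adding a `<failure>` element when it raised. The sketch below assumes a single-test junit file; `junit_xml`, `run_and_record`, and the suite name `gpu-tests` are hypothetical, and it assumes Gubernator counts a test as failed when its `<testcase>` contains a `<failure>` child.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def junit_xml(suite: str, case: str, failure: Optional[str] = None,
              time_s: float = 0.0) -> str:
    """Build a one-test junit XML document (hypothetical helper).

    If `failure` is given, a <failure> element is added so junit
    consumers see the test as failed.
    """
    failures = "1" if failure is not None else "0"
    root = ET.Element("testsuite", name=suite, tests="1", failures=failures)
    tc = ET.SubElement(root, "testcase", classname=suite, name=case,
                       time=f"{time_s:.1f}")
    if failure is not None:
        node = ET.SubElement(tc, "failure", message=failure)
        node.text = failure
    return ET.tostring(root, encoding="unicode")


def run_and_record(step: Callable[[], None], suite: str, case: str,
                   path: str) -> None:
    """Run `step`, then write a junit file reflecting its actual outcome.

    Any exception from `step` is recorded as a failure and re-raised, so
    the calling Airflow task still fails.
    """
    start = time.monotonic()
    failure = None
    try:
        step()
    except Exception as exc:
        failure = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(junit_xml(suite, case, failure, time.monotonic() - start))
```

Because the file is written in a `finally` block, it exists whether or not the step succeeded, but it reports success only when the step actually returned normally.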