See kubeflow/kubeflow#1283
On TF <= 1.6, if you try to use the Estimator API with a master, the job hangs.
The error is the same: "tensorflow:Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized:"
The TF_CONFIG looks like this:
TF_CONFIG={"cluster":{"master":["trainer-tfjob2-master-0.default.svc.cluster.local:2222"]},"task":{"type":"master","index":0}}
It's missing Environment, which is used in TF 1.6. See here: https://github.com/tensorflow/tensorflow/blob/r1.6/tensorflow/contrib/learn/python/learn/estimators/run_config.py#L146
TF 1.8 doesn't use Environment, but it looks like it expects the Chief type. See here: https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/estimator/run_config.py#L489
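To make the two observations above easy to check, here is a minimal sketch that parses the TF_CONFIG from this issue and reports whether it carries an environment entry or a chief job. The helper name `inspect_tf_config` is hypothetical, and the JSON key names `"environment"` and `"chief"` are assumptions based on the linked run_config.py files, not something verified against the operator.

```python
import json

# TF_CONFIG exactly as reported in this issue.
RAW_TF_CONFIG = (
    '{"cluster":{"master":["trainer-tfjob2-master-0.default.svc.cluster.local:2222"]},'
    '"task":{"type":"master","index":0}}'
)


def inspect_tf_config(raw):
    """Hypothetical helper: summarize the fields discussed in this issue."""
    config = json.loads(raw)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return {
        # Assumed key name for the Environment setting read by TF 1.6.
        "has_environment": "environment" in config,
        # Assumed job name for the Chief type expected by TF 1.8.
        "has_chief": "chief" in cluster,
        "cluster_job_names": sorted(cluster),
        "task_type": task.get("type"),
    }


if __name__ == "__main__":
    print(inspect_tf_config(RAW_TF_CONFIG))
```

For the config above, both `has_environment` and `has_chief` come back false, which matches the description: there is only a `master` job and no environment entry.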
See here: https://github.com/kubeflow/tf-operator/blob/master/pkg/trainer/replicas.go#L72
Could we close the issue? I think we fixed it in #766