Fail to run example job. invalid job spec: tfReplicaSpec.TfPort can't be nil #284
Comments
It is a bug. As a workaround, you could add TfPort to each replica spec in the YAML.
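The snippet attached to the comment above did not survive in this copy of the thread. A minimal sketch of what such a workaround might look like follows, assuming the field is serialized as `tfPort` and that 2222 is an acceptable port; neither the key name nor the value is confirmed anywhere in this thread, so check both against the TfReplicaSpec type in the operator version you are running.

```yaml
# Sketch only: the tfPort key and the value 2222 are assumptions,
# not taken from this thread. Everything else mirrors the example job
# shown in the issue description.
apiVersion: tensorflow.org/v1alpha1
kind: TfJob
metadata:
  name: example-job
spec:
  replicaSpecs:
    - replicas: 1
      tfReplicaType: MASTER
      tfPort: 2222          # assumed field name and port
      template:
        spec:
          containers:
            - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
              name: tensorflow
          restartPolicy: OnFailure
    - replicas: 1
      tfReplicaType: WORKER
      tfPort: 2222          # assumed field name and port
      template:
        spec:
          containers:
            - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
              name: tensorflow
          restartPolicy: OnFailure
    - replicas: 2
      tfReplicaType: PS
      tfPort: 2222          # assumed field name and port
```

Setting the port explicitly on every replica spec should avoid relying on the operator's defaulting, which is the part reported as not running in this issue.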
Thanks for the response, @gaocegege.
@GoodJoey Sorry about that; it seems our code is currently broken. I am not sure when everything will be fixed. Maybe you could try the commit https://github.com/tensorflow/k8s/tree/430cf179ba9c1ce4a134d3800f871dbbb0c73da1
@gaocegege You mean use commit 430cf17 to build a new tf_operator Docker image, right?
Yeah, I am not sure whether it works, but I think it does. I am asking @jlewi to release a stable version for users here: #280 (comment)
It seems I need to set up a Go environment? Are there any quick scripts to do that?
Sorry, I also know little about the operator's release process. If it is not urgent, maybe you could wait for these PRs; once they are merged, I think the problems will be solved.
Sorry for the slow reply. The latest GCS link now points to an old stable release.
We should be setting a default port here, which is called from some of the informer-generated code. It looks like that's not happening. Any ideas why? /cc @gaocegege @wackxu
I tried kubectl create -f https://raw.githubusercontent.com/tensorflow/k8s/master/examples/tf_job.yaml
Then, with the command kubectl get TfJob -o yaml
I get this output:
kind: TfJob
metadata:
  clusterName: ""
  creationTimestamp: 2018-01-11T08:34:46Z
  generation: 0
  name: example-job
  namespace: default
  resourceVersion: "30201"
  selfLink: /apis/tensorflow.org/v1alpha1/namespaces/default/tfjobs/example-job
  uid: 47e9ba9d-f6aa-11e7-baad-4ccc6ab8f7ad
spec:
  RuntimeId: ""
  replicaSpecs:
  - replicas: 1
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
          name: tensorflow
          resources: {}
        restartPolicy: OnFailure
    tfReplicaType: MASTER
  - replicas: 1
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
          name: tensorflow
          resources: {}
        restartPolicy: OnFailure
    tfReplicaType: WORKER
  - replicas: 2
    tfReplicaType: PS
  tensorboard: null
status:
  phase: Failed
  reason: 'invalid job spec: tfReplicaSpec.TfPort can''t be nil.'
  replicaStatuses: null
  state: Failed
Does anyone know what's happening here? Thanks!