After init helm, install chart failed #149

Closed
jianzi123 opened this issue Nov 15, 2017 · 8 comments

@jianzi123

I can't figure out what the problem is.
```
[root@master heml_install]# kubectl create -f tf_job.yaml
error: unable to recognize "tf_job.yaml": no matches for tensorflow.org/, Kind=TfJob

[root@master heml_install]# helm ls
NAME    REVISION  UPDATED                   STATUS    CHART                                          NAMESPACE
tf-job  1         Mon Nov 13 16:09:38 2017  DEPLOYED  tf-job-operator-chart-0.2.0-v20171115-a998436  default
```
If anyone knows what the problem is, please let me know. Thanks.

@jlewi
Contributor

jlewi commented Nov 15, 2017

It looks like the TfJob CRD wasn't created, which most likely indicates there was a problem deploying the operator.

Check whether the CRD was created:

kubectl get crd

You should see something like this:

NAME                    AGE
tfjobs.tensorflow.org   12d

If you don't see tfjobs.tensorflow.org, the CRD wasn't created.

Check whether the operator is running:

kubectl get deployment

You should see output like the following:

NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
tf-job-operator    1         1         1            1           9d

If the deployment is up, check the pod:

kubectl get pods
tf-job-operator-5bc6cb6fd7-5bkck    1/1       Running   4          9d

If the pod isn't running, try to figure out why (e.g., by running kubectl describe pods).

If the pod is running, get its logs and look for errors. If you share them on this issue, we can help as well:

kubectl logs $POD
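The checklist above can be bundled into a few shell helpers. This is a minimal sketch, not something shipped with the tf-job chart: the function names are hypothetical, and it assumes kubectl is on PATH, points at the cluster where the chart was installed, and that the resource names match the ones shown above (CRD tfjobs.tensorflow.org, deployment tf-job-operator, pods prefixed tf-job-operator-).

```shell
#!/usr/bin/env bash
# Hypothetical helpers bundling the checks above. Assumes kubectl is on PATH
# and that the chart was installed into the current (default) namespace.

# Step 1: report whether the TfJob CRD was registered.
check_tfjob_crd() {
  if kubectl get crd tfjobs.tensorflow.org >/dev/null 2>&1; then
    echo "crd: present"
  else
    echo "crd: missing"
  fi
}

# Step 2: report whether the operator deployment exists.
check_operator_deployment() {
  if kubectl get deployment tf-job-operator >/dev/null 2>&1; then
    echo "deployment: present"
  else
    echo "deployment: missing"
  fi
}

# Steps 3 and 4: for every operator pod, print its status and then its logs.
# Assumes pod names start with "tf-job-operator-" as in the output above.
show_operator_pods() {
  local pod
  for pod in $(kubectl get pods -o name | grep 'tf-job-operator-'); do
    kubectl get "$pod"
    kubectl logs "$pod"
  done
}
```

After sourcing the script, run check_tfjob_crd first. If it prints "crd: missing", the operator most likely failed to start, and show_operator_pods should surface the reason in the pod logs.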

@jlewi
Contributor

jlewi commented Nov 17, 2017

@jianzi123 Any luck?

@jlewi
Contributor

jlewi commented Nov 19, 2017

@jianzi123 I'm going to close this issue for now. If you're still having trouble please feel free to reopen it.

jlewi closed this as completed Nov 19, 2017
@jianzi123
Author

jianzi123 commented Nov 23, 2017

@jlewi Sorry for posting the logs so late...

[root@master local]# kubectl logs po tf-job-operator-85c95f846d-kp9p6 -n default
Error from server (NotFound): pods "po" not found
[root@master local]# kubectl logs tf-job-operator-85c95f846d-kp9p6 -n default
I1124 02:53:39.625786 1 main.go:69] Loading controller config from /etc/config/controller_config_file.yaml.
F1124 02:53:39.630766 1 main.go:73] Could not read file: /etc/config/controller_config_file.yaml. Error: open /etc/config/controller_config_file.yaml: no such file or directory

[root@master ~]# kubectl describe configmaps tf-job-operator-config
Name: tf-job-operator-config
Namespace: default
Labels:
Annotations:
Data
helm_configmap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: tf-job-operator-config
  namespace: default
data:
  controller_config_file.yaml: |
Events:

jianzi123 changed the title from "After init helm and install chart, create tf_job.yaml failed" to "After init helm, install chart failed" Nov 24, 2017
jlewi reopened this Nov 25, 2017
@jlewi
Contributor

jlewi commented Nov 25, 2017

@jianzi123 It looks like your controller_config_file.yaml is empty; is that the case? My guess is that in this case no file actually gets created, which is why the read fails.

Can you provide the command line you used to install the helm chart? I'd like to know what value you provided for the config map.

I think one problem is that the helm chart is configured to always require a controller_config_file. I'll create an issue to fix that.

@jlewi
Contributor

jlewi commented Nov 25, 2017

Created #175

@jlewi
Contributor

jlewi commented Nov 25, 2017

@jianzi123 As a quick fix, I suspect it would work if you removed the --controller_config_file argument here.

@jianzi123
Author

I fixed it by resetting the configmap. Thanks.
