predict fails and seldondeployment missing .status #35

Open
DavidLangworthy opened this issue Apr 19, 2019 · 13 comments

@DavidLangworthy

@cliveseldon
Calling predict on a deployment that returned success fails with a connection error. Attempting to debug this reveals that .status is missing from the seldondeployment. Any suggestions for how to debug this?

!kubectl get seldondeployments mnist-classifier -o jsonpath='{.status}'

returns nothing

!kubectl get seldondeployments mnist-classifier -o json
returns
{
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"machinelearning.seldon.io/v1alpha2\",\"kind\":\"SeldonDeployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"seldon\"},\"name\":\"mnist-classifier\",\"namespace\":\"kubeflow\"},\"spec\":{\"annotations\":{\"deployment_version\":\"v1\",\"project_name\":\"MNIST Example\",\"seldon.io/engine-separate-pod\":\"false\",\"seldon.io/rest-connection-timeout\":\"100\"},\"name\":\"mnist-classifier\",\"predictors\":[{\"annotations\":{\"predictor_version\":\"v1\"},\"componentSpecs\":[{\"spec\":{\"containers\":[{\"image\":\"seldonio/deepmnistclassifier_runtime:0.2\",\"imagePullPolicy\":\"Always\",\"name\":\"tf-model\",\"volumeMounts\":[{\"mountPath\":\"/data\",\"name\":\"persistent-storage\"}]}],\"terminationGracePeriodSeconds\":1,\"volumes\":[{\"name\":\"persistent-storage\",\"volumeSource\":{\"persistentVolumeClaim\":{\"claimName\":\"nfs-1\"}}}]}}],\"graph\":{\"children\":[],\"endpoint\":{\"type\":\"REST\"},\"name\":\"tf-model\",\"type\":\"MODEL\"},\"name\":\"mnist-classifier\",\"replicas\":1}]}}\n"
    },
    "creationTimestamp": "2019-04-18T21:26:32Z",
    "generation": 1,
    "labels": {
      "app": "seldon"
    },
    "name": "mnist-classifier",
    "namespace": "kubeflow",
    "resourceVersion": "128631",
    "selfLink": "/apis/machinelearning.seldon.io/v1alpha2/namespaces/kubeflow/seldondeployments/mnist-classifier",
    "uid": "a3450e71-6220-11e9-a023-da0ed60f5a55"
  },
  "spec": {
    "annotations": {
      "deployment_version": "v1",
      "project_name": "MNIST Example",
      "seldon.io/engine-separate-pod": "false",
      "seldon.io/rest-connection-timeout": "100"
    },
    "name": "mnist-classifier",
    "predictors": [
      {
        "annotations": {
          "predictor_version": "v1"
        },
        "componentSpecs": [
          {
            "spec": {
              "containers": [
                {
                  "image": "seldonio/deepmnistclassifier_runtime:0.2",
                  "imagePullPolicy": "Always",
                  "name": "tf-model",
                  "volumeMounts": [
                    {
                      "mountPath": "/data",
                      "name": "persistent-storage"
                    }
                  ]
                }
              ],
              "terminationGracePeriodSeconds": 1,
              "volumes": [
                {
                  "name": "persistent-storage",
                  "volumeSource": {
                    "persistentVolumeClaim": {
                      "claimName": "nfs-1"
                    }
                  }
                }
              ]
            }
          }
        ],
        "graph": {
          "children": [],
          "endpoint": {
            "type": "REST"
          },
          "name": "tf-model",
          "type": "MODEL"
        },
        "name": "mnist-classifier",
        "replicas": 1
      }
    ]
  }
}

@ukclivecox
Contributor

Can you check the logs of the cluster-manager and check that its pods are running? There should always be a status, so we need to track this down further.
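
For example, something along these lines, assuming Seldon was installed into the kubeflow namespace as in the resource above (the exact pod name varies per install):

# Find the cluster-manager pod
kubectl get pods -n kubeflow | grep cluster-manager

# Tail its logs and look for errors while it reconciles the SeldonDeployment
kubectl logs -n kubeflow <cluster-manager-pod-name> --tail=100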

@DavidLangworthy
Author

What specifically do I need to look for? Kubeflow starts up so many components that it's hard to find my way around.

@DavidLangworthy
Author

!kubectl get pods -n kubeflow

NAME READY STATUS RESTARTS AGE
ambassador-c9647fb66-fl4zr 1/1 Running 0 1d
ambassador-c9647fb66-g6n9r 1/1 Running 0 1d
ambassador-c9647fb66-z7p27 1/1 Running 0 1d
argo-ui-755fcfc656-s2rgl 1/1 Running 0 1d
centraldashboard-7c948d9df6-jh8zj 1/1 Running 0 1d
jupyter-0 1/1 Running 0 1d
jupyter-web-app-6ffc57d749-mqtgr 0/1 CrashLoopBackOff 318 1d
katib-ui-6dc644d54-jg6mj 1/1 Running 0 1d
kubeflow-r-train-srxtq-1399384440 0/1 Completed 0 23h
kubeflow-sk-train-6llnn-122502152 0/1 Completed 0 23h
kubeflow-tf-train-nc5kg-1269457206 0/1 Completed 0 23h
metacontroller-0 1/1 Running 0 1d
minio-b7595688d-4xhbq 1/1 Running 0 1d
ml-pipeline-59459675dd-npjh6 1/1 Running 0 1d
ml-pipeline-persistenceagent-7f6d4555d7-hdkmn 1/1 Running 1 1d
ml-pipeline-scheduledworkflow-5f4d44fb4f-65xt9 1/1 Running 0 1d
ml-pipeline-ui-f5d595697-z8cl5 1/1 Running 0 1d
ml-pipeline-viewer-controller-deployment-5b4954fb4c-4ldm8 1/1 Running 0 1d
mnist-train-5-worker-0 0/1 Completed 0 23h
mykubeflowapp2-controller-b5677fccf-5fpsm 1/1 Running 0 1d
mysql-5b7578d9f5-8mjld 1/1 Running 0 1d
notebooks-controller-9c5f6b7f5-t2xlh 1/1 Running 0 1d
profiles-7bfcbd5f76-2ht9w 1/1 Running 0 1d
pytorch-operator-847d884f4d-cvwpm 1/1 Running 0 1d
r-train-mfs75 0/1 Completed 0 23h
sk-train-svnwb 0/1 Completed 0 23h
spartakus-volunteer-7787b4cf54-z79tj 1/1 Running 0 1d
studyjob-controller-5995857687-46xrn 1/1 Running 0 1d
tf-job-dashboard-c899cd664-94wtf 1/1 Running 0 1d
tf-job-operator-785546f859-rfzrm 1/1 Running 0 1d
vizier-core-6d56d75f76-969ks 1/1 Running 3 1d
vizier-core-rest-79bdbfbfb8-qnvz9 1/1 Running 0 1d
vizier-db-79d57d5667-f7nst 1/1 Running 0 1d
vizier-suggestion-bayesianoptimization-759f6c56c8-54p6x 1/1 Running 0 1d
vizier-suggestion-grid-59f7f5646d-fqcfg 1/1 Running 0 1d
vizier-suggestion-hyperband-84b8ddc658-xm9fb 1/1 Running 0 1d
vizier-suggestion-random-64b4467f6b-gptpl 1/1 Running 0 1d
workflow-controller-8564bd964f-df7x2 1/1 Running 0 1d

@ukclivecox
Contributor

I don't see the seldon cluster-manager. Did you install seldon as per the docs?
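
A quick way to check, assuming the kubeflow namespace:

# Confirm the SeldonDeployment CRD is registered
kubectl get crd seldondeployments.machinelearning.seldon.io

# Confirm the operator (cluster-manager) deployment exists
kubectl get deployments -n kubeflow | grep seldon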

@DavidLangworthy
Author

Yes, but I gather it was not successful. I will try again.

Thank you

@DavidLangworthy
Author

The deployment worked this time and the cluster manager is up:
dlan@loadclient:~$ kubectl get pods --all-namespaces | grep seldon
kube-system   seldon-spartakus-volunteer-57647c7679-vb6pt          1/1   Running   0   1d
kubeflow      seldon-core-ambassador-6bb6fb974d-qwg79              1/1   Running   0   1m
kubeflow      seldon-core-redis-685dd67c95-grv2h                   1/1   Running   0   1m
kubeflow      seldon-core-seldon-cluster-manager-dd8497ccf-xtm46   1/1   Running   0   1m

However I am still getting an error calling the prediction service.

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

The port forward window gives me the following:

dlan@loadclient:~$ kubectl port-forward $(kubectl get pods -n kubeflow -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n kubeflow 8002:80
Forwarding from 127.0.0.1:8002 -> 80
Forwarding from [::1]:8002 -> 80
Handling connection for 8002
E0419 21:38:55.183309 12957 portforward.go:400] an error occurred forwarding 8002 -> 80: error forwarding port 80 to pod baa7cdd3e0fc3d4ce1d30ff49cd8602421ebce99f6895fdb5aa70e1e362051f9, uid : exit status 1: 2019/04/19 21:38:55 socat[9620] E connect(6, AF=2 127.0.0.1:80, 16): Connection refused
Handling connection for 8002
E0419 21:38:58.731598 12957 portforward.go:400] an error occurred forwarding 8002 -> 80: error forwarding port 80 to pod baa7cdd3e0fc3d4ce1d30ff49cd8602421ebce99f6895fdb5aa70e1e362051f9, uid : exit status 1: 2019/04/19 21:38:58 socat[9798] E connect(6, AF=2 127.0.0.1:80, 16): Connection refused
Handling connection for 8002
E0419 21:39:27.769533 12957 portforward.go:400] an error occurred forwarding 8002 -> 80: error forwarding port 80 to pod baa7cdd3e0fc3d4ce1d30ff49cd8602421ebce99f6895fdb5aa70e1e362051f9, uid : exit status 1: 2019/04/19 21:39:27 socat[10904] E connect(6, AF=2 127.0.0.1:80, 16): Connection refused

@ukclivecox
Contributor

OK. Can you check whether Ambassador exposes port 80, or whether it has moved to 8080 now?
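
For example (names here are placeholders; substitute your own service and pod names):

# See which target port the Ambassador service points at
kubectl describe svc <ambassador-service-name> -n kubeflow | grep -i port

# If the container listens on 8080 rather than 80, forward to that port instead
kubectl port-forward <ambassador-pod-name> -n kubeflow 8002:8080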

@DavidLangworthy
Author

I have two Ambassador services:

ambassador               ClusterIP   10.0.233.236   80/TCP
seldon-core-ambassador   NodePort    10.0.158.182   80:30489/TCP,443:31294/TCP

Thanks for your help.

@ukclivecox
Contributor

I would try connecting to both Ambassadors directly to see which one works, and also check the Ambassador diagnostics.
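
The diagnostics UI is served on Ambassador's admin port (8877 by default), so something like this should reach it (pod name is a placeholder):

# Forward the Ambassador admin port
kubectl port-forward <ambassador-pod-name> -n kubeflow 8877:8877
# then browse to http://localhost:8877/ambassador/v0/diag/ to see the mapped routes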

@DavidLangworthy
Author

DavidLangworthy commented Apr 23, 2019 via email

@DavidLangworthy
Author

I can hit the predictor directly and it works fine. The routes look fine in Ambassador. However, I do not see the requests in the Ambassador logs.
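
For reference, the call through the Ambassador port-forward would look roughly like this (the /seldon/<deployment-name>/ prefix is Seldon's usual Ambassador route convention; the payload shape below is illustrative and depends on the model):

curl -s -X POST http://localhost:8002/seldon/mnist-classifier/api/v0.1/predictions \
  -H 'Content-Type: application/json' \
  -d '{"data": {"ndarray": [[0.0, 0.1, 0.2]]}}'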

Any suggestions?

I'll keep looking around.

@ukclivecox
Contributor

Sorry, missed this. I don't think you'll see requests in the Ambassador logs by default, as Ambassador doesn't log every request. Are the requests working?

@DavidLangworthy
Author

The requests were not working. I've recycled this cluster. I'll bring up a fresh one and see if there is a repro.

Thank you.
