predict fails and seldondeployment missing .status #35
Comments
Can you check the logs of the cluster-manager and check that the pods are running? There should always be a status, so we need to track this down further.
What specifically do I need to look for? Kubeflow starts up so much that it's hard to find my way around.
!kubectl get pods -n kubeflow NAME READY STATUS RESTARTS AGE
I don't see the Seldon cluster-manager. Did you install Seldon as per the docs?
Yes, but I gather it was not successful. I will try again. Thank you.
The deployment worked this time and the cluster manager is up:
However, I am still getting an error calling the prediction service.
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
The port forward window gives me the following:
dlan@loadclient:~$ kubectl port-forward $(kubectl get pods -n kubeflow -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n kubeflow 8002:80
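For anyone reproducing this, here is a minimal sketch of the predict call going through the port-forward above, using only the standard library. The route `/seldon/<deployment>/api/v0.1/predictions` and the `{"data": {"ndarray": ...}}` payload shape are assumptions about this Seldon/Ambassador setup and are not confirmed anywhere in this thread; `build_predict_request` and `send` are hypothetical helper names. Local port 8002 comes from the port-forward command.

```python
import http.client
import json
import urllib.error
import urllib.request


def build_predict_request(deployment, features, host="localhost", port=8002):
    """Build (but do not send) a REST prediction request via Ambassador."""
    # ASSUMPTION: Ambassador routes Seldon deployments under this path.
    url = f"http://{host}:{port}/seldon/{deployment}/api/v0.1/predictions"
    # ASSUMPTION: the model accepts a single row wrapped in an "ndarray" payload.
    body = json.dumps({"data": {"ndarray": [features]}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request; return (status, text), with status None on transport errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 for a bad route).
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException) as exc:
        # Covers refused connections and "Remote end closed connection without response".
        return None, f"{type(exc).__name__}: {exc}"
```

Calling `send(build_predict_request("mnist-classifier", [0.0] * 784))` separates the two failure modes: a numeric status means something answered on the forwarded port, while `None` means the connection itself was dropped, as in the error reported here.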
OK. Can you check whether Ambassador exposes port 80 or has moved to 8080 now?
I have two Ambassadors. Thanks for your help.
I would try connecting to both Ambassadors directly to see which one works, and also check the Ambassador diagnostics.
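One way to compare the two Ambassadors is to port-forward each pod to its own local port and check whether an HTTP response comes back at all. The sketch below assumes that has already been done; the local ports in the usage note are hypothetical and `probe_http` is an illustrative helper, not part of any tool mentioned here. A plain TCP connect is not enough for this check, because `kubectl port-forward` accepts local connections even when the target port inside the pod is wrong.

```python
import http.client


def probe_http(host, port, path="/", timeout=3.0):
    """Return the HTTP status from host:port, or None if no HTTP response arrives."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or closed without a response (RemoteDisconnected).
        return None
    finally:
        conn.close()
```

For example, `{p: probe_http("localhost", p) for p in (8002, 8003)}` would show which forwarded port answers; any status code, even 404, means the Ambassador behind that port is reachable.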
I can hit the predictor directly and it works fine. The routes look fine in Ambassador. However, I do not see requests in the Ambassador logs. Any suggestions? I'll keep looking around.
Sorry, missed this. You won't see requests in the Ambassador logs by default, I think, as Ambassador doesn't log every request. Are the requests working?
The requests were not working. I've recycled this cluster. I'll bring up a fresh one and see if there is a repro. Thank you.
@cliveseldon
Calling predict on a deployment that returned success fails with a connection error. Attempting to debug this reveals that .status is missing from the seldondeployment. Suggestions for how to debug this?
!kubectl get seldondeployments mnist-classifier -o jsonpath='{.status}'
returns nothing
!kubectl get seldondeployments mnist-classifier -o json
returns
{
"apiVersion": "machinelearning.seldon.io/v1alpha2",
"kind": "SeldonDeployment",
"metadata": {
"annotations": {
"kubectl.kubernetes.io/last-applied-configuration": "{"apiVersion":"machinelearning.seldon.io/v1alpha2","kind":"SeldonDeployment","metadata":{"annotations":{},"labels":{"app":"seldon"},"name":"mnist-classifier","namespace":"kubeflow"},"spec":{"annotations":{"deployment_version":"v1","project_name":"MNIST Example","seldon.io/engine-separate-pod":"false","seldon.io/rest-connection-timeout":"100"},"name":"mnist-classifier","predictors":[{"annotations":{"predictor_version":"v1"},"componentSpecs":[{"spec":{"containers":[{"image":"seldonio/deepmnistclassifier_runtime:0.2","imagePullPolicy":"Always","name":"tf-model","volumeMounts":[{"mountPath":"/data","name":"persistent-storage"}]}],"terminationGracePeriodSeconds":1,"volumes":[{"name":"persistent-storage","volumeSource":{"persistentVolumeClaim":{"claimName":"nfs-1"}}}]}}],"graph":{"children":[],"endpoint":{"type":"REST"},"name":"tf-model","type":"MODEL"},"name":"mnist-classifier","replicas":1}]}}\n"
},
"creationTimestamp": "2019-04-18T21:26:32Z",
"generation": 1,
"labels": {
"app": "seldon"
},
"name": "mnist-classifier",
"namespace": "kubeflow",
"resourceVersion": "128631",
"selfLink": "/apis/machinelearning.seldon.io/v1alpha2/namespaces/kubeflow/seldondeployments/mnist-classifier",
"uid": "a3450e71-6220-11e9-a023-da0ed60f5a55"
},
"spec": {
"annotations": {
"deployment_version": "v1",
"project_name": "MNIST Example",
"seldon.io/engine-separate-pod": "false",
"seldon.io/rest-connection-timeout": "100"
},
"name": "mnist-classifier",
"predictors": [
{
"annotations": {
"predictor_version": "v1"
},
"componentSpecs": [
{
"spec": {
"containers": [
{
"image": "seldonio/deepmnistclassifier_runtime:0.2",
"imagePullPolicy": "Always",
"name": "tf-model",
"volumeMounts": [
{
"mountPath": "/data",
"name": "persistent-storage"
}
]
}
],
"terminationGracePeriodSeconds": 1,
"volumes": [
{
"name": "persistent-storage",
"volumeSource": {
"persistentVolumeClaim": {
"claimName": "nfs-1"
}
}
}
]
}
}
],
"graph": {
"children": [],
"endpoint": {
"type": "REST"
},
"name": "tf-model",
"type": "MODEL"
},
"name": "mnist-classifier",
"replicas": 1
}
]
}
}
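The `.status` check above can also be done programmatically on the JSON that `kubectl ... -o json` prints. This is only a sketch: `describe_status` is a hypothetical helper, and reading a `state` field from `.status` is an assumption about what the Seldon operator writes once it has processed the resource.

```python
import json


def describe_status(resource_json):
    """Summarize whether a SeldonDeployment JSON document carries a .status block."""
    resource = json.loads(resource_json)
    name = resource.get("metadata", {}).get("name", "<unnamed>")
    status = resource.get("status")
    if not status:
        # Matches the output above: metadata and spec are present, status is not.
        return f"{name}: .status is missing"
    # ASSUMPTION: the operator reports progress in a "state" field.
    return f"{name}: state={status.get('state', '<unset>')}"
```

Fed the JSON shown above, this reports `mnist-classifier: .status is missing`, which in this thread coincided with the cluster-manager pod not running.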