Deploying custom MLflow model - stuck at "Readiness probe failed" #3186
Do you see any errors in the container logs?
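(For anyone following along: per-container logs can be pulled like this. The pod name is a placeholder; the container names "classifier" and "seldon-container-engine" are the ones mentioned later in this thread.)

# Model container logs
kubectl logs <pod-name> -c classifier -n seldon
# Seldon service-orchestrator logs
kubectl logs <pod-name> -c seldon-container-engine -n seldon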
The seldon-container-engine is showing this error:
The classifier container logs get to this point:
and then they go back to:
I also tried with the following deployment file:
but this way the pod didn't show up at all.
Did you try disabling the Istio sidecar, like this? I ran into this liveness-probe problem too, although in my case it was a 403 rather than a 503.
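The snippet referenced by "like this" isn't captured in this transcript. A minimal sketch, assuming the comment meant Istio's standard sidecar.io/inject pod annotation (the annotation is real Istio; its placement here is illustrative):

# Illustrative only: disable Istio sidecar injection for the predictor pods
# via Istio's standard annotation (assumed to be what the comment referred to)
predictors:
  - componentSpecs:
      - metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: classifier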
@FilipVel Were you able to solve it? I'm stuck at the same issue.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: rf-regressor
  namespace: seldon-system
spec:
  annotations:
    seldon.io/executor: "false"
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: s3://models/rf-regressor  # note: s3 points to minio-seldon in the local kind cluster
        envSecretRefName: seldon-rclone-secret
        name: rf-regressor
      name: default
      replicas: 1
      componentSpecs:
        - spec:
            containers:
              - name: rf-regressor
                livenessProbe:
                  initialDelaySeconds: 150
                  failureThreshold: 10
                  periodSeconds: 50
                  successThreshold: 1
                  tcpSocket:
                    port: 9000
                  # httpGet:
                  #   path: /health/ping
                  #   port: http
                  #   scheme: HTTP
                  timeoutSeconds: 3
                readinessProbe:
                  initialDelaySeconds: 150
                  failureThreshold: 10
                  periodSeconds: 50
                  successThreshold: 1
                  tcpSocket:
                    port: 9000
                  # httpGet:
                  #   path: /health/ping
                  #   port: http
                  #   scheme: HTTP
                  timeoutSeconds: 3

Conda YAML:

channels:
  - conda-forge
dependencies:
  - python=3.9.6
  - pip
  - pip:
      - mlflow
      - cloudpickle==2.0.0
      - scikit-learn==1.0
name: mlflow-env

LOGS:
@sleebapaul there are a couple of things that might be worth checking. The first is that your predictor spec mentions port 9000 but your logs are for port 8082 - that seems inconsistent? If you check the pod spec in Kubernetes, which container uses port 8082? The second is that you might need to increase the …
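A quick way to check which container exposes which port (a generic kubectl sketch; the pod name is a placeholder):

# List each container in the pod together with its declared ports
kubectl get pod <pod-name> -n seldon-system \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.ports[*].containerPort}{"\n"}{end}'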
@agrski I've changed the YAML file a bit.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
  namespace: seldon-system
spec:
  name: rf-regressor
  predictors:
    - componentSpecs:
        - spec:
            # We are setting a high failureThreshold because installing conda dependencies
            # can take a long time, and we want to avoid k8s killing the container prematurely
            containers:
              - name: regressor
                livenessProbe:
                  initialDelaySeconds: 80
                  failureThreshold: 200
                  periodSeconds: 5
                  successThreshold: 1
                  httpGet:
                    path: /health/ping
                    port: http
                    scheme: HTTP
                readinessProbe:
                  initialDelaySeconds: 80
                  failureThreshold: 200
                  periodSeconds: 5
                  successThreshold: 1
                  httpGet:
                    path: /health/ping
                    port: http
                    scheme: HTTP
      graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: s3://models/rf-regressor
        envSecretRefName: seldon-rclone-secret
        name: regressor
      name: default
      replicas: 1

The …
Regarding the readiness probe, it seems you have fixed it, but you can look at this example for reference: https://github.com/SeldonIO/seldon-core/blob/master/servers/mlflowserver/samples/elasticnet_wine.yaml
Regarding your second error, this was fixed recently via #3670, so you will have to use the DEV images from MASTER. Namely, you can install Seldon Core from master, which will configure all 1.12.0-dev images, by cloning the repo and running the Helm chart directly from the folder. The command below shows how you'd normally set it up with Istio: …
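The command itself is missing from this transcript; judging from what the reporter runs later in the thread, it was presumably along these lines (a sketch, not the verbatim original):

# Install Seldon Core from the cloned repo's chart, with Istio enabled
git clone https://github.com/SeldonIO/seldon-core/
cd seldon-core
helm upgrade --install seldon-core helm-charts/seldon-core-operator/ \
  --namespace seldon-system \
  --set istio.enabled="true" \
  --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local"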
No luck, gentlemen. @axsaucedo @agrski Let me jot down every command I've used.

curl -L https://istio.io/downloadIstio | sh -
cd istio-1.11.4
export PATH=$PWD/bin:$PATH
istioctl install --set profile=minimal -y
kubectl create namespace seldon
kubectl config set-context $(kubectl config current-context) --namespace=seldon
kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --set istio.enabled=true --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local" --set usageMetrics.enabled=true --namespace seldon-system
kubectl rollout status deploy/seldon-controller-manager -n seldon-system
helm install seldon-core-analytics seldon-core-analytics --namespace seldon-system --repo https://storage.googleapis.com/seldon-charts --set grafana.adminPassword=password --set grafana.adminUser=admin
git clone https://github.com/SeldonIO/seldon-core/
cd seldon-core
helm upgrade --install seldon-core helm-charts/seldon-core-operator/ --namespace seldon-system --set istio.enabled="true" --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local" --set ambassador.enabled="true"
kubectl create ns minio-system
helm repo add minio https://helm.min.io/
helm install minio minio/minio --set accessKey=minioadmin \
--set secretKey=minioadmin --namespace minio-system
kubectl describe pods --namespace minio-system
export POD_NAME=$(kubectl get pods --namespace minio-system -l "release=minio" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 9000 --namespace minio-system
mc config host add minio-local http://localhost:9000 minioadmin minioadmin
mc rb --force minio-local/models
mc mb minio-local/models
mc cp -r experiments/buckets/mlflow/0/<experiment-id>/artifacts/ minio-local/models/
kubectl apply -f seldon-rclone-secret.yaml
kubectl apply -f deploy.yaml

ERROR:
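For reference, the seldon-rclone-secret.yaml applied above isn't shown in the thread. A sketch of what such a secret typically looks like for a local MinIO install (key names follow rclone's RCLONE_CONFIG_<remote>_<option> environment-variable convention; the endpoint is an assumption based on the minio-system install above):

apiVersion: v1
kind: Secret
metadata:
  name: seldon-rclone-secret
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
  RCLONE_CONFIG_S3_PROVIDER: minio
  RCLONE_CONFIG_S3_ENV_AUTH: "false"
  RCLONE_CONFIG_S3_ACCESS_KEY_ID: minioadmin
  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: minioadmin
  # Assumed in-cluster endpoint for the minio-system helm install above
  RCLONE_CONFIG_S3_ENDPOINT: http://minio.minio-system.svc.cluster.local:9000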
@axsaucedo Following up on the issue: any updates? I'm using Python 3.9. Is that a problem?
@axsaucedo @agrski The issue was caused by the Python version, as I suspected, so I downgraded to Python 3.7.12. The updated conda.yaml:
channels:
  - conda-forge
dependencies:
  - python=3.7.12
  - pip
  - pip:
      - mlflow
      - cloudpickle==2.0.0
      - scikit-learn==0.23.2
name: mlflow-env

And the deployment YAML:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  name: rf-regressor
  predictors:
    - componentSpecs:
        - spec:
            # We are setting a high failureThreshold because installing conda dependencies
            # can take a long time, and we want to avoid k8s killing the container prematurely
            containers:
              - name: regressor
                livenessProbe:
                  initialDelaySeconds: 80
                  failureThreshold: 200
                  periodSeconds: 5
                  successThreshold: 1
                  httpGet:
                    path: /health/ping
                    port: http
                    scheme: HTTP
                readinessProbe:
                  initialDelaySeconds: 80
                  failureThreshold: 200
                  periodSeconds: 5
                  successThreshold: 1
                  httpGet:
                    path: /health/ping
                    port: http
                    scheme: HTTP
      graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: s3://models/rf-regressor
        envSecretRefName: seldon-rclone-secret
        name: regressor
      name: default
      replicas: 1

Now I have the following issue: the process stops while copying the model. It's been a while since I've been trying to deploy a simple model using …
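A couple of generic checks that may help narrow down a model-copy failure (pod and container names are placeholders; in Seldon Core the storage-initializer init container is typically named <container>-model-initializer):

# Confirm the artifacts actually landed in the bucket
mc ls --recursive minio-local/models/

# Inspect the init container that performs the rclone copy
kubectl logs <pod-name> -c regressor-model-initializer -n seldon-system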
Perfect. Closing given that this has been answered; please reopen if the issue persists.
Hey, so I have a problem deploying a custom MLflow model made with 'mlflow.pyfunc.model'.
The model that I have is as follows:
Apart from the model I have the MLmodel file with the following content:
and conda.yaml
When I try to run the model locally with
mlflow models serve -m model_name
it works just fine. Next, I uploaded the model, together with the conda.yaml and MLmodel files, to a Google Cloud bucket that I wanted to use as the source for a Seldon Core deployment.
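(As an aside, a local smoke test of the served model might look like the following. mlflow models serve listens on 127.0.0.1:5000 by default and exposes an /invocations endpoint; the payload shape here is hypothetical and depends on the MLflow version and the model's input signature.)

# Hypothetical smoke test; adjust the payload to the model's input signature
curl http://127.0.0.1:5000/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[1.0, 2.0, 3.0]]}'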
I tried deploying the model with the following code:
The pod then gets the following status:
Next, I ran 'kubectl describe pod mlflow-default-0-classifier-6f4cc6994-gjn67 -n seldon', where I got the following events:
After googling a bit, I tried adding a readinessProbe field to the deployment file like this (note that I had to use validate=false):
Even after trying this, the same error persists. What could be the problem?
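(For context, the validate=false referenced above is a kubectl flag passed at apply time; the filename is a placeholder:)

# Skip client-side schema validation when applying the manifest
kubectl apply -f deployment.yaml --validate=false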