
Readiness probe failed seldon-container-engine while deploying the pipeline #1963

Closed

alokmp83 opened this issue Jun 17, 2020 · 2 comments

alokmp83 commented Jun 17, 2020

ENGINE_CONTAINER_IMAGE_AND_VERSION: seldonio/engine:1.1.0
EXECUTOR_CONTAINER_IMAGE_AND_VERSION: seldonio/seldon-core-executor:1.1.0

We are trying to run a model-chaining example with 3 images, with the Seldon microservice running behind nginx in the custom image used for all three containers.

Initially the deployment failed with CrashLoopBackOff: all the containers failed their readiness probes. After adding custom nginx rules for /live and /ready to all 3 images, everything started working except the seldon-container-engine container, which started but kept failing its readiness probe:
Normal Created 2m58s kubelet, Created container seldon-container-engine
Normal Started 2m57s kubelet, Started container seldon-container-engine
Warning Unhealthy 93s (x13 over 2m33s) kubelet, Readiness probe failed: HTTP probe failed with statuscode: 503

{"level":"error","ts":1592360705.6960783,"logger":"SeldonRestApi","msg":"Ready check failed","error":"dial tcp 127.0.0.1:9002: connect: connection refused","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/seldonio/seldon-core/executor/api/rest.(*SeldonRestApi).checkReady\n\t/workspace/api/rest/server.go:198\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2007\ngithub.com/seldonio/seldon-core/executor/api/rest.(*CloudeventHeaderMiddleware).Middleware.func1\n\t/workspace/api/rest/server.go:176\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2007\ngithub.com/seldonio/seldon-core/executor/api/rest.puidHeader.func1\n\t/workspace/api/rest/server.go:191\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2007\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:212\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2802\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1890"}

Here is the deploy.yaml:


apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: seldon-pipeline1
spec:
  annotations:
    project_name: seldon-pipeline
    deployment_version: 0.1.0
    seldon.io/rest-read-timeout: '100000'
    seldon.io/rest-connection-timeout: '100000'
    seldon.io/grpc-read-timeout: '100000'
  name: seldon-pipeline
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: sentiment-analysis
          image: sentimentanalysis/v1:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 0.1			
              memory: 2Gi
          readinessProbe:
            tcpSocket:
              port: 8080
          livenessProbe:
            tcpSocket:
              port: 8080		  
        - name: text-tagging
          image: tagging/v1:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 0.1			
              memory: 2Gi
          readinessProbe:
            tcpSocket:
              port: 8080
          livenessProbe:
            tcpSocket:
              port: 8080			  
        - name: summarize-text
          image: textsummarize/v1:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 0.1			
              memory: 2Gi
          readinessProbe:
            tcpSocket:
              port: 8080
          livenessProbe:
            tcpSocket:
              port: 8080
        terminationGracePeriodSeconds: 20
    graph:
      children:
      - name: text-tagging
        endpoint:
          type: REST
        type: MODEL
        children:
        - name: summarize-text
          endpoint:
            type: REST
          type: MODEL
          children: []
      name: sentiment-analysis
      endpoint:
        type: REST
      type: MODEL
    svcOrchSpec:
      resources:
        requests:
          cpu: 0.1		
          memory: 6Gi	  
    name: example
    replicas: 1
    annotations:
      predictor_version: v1

Is there a way to pass the /live and /ready endpoints for the seldon-container-engine container, the same way we pass them for the custom images, e.g. in the svcOrchSpec section?
Or is the root cause something else?

@ukclivecox
Contributor

If the ports are defined by you then you need to add them to the graph section. See https://docs.seldon.io/projects/seldon-core/en/latest/examples/protocol_examples.html#Tensorflow-Protocol-REST-Model where service_port is added.
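A hedged sketch of what that might look like for the graph in this issue, assuming each custom container serves on port 8080 (as the probes in the deploy.yaml above suggest):

```yaml
graph:
  name: sentiment-analysis
  type: MODEL
  endpoint:
    type: REST
    service_port: 8080   # tells the executor which port the container serves on
  children:
  - name: text-tagging
    type: MODEL
    endpoint:
      type: REST
      service_port: 8080
    children:
    - name: summarize-text
      type: MODEL
      endpoint:
        type: REST
        service_port: 8080
```

Without this (or a named container port), the executor falls back to a default port, which would explain the `dial tcp 127.0.0.1:9002: connect: connection refused` in the ready-check log above.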

@alokmp83
Author

Thanks for the info.

However, the problem was resolved by setting the name attribute to 'http' in each containerPort entry (each image listens on 8080):

predictors:
- componentSpecs:
  - spec:
      containers:
      - name: sentiment-analysis
        image: <image_name>
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http