
Readiness probe failed seldon-container-engine while deploying the pipeline #1963

Closed

alokmp83 opened this issue Jun 17, 2020 · 2 comments

alokmp83 commented Jun 17, 2020

ENGINE_CONTAINER_IMAGE_AND_VERSION: seldonio/engine:1.1.0
EXECUTOR_CONTAINER_IMAGE_AND_VERSION: seldonio/seldon-core-executor:1.1.0

We are trying to run a model-chaining example with 3 images, with the Seldon microservice running behind nginx in the custom image used for all three containers.

Initially the deployment failed with CrashLoopBackOff: all the containers failed their readiness probes. After adding custom nginx rules for /live and /ready to all 3 images, everything started working except the seldon-container-engine container, which started but kept failing its readiness probe:
Normal Created 2m58s kubelet, Created container seldon-container-engine
Normal Started 2m57s kubelet, Started container seldon-container-engine
Warning Unhealthy 93s (x13 over 2m33s) kubelet, Readiness probe failed: HTTP probe failed with statuscode: 503

{"level":"error","ts":1592360705.6960783,"logger":"SeldonRestApi","msg":"Ready check failed","error":"dial tcp 127.0.0.1:9002: connect: connection refused","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/seldonio/seldon-core/executor/api/rest.(*SeldonRestApi).checkReady\n\t/workspace/api/rest/server.go:198\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2007\ngithub.com/seldonio/seldon-core/executor/api/rest.(*CloudeventHeaderMiddleware).Middleware.func1\n\t/workspace/api/rest/server.go:176\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2007\ngithub.com/seldonio/seldon-core/executor/api/rest.puidHeader.func1\n\t/workspace/api/rest/server.go:191\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2007\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:212\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2802\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1890"}

Here is the deploy.yaml:


apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: seldon-pipeline1
spec:
  annotations:
    project_name: seldon-pipeline
    deployment_version: 0.1.0
    seldon.io/rest-read-timeout: '100000'
    seldon.io/rest-connection-timeout: '100000'
    seldon.io/grpc-read-timeout: '100000'
  name: seldon-pipeline
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: sentiment-analysis
          image: sentimentanalysis/v1:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 0.1			
              memory: 2Gi
          readinessProbe:
            tcpSocket:
              port: 8080
          livenessProbe:
            tcpSocket:
              port: 8080		  
        - name: text-tagging
          image: tagging/v1:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 0.1			
              memory: 2Gi
          readinessProbe:
            tcpSocket:
              port: 8080
          livenessProbe:
            tcpSocket:
              port: 8080			  
        - name: summarize-text
          image: textsummarize/v1:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 0.1			
              memory: 2Gi
          readinessProbe:
            tcpSocket:
              port: 8080
          livenessProbe:
            tcpSocket:
              port: 8080
        terminationGracePeriodSeconds: 20
    graph:
      children:
      - name: text-tagging
        endpoint:
          type: REST
        type: MODEL
        children:
        - name: summarize-text
          endpoint:
            type: REST
          type: MODEL
          children: []
      name: sentiment-analysis
      endpoint:
        type: REST
      type: MODEL
    svcOrchSpec:
      resources:
        requests:
          cpu: 0.1		
          memory: 6Gi	  
    name: example
    replicas: 1
    annotations:
      predictor_version: v1

Is there a way to pass the /live and /ready endpoints for the seldon-container-engine container, the same way we pass them for the custom images, e.g. in the svcOrchSpec section?
Or is the root cause something else?

@ukclivecox
Contributor

If the ports are defined by you then you need to add them to the graph section. See https://docs.seldon.io/projects/seldon-core/en/latest/examples/protocol_examples.html#Tensorflow-Protocol-REST-Model where service_port is added.
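A hedged sketch of what that might look like for the graph in this issue, assuming each custom container serves on port 8080 (as the probes in the deploy.yaml above suggest):

```yaml
graph:
  name: sentiment-analysis
  type: MODEL
  endpoint:
    type: REST
    service_port: 8080   # tells the executor which port the container serves on
  children:
  - name: text-tagging
    type: MODEL
    endpoint:
      type: REST
      service_port: 8080
    children:
    - name: summarize-text
      type: MODEL
      endpoint:
        type: REST
        service_port: 8080
```

Without this (or a named container port), the executor falls back to a default port, which would explain the `dial tcp 127.0.0.1:9002: connect: connection refused` in the ready-check log above.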

@alokmp83
Author

Thanks for the info.

However, the problem was resolved by setting the name attribute to 'http' in each containerPort entry (each image listens on 8080):

predictors:
- componentSpecs:
  - spec:
      containers:
      - name: sentiment-analysis
        image: <image_name>
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http