
rest-proxy container failing when enabling gRPC inferencing #401

Closed · heyselbi opened this issue Jul 11, 2023 · 3 comments
Labels: bug

heyselbi commented Jul 11, 2023

Describe the bug

I would like to enable gRPC inferencing with modelmesh. When I follow instructions [1] and [2], I am able to make successful grpcurl requests (an example request is sketched below, after the Route status). However, I noticed that the rest-proxy container starts failing.

To Reproduce
Steps to reproduce the behavior:

  1. Install modelmesh in a specific namespace (opendatahub in my case)
  2. Create a secret (i.e. mm-new) following the prompts here. The inferencing namespace name is modelmesh-serving. Apply the secret:
apiVersion: v1
kind: Secret
metadata:
  name: mm-new
  namespace: opendatahub
data:
  tls.crt: <hidden>
  tls.key: <hidden>
type: kubernetes.io/tls
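
For reference, a minimal sketch of how such a secret can be generated with OpenSSL and kubectl (file names, the CN, and the SAN list are illustrative; -addext assumes OpenSSL 1.1.1+):

# generate a self-signed certificate whose SANs cover the in-cluster service names
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout tls.key -out tls.crt -subj "/CN=modelmesh-serving" \
  -addext "subjectAltName=DNS:modelmesh-serving,DNS:modelmesh-serving.modelmesh-serving,DNS:modelmesh-serving.modelmesh-serving.svc"
# create the TLS secret in the controller namespace
kubectl create secret tls mm-new --cert=tls.crt --key=tls.key -n opendatahub
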
  3. Apply a custom configmap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: model-serving-config
  namespace: opendatahub
data:
  config.yaml: |
    tls:
      secretName: mm-new
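
Once the configmap is applied, one way to confirm that the TLS settings propagated to the runtime pods (a sketch; the deployment name matches the runtime Deployment shared further below) is to check the rest-proxy container's environment:

# the output should include REST_PROXY_USE_TLS and the MM_TLS_* paths
kubectl -n modelmesh-serving get deploy modelmesh-serving-ovms-1.x \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="rest-proxy")].env}'
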
  4. Apply the runtime and inference service in a separate namespace (modelmesh-serving in my case):
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    enable-route: 'true'
  namespace: modelmesh-serving
  labels:
    name: modelmesh-serving-ovms-1.x-SR
spec:
  builtInAdapter:
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
    runtimeManagementPort: 8888
    serverType: ovms
  containers:
    - args:
        - '--port=8001'
        - '--rest_port=8888'
        - '--config_path=/models/model_config_list.json'
        - '--file_system_poll_wait_seconds=0'
        - '--grpc_bind_address=127.0.0.1'
        - '--rest_bind_address=127.0.0.1'
      image: 'quay.io/opendatahub/openvino_model_server:2022.3-release'
      name: ovms
      resources:
        limits:
          cpu: 5
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
  grpcDataEndpoint: 'port:8001'
  grpcEndpoint: 'port:8085'
  multiModel: true
  protocolVersions:
    - grpc-v1
  supportedModelFormats:
    - autoSelect: true
      name: openvino_ir
      version: opset1
    - name: onnx
      version: '1'

Inference service:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
  name: example-onnx-mnist
  namespace: modelmesh-serving
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      runtime: ovms-1.x
      storage:
        key: localMinIO
        path: onnx/mnist.onnx

The status of the isvc afterwards is:

status:
  components:
    predictor:
      grpcUrl: 'grpc://modelmesh-serving.modelmesh-serving:8033'
      restUrl: 'https://modelmesh-serving.modelmesh-serving:8008'
      url: 'grpc://modelmesh-serving.modelmesh-serving:8033'
  conditions:
    - lastTransitionTime: '2023-07-10T13:22:17Z'
      status: 'True'
      type: PredictorReady
    - lastTransitionTime: '2023-07-10T13:22:17Z'
      status: 'True'
      type: Ready
  modelStatus:
    copies:
      failedCopies: 0
      totalCopies: 1
    states:
      activeModelState: Loaded
      targetModelState: ''
    transitionStatus: UpToDate
  url: 'grpc://modelmesh-serving.modelmesh-serving:8033'
  5. A Route is created automatically, or can be applied manually:
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  annotations:
    openshift.io/host.generated: 'true'
  name: example-onnx-mnist-grpc
  namespace: modelmesh-serving
  ownerReferences:
    - apiVersion: serving.kserve.io/v1beta1
      kind: InferenceService
      name: example-onnx-mnist
      uid: ce3c4bd6-43ea-4e97-b7b5-43ba909e1adc
      controller: true
      blockOwnerDeletion: true
  labels:
    inferenceservice-name: example-onnx-mnist
spec:
  host: >-
    <modelname>-grpc-<namespace>.apps.<clusterdetails>.com
  to:
    kind: Service
    name: modelmesh-serving
    weight: 100
  port:
    targetPort: 8033
  tls:
    termination: passthrough
    insecureEdgeTerminationPolicy: Redirect
  wildcardPolicy: None

Status of the route:

status:
  ingress:
    - host: >-
        <modelname>-grpc-<namespace>.apps.<clusterdetails>.com
      routerName: default
      conditions:
        - type: Admitted
          status: 'True'
          lastTransitionTime: '2023-07-10T13:19:58Z'
      wildcardPolicy: None
      routerCanonicalHostname: router-default.apps.<clusterdetails>.com
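
With the route admitted, a gRPC request can be made through it. A sketch of such a request (it assumes KServe's grpc_predict_v2.proto is available locally; -insecure skips verification because the certificate is self-signed):

grpcurl -insecure -proto grpc_predict_v2.proto \
  -d '{"name": "example-onnx-mnist"}' \
  <modelname>-grpc-<namespace>.apps.<clusterdetails>.com:443 \
  inference.GRPCInferenceService/ModelMetadata
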
  6. Go to Pods, select the modelmesh-serving-ovms-1.x-* pod, and open the rest-proxy logs. This is what they show:
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Starting REST Proxy..."}
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Using TLS"}
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Registering gRPC Inference Service Handler","Host":"localhost","Port":8033,"MaxCallRecvMsgSize":16777216}
{"level":"info","ts":"2023-07-10T15:31:19Z","msg":"Listening on port 8008 with TLS"}
2023/07/10 15:31:23 http: TLS handshake error from <IP1>:50510: read tcp <IP3>:8008-><IP1>:50510: read: connection reset by peer
2023/07/10 15:31:23 http: TLS handshake error from <IP2>:47526: read tcp <IP3>:8008-><IP2>:47526: read: connection reset by peer
2023/07/10 15:31:28 http: TLS handshake error from <IP1>:50518: read tcp <IP3>:8008-><IP1>:50518: read: connection reset by peer

The error repeats until the inference service is deleted (which deletes the serving runtime pod as well).
Interestingly, if one checks the rest-proxy logs of the other pod (replicas is set to 2), there is no error:

{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Starting REST Proxy..."}
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Using TLS"}
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Registering gRPC Inference Service Handler","Host":"localhost","Port":8033,"MaxCallRecvMsgSize":16777216}
{"level":"info","ts":"2023-07-10T13:20:05Z","msg":"Listening on port 8008 with TLS"}

For reference, here is the runtime Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '2'
  namespace: modelmesh-serving
  labels:
    app.kubernetes.io/instance: modelmesh-controller
    app.kubernetes.io/managed-by: modelmesh-controller
    app.kubernetes.io/name: modelmesh-controller
    modelmesh-service: modelmesh-serving
    name: modelmesh-serving-ovms-1.x
spec:
  replicas: 2
  selector:
    matchLabels:
      modelmesh-service: modelmesh-serving
      name: modelmesh-serving-ovms-1.x
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: modelmesh-controller
        app.kubernetes.io/managed-by: modelmesh-controller
        app.kubernetes.io/name: modelmesh-controller
        modelmesh-service: modelmesh-serving
        name: modelmesh-serving-ovms-1.x
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: '2112'
        prometheus.io/scheme: https
        prometheus.io/scrape: 'true'
    spec:
      restartPolicy: Always
      serviceAccountName: modelmesh-serving-sa
      schedulerName: default-scheduler
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
      terminationGracePeriodSeconds: 90
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '1'
              memory: 512Mi
            requests:
              cpu: 50m
              memory: 96Mi
          terminationMessagePath: /dev/termination-log
          name: rest-proxy
          env:
            - name: REST_PROXY_LISTEN_PORT
              value: '8008'
            - name: REST_PROXY_GRPC_PORT
              value: '8033'
            - name: REST_PROXY_USE_TLS
              value: 'true'
            - name: REST_PROXY_GRPC_MAX_MSG_SIZE_BYTES
              value: '16777216'
            - name: MM_TLS_KEY_CERT_PATH
              value: /opt/kserve/mmesh/tls/tls.crt
            - name: MM_TLS_PRIVATE_KEY_PATH
              value: /opt/kserve/mmesh/tls/tls.key
          ports:
            - name: http
              containerPort: 8008
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: tls-certs
              readOnly: true
              mountPath: /opt/kserve/mmesh/tls
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/rest-proxy:v0.10.0'
        - resources:
            limits:
              cpu: 100m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /oauth/healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: oauth-proxy
          livenessProbe:
            httpGet:
              path: /oauth/healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          ports:
            - name: https
              containerPort: 8443
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: proxy-tls
              mountPath: /etc/tls/private
          terminationMessagePolicy: File
          image: >-
            registry.redhat.io/openshift4/ose-oauth-proxy@sha256:4bef31eb993feb6f1096b51b4876c65a6fb1f4401fee97fa4f4542b6b7c9bc46
          args:
            - '--https-address=:8443'
            - '--provider=openshift'
            - '--openshift-service-account="modelmesh-serving-sa"'
            - '--upstream=http://localhost:8008'
            - '--tls-cert=/etc/tls/private/tls.crt'
            - '--tls-key=/etc/tls/private/tls.key'
            - '--cookie-secret=SECRET'
            - >-
              --openshift-delegate-urls={"/": {"namespace": "modelmesh-serving",
              "resource": "services", "verb": "get"}}
            - >-
              --openshift-sar={"namespace": "modelmesh-serving", "resource":
              "services", "verb": "get"}
            - '--skip-auth-regex=''(^/metrics|^/apis/v1beta1/healthz)'''
        - resources:
            limits:
              cpu: '5'
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              httpGet:
                path: /prestop
                port: 8090
                scheme: HTTP
          name: ovms
          securityContext:
            capabilities:
              drop:
                - ALL
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-dir
              mountPath: /models
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/openvino_model_server:2022.3-release'
          args:
            - '--port=8001'
            - '--rest_port=8888'
            - '--config_path=/models/model_config_list.json'
            - '--file_system_poll_wait_seconds=0'
            - '--grpc_bind_address=127.0.0.1'
            - '--rest_bind_address=127.0.0.1'
        - resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: 50m
              memory: 96Mi
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              httpGet:
                path: /prestop
                port: 8090
                scheme: HTTP
          name: ovms-adapter
          command:
            - /opt/app/ovms-adapter
          env:
            - name: ADAPTER_PORT
              value: '8085'
            - name: RUNTIME_PORT
              value: '8888'
            - name: RUNTIME_DATA_ENDPOINT
              value: 'port:8001'
            - name: CONTAINER_MEM_REQ_BYTES
              valueFrom:
                resourceFieldRef:
                  containerName: ovms
                  resource: requests.memory
                  divisor: '0'
            - name: MEM_BUFFER_BYTES
              value: '134217728'
            - name: LOADTIME_TIMEOUT
              value: '90000'
            - name: USE_EMBEDDED_PULLER
              value: 'true'
            - name: RUNTIME_VERSION
              value: 2022.3-release
          securityContext:
            capabilities:
              drop:
                - ALL
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-dir
              mountPath: /models
            - name: storage-config
              readOnly: true
              mountPath: /storage-config
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/modelmesh-runtime-adapter:v0.11.0-alpha'
        - resources:
            limits:
              cpu: '3'
              memory: 448Mi
            requests:
              cpu: 300m
              memory: 448Mi
          readinessProbe:
            httpGet:
              path: /ready
              port: 8089
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              exec:
                command:
                  - /opt/kserve/mmesh/stop.sh
                  - wait
          name: mm
          livenessProbe:
            httpGet:
              path: /live
              port: 8089
              scheme: HTTP
            initialDelaySeconds: 90
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          env:
            - name: MM_SERVICE_NAME
              value: modelmesh-serving
            - name: MM_SVC_GRPC_PORT
              value: '8033'
            - name: WKUBE_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: WKUBE_POD_IPADDR
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: MM_LOCATION
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
            - name: KV_STORE
              value: 'etcd:/opt/kserve/mmesh/etcd/etcd_connection'
            - name: MM_METRICS
              value: 'prometheus:port=2112;scheme=https'
            - name: SHUTDOWN_TIMEOUT_MS
              value: '90000'
            - name: INTERNAL_SERVING_GRPC_PORT
              value: '8001'
            - name: INTERNAL_GRPC_PORT
              value: '8085'
            - name: MM_SVC_GRPC_MAX_MSG_SIZE
              value: '16777216'
            - name: MM_KVSTORE_PREFIX
              value: mm
            - name: MM_DEFAULT_VMODEL_OWNER
              value: ksp
            - name: MM_LABELS
              value: 'mt:openvino_ir,mt:openvino_ir:opset1,pv:grpc-v1,rt:ovms-1.x'
            - name: MM_TYPE_CONSTRAINTS_PATH
              value: /etc/watson/mmesh/config/type_constraints
            - name: MM_DATAPLANE_CONFIG_PATH
              value: /etc/watson/mmesh/config/dataplane_api_config
            - name: MM_TLS_KEY_CERT_PATH
              value: /opt/kserve/mmesh/tls/tls.crt
            - name: MM_TLS_PRIVATE_KEY_PATH
              value: /opt/kserve/mmesh/tls/tls.key
          securityContext:
            capabilities:
              drop:
                - ALL
          ports:
            - name: grpc
              containerPort: 8033
              protocol: TCP
            - name: prometheus
              containerPort: 2112
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: tc-config
              mountPath: /etc/watson/mmesh/config
            - name: etcd-config
              readOnly: true
              mountPath: /opt/kserve/mmesh/etcd
            - name: tls-certs
              readOnly: true
              mountPath: /opt/kserve/mmesh/tls
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/modelmesh:v0.11.0-alpha'
      serviceAccount: modelmesh-serving-sa
      volumes:
        - name: proxy-tls
          secret:
            secretName: model-serving-proxy-tls
            defaultMode: 420
        - name: models-dir
          emptyDir:
            sizeLimit: 1536Mi
        - name: storage-config
          secret:
            secretName: storage-config
            defaultMode: 420
        - name: tc-config
          configMap:
            name: tc-config
            defaultMode: 420
        - name: etcd-config
          secret:
            secretName: model-serving-etcd
            defaultMode: 420
        - name: tls-certs
          secret:
            secretName: mm-new
            defaultMode: 420
      dnsPolicy: ClusterFirst
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 15%
      maxSurge: 75%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 6
  replicas: 2
  updatedReplicas: 2
  readyReplicas: 2
  availableReplicas: 2
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2023-07-07T17:49:05Z'
      lastTransitionTime: '2023-07-07T16:17:16Z'
      reason: NewReplicaSetAvailable
      message: >-
        ReplicaSet "modelmesh-serving-ovms-1.x-6cdbbbbc79" has successfully
        progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2023-07-10T15:41:34Z'
      lastTransitionTime: '2023-07-10T15:41:34Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.

All other logs are fine, including the ones in modelmesh-controller.

Expected behavior

The rest-proxy container does not show failing logs.

Environment:

OpenShift 4.13.0
Open Data Hub 1.7.0
Modelmesh version: v0.11.0-alpha (ref)
Controller namespace: opendatahub
User/isvc namespace: modelmesh-serving

Additional context

We tried deploying the isvc in several namespaces and tried gRPC inferencing once in each, and it failed. We found that gRPC inferencing works in only one namespace. We have tried this only once, but I will give it another shot. That is potentially the next related issue.
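
One thing worth checking in that scenario (a debugging sketch; tls.crt is the certificate stored in the mm-new secret) is which service DNS names the certificate actually covers:

# inspect the SANs on the serving certificate
openssl x509 -in tls.crt -noout -text | grep -A1 'Subject Alternative Name'
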

heyselbi (Author) commented:

@Jooho if you have more details to add, please feel free to do so.

ckadner (Member) commented Jul 18, 2023

I experimented with the TLS setup on plain Kubernetes and OpenShift. It looks like we need to update our TLS doc to make it work.

Although I did not fully replicate your setup, I was able to get TLS set up without any errors showing in any of the rest-proxy containers across two namespaces after following our FVT setup for TLS. The instructions for doing the same with OpenSSL need to be updated, however.
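
A hypothetical sketch of what the updated OpenSSL instructions might need to cover, assuming the relevant difference is that the certificate's SANs must include the modelmesh-serving service in every namespace used for inferencing:

# <second-namespace> is a placeholder for each additional inferencing namespace
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout tls.key -out tls.crt -subj "/CN=modelmesh-serving" \
  -addext "subjectAltName=DNS:modelmesh-serving,DNS:modelmesh-serving.modelmesh-serving,DNS:modelmesh-serving.modelmesh-serving.svc,DNS:modelmesh-serving.<second-namespace>,DNS:modelmesh-serving.<second-namespace>.svc"
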

ckadner closed this as completed Jan 20, 2024

ckadner (Member) commented Jan 20, 2024

Closing as stale. Please reopen if still an issue.
