
enh: Add support to configure PrepackedTriton with no storage initialiser #4216

Merged
4 commits merged into SeldonIO:master on Jul 21, 2022

Conversation

@brightsparc
Contributor

commented Jul 11, 2022

Added support to configure PrepackedTriton with no storage initialiser, instead passing startup args to explicitly load models.

What this PR does / why we need it:
When using the prepackaged Triton Inference Server with a cloud-based model repository, the storage initialiser will clone the full path of the registry, which may contain a large number of models. To support multiple models in an object store, Triton provides model management capabilities to be explicit about which models to load. It also has native support for downloading these models from the object store without needing the storage initializer.

This PR adds the following:

  1. A new ANNOTATION_NO_STORAGE_INITIALIZER on the deploy spec to skip creating the storage initializer and instead pass the ModelUri directly down to Triton, along with any secrets configured.
  2. Support for passing Parameters set on the PredictiveUnit as command-line args to the Triton Inference Server (see the sketch below).
  3. Tests to validate that (1) and (2) pass.
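
For illustration, this is roughly the Triton invocation that results when explicit control is configured with two models (the flags are Triton's own CLI options; the exact argument list the operator builds may differ):

tritonserver \
  --model-repository=s3://{bucket_name}/{bucket_prefix} \
  --model-control-mode=explicit \
  --load-model=model1 \
  --load-model=model2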

Which issue(s) this PR fixes:
Fixes #4203

Special notes for your reviewer:
I ran pre-commit checks on all files and noticed some changes outside the scope of the files I touched, so left those alone.

@seldondev
Collaborator

Hi @brightsparc. Thanks for your PR.

I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@brightsparc
Contributor Author

/assign @cliveseldon

@brightsparc
Contributor Author

brightsparc commented Jul 12, 2022

I have confirmed this works in my local Seldon deployment by applying this k8s spec:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
  namespace: seldon
spec:
  name: triton-nostorage
  annotations:
    seldon.io/no-storage-initializer: "true"  
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
    graph:
      implementation: TRITON_SERVER
      modelUri: s3://{bucket_name}/{bucket_prefix}
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
        - name: load_model
          value: model1
          type: STRING
        - name: load_model
          value: model2
          type: STRING
      envSecretRefName: seldon-triton-secret
#       storageInitializerImage: seldonio/rclone-storage-initializer:1.15.0-dev            
      name: titanic
      type: MODEL
    replicas: 1
  protocol: v2

It also expects a secret (which I've included as part of the unit tests):

apiVersion: v1
kind: Secret
metadata:
    name: seldon-triton-secret
type: Opaque
stringData:
    AWS_DEFAULT_REGION: "{region_name}"
    AWS_ACCESS_KEY_ID: "{aws_access_key_id}"
    AWS_SECRET_ACCESS_KEY: "{aws_secret_access_key}"

Verified the deployment contains a single pod, which loads the model artifacts on startup using the explicit control mode.
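
For anyone reproducing this, a quick check is something like the following (the label and container name here are my assumptions, based on the spec above):

kubectl -n seldon get pods -l seldon-deployment-id=triton-nostorage
kubectl -n seldon logs <pod-name> -c titanic | grep model_control_mode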

If this is useful context, I could also add it to the docs.


@ukclivecox
Contributor

/test integration

@ukclivecox
Contributor

/test notebooks

@brightsparc
Contributor Author

The test error is in the TestTimeout method. It looks like the error message has changed.

    client_test.go:341: 
        Expected
            <string>: Get "http://127.0.0.1:36681/health/status": context deadline exceeded
        to contain substring
            <string>: Client.Timeout exceeded while awaiting headers
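
If it helps to re-run just that test locally, something like this should work (the package path is an assumption on my part, based on the client_test.go filename):

go test ./executor/api/rest/ -run TestTimeout -v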

@axsaucedo
Contributor

Integration and notebook tests were successful


@brightsparc
Contributor Author

Thanks @axsaucedo, what are the next steps?

@ukclivecox
Contributor

It would be good to add docs for this feature. It could be done in a follow-up PR. @brightsparc

@axsaucedo
Contributor

Tested locally with success both with the model initialiser and without - nice one. It would be great if you could add documentation with an example, as this is quite useful functionality.
/approve

@seldondev
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: axsaucedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@axsaucedo merged commit 6d22c9f into SeldonIO:master Jul 21, 2022
@brightsparc deleted the enh-triton-initializer branch July 22, 2022 03:39
@saeid93
Contributor

saeid93 commented Jul 22, 2022

@brightsparc this is very useful for me. I also want to load/unload models during the runtime of the Triton server. As you might have a similar use case, do you know any workaround for that too? The Triton server has endpoints for this, but do you know a way to do it with Triton running under Seldon?

@brightsparc
Contributor Author

@saeid93 once you have configured explicit mode, you can also load and unload models at runtime, as long as you use the no-engine annotation, which bypasses the executor.
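
For reference, a sketch of calling Triton's model repository endpoints through the Seldon path (the ingress host and names here are placeholders):

# load/unload via Triton's v2 repository API, which the executor would otherwise sit in front of
curl -X POST http://<ingress-host>/seldon/<namespace>/<deployment>/v2/repository/models/<model-name>/load
curl -X POST http://<ingress-host>/seldon/<namespace>/<deployment>/v2/repository/models/<model-name>/unload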

@saeid93
Contributor

saeid93 commented Aug 7, 2022

@brightsparc @cliveseldon I'm trying to use this for explicit model control. I have a repo of models in my MinIO object storage, and I expect that when I use the following yaml:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
spec:
  name: triton-nostorage
  annotations:
    seldon.io/no-storage-initializer: "true"  
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
    graph:
      implementation: TRITON_SERVER
      modelUri: s3://triton-server-new/triton-server-new
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
        - name: load_model
          value: resnet
          type: STRING
      envSecretRefName: seldon-init-container-secret
#       storageInitializerImage: seldonio/rclone-storage-initializer:1.15.0-dev            
      name: titanic
      type: MODEL
    replicas: 1
  protocol: v2

it would only load the resnet model from the object storage. However, looking at the Triton logs, it seems all the models are being loaded:

+-------------------------------------------------+---------+--------+
| Model                                           | Version | Status |
+-------------------------------------------------+---------+--------+
| beit                                            | 1       | READY  |
| beit                                            | 2       | READY  |
| distilbert-base-uncased-finetuned-sst-2-english | 1       | READY  |
| inception                                       | 1       | READY  |
| inception                                       | 2       | READY  |
| regnetx                                         | 1       | READY  |
| regnetx                                         | 2       | READY  |
| regnetx                                         | 3       | READY  |
| regnetx                                         | 4       | READY  |
| regnetx                                         | 5       | READY  |
| resnet                                          | 1       | READY  |
| resnet                                          | 2       | READY  |
| resnet                                          | 3       | READY  |
| resnet                                          | 4       | READY  |
| resnet                                          | 5       | READY  |
| resnet                                          | 6       | READY  |
| resnet                                          | 7       | READY  |
| resnet                                          | 8       | READY  |
| vgg                                             | 1       | READY  |
| vgg                                             | 2       | READY  |
| vgg                                             | 3       | READY  |
| vgg                                             | 4       | READY  |
| visformer                                       | 1       | READY  |
| xception                                        | 1       | READY  |
| xception                                        | 2       | READY  |
+-------------------------------------------------+---------+--------+

Also, testing model load/unload with the following script:

import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException


deployment_name = 'triton-nostorage'
namespace = "default"

URL = f"localhost:32000/seldon/{namespace}/{deployment_name}"
try:
    triton_client = httpclient.InferenceServerClient(
        url=URL, verbose=True,
    )
except Exception as e:
    print("context creation failed: " + str(e))
model_name = "resnet"
 
 
print(20*'-' + 'active models' + 20*'-' + '\n')
print(*triton_client.get_model_repository_index(), sep='\n')
 
print(20*'-' + f'unloading model: {model_name}' + 20*'-' + '\n')
print(triton_client.unload_model(model_name))
 
print(20*'-' + 'active models after unloading' + 20*'-' + '\n')
print(*triton_client.get_model_repository_index(), sep='\n')
 
print(20*'-' + f'load model: {model_name}' + 20*'-' + '\n')
print(triton_client.load_model(model_name))
 
print(20*'-' + 'active models after loading back' + 20*'-' + '\n')
print(*triton_client.get_model_repository_index(), sep='\n')

results in:

--------------------active models--------------------

{'name': 'beit', 'version': '1', 'state': 'READY'}
{'name': 'beit', 'version': '2', 'state': 'READY'}
{'name': 'distilbert-base-uncased-finetuned-sst-2-english', 'version': '1', 'state': 'READY'}
{'name': 'inception', 'version': '1', 'state': 'READY'}
{'name': 'inception', 'version': '2', 'state': 'READY'}
{'name': 'regnetx', 'version': '1', 'state': 'READY'}
{'name': 'regnetx', 'version': '2', 'state': 'READY'}
{'name': 'regnetx', 'version': '3', 'state': 'READY'}
{'name': 'regnetx', 'version': '4', 'state': 'READY'}
{'name': 'regnetx', 'version': '5', 'state': 'READY'}
{'name': 'resnet', 'version': '1', 'state': 'READY'}
{'name': 'resnet', 'version': '2', 'state': 'READY'}
{'name': 'resnet', 'version': '3', 'state': 'READY'}
{'name': 'resnet', 'version': '4', 'state': 'READY'}
{'name': 'resnet', 'version': '5', 'state': 'READY'}
{'name': 'resnet', 'version': '6', 'state': 'READY'}
{'name': 'resnet', 'version': '7', 'state': 'READY'}
{'name': 'resnet', 'version': '8', 'state': 'READY'}
{'name': 'vgg', 'version': '1', 'state': 'READY'}
{'name': 'vgg', 'version': '2', 'state': 'READY'}
{'name': 'vgg', 'version': '3', 'state': 'READY'}
{'name': 'vgg', 'version': '4', 'state': 'READY'}
{'name': 'visformer', 'version': '1', 'state': 'READY'}
{'name': 'xception', 'version': '1', 'state': 'READY'}
{'name': 'xception', 'version': '2', 'state': 'READY'}
--------------------unloading model: resnet--------------------

Traceback (most recent call last):
  File "/home/cc/infernece-pipeline-joint-optimization/pipelines/19-outside-poc/triton-inferline/triton-client-offload.py", line 164, in <module>
    print(triton_client.unload_model(model_name))
  File "/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 721, in unload_model
    _raise_if_error(response)
  File "/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 65, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: explicit model load / unload is not allowed if polling is enabled

It seems that the server is not in explicit mode and, for some reason, this pull request is not working for me.

@saeid93
Contributor

saeid93 commented Aug 7, 2022

Also, using only the second feature of this commit (passing parameters to predictor units) does not seem to work for me.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
spec:
  name: triton-nostorage
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
    graph:
      implementation: TRITON_SERVER
      modelUri: s3://triton-server-new/triton-server-new
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
      envSecretRefName: seldon-init-container-secret
#       storageInitializerImage: seldonio/rclone-storage-initializer:1.15.0-dev            
      name: titanic
      type: MODEL
    replicas: 1
  protocol: v2

The server is not in explicit mode, as above, and load/unload is disabled.

@brightsparc
Contributor Author

brightsparc commented Aug 7, 2022

Hi @saeid93, in order to use explicit loading you need to specify the annotation seldon.io/no-storage-initializer: "true"; see below:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
  namespace: seldon
spec:
  name: triton-nostorage
  annotations:
    seldon.io/no-storage-initializer: "true"  

You should then see output in the log from the Triton container indicating this is set:

| model_control_mode               | MODE_EXPLICIT  

Note you also have the alternative of creating your own container specification to pass explicit arguments for more control:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-autoscaling
  namespace: seldon
spec:
  name: triton-autoscaling
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
      # prometheus.io/path: "/metrics" # Annotate this 
    componentSpecs:
    - spec:
        containers:
        - image: nvcr.io/nvidia/tritonserver:{triton_version}
          ports:
            - containerPort: 9000
              protocol: TCP
              name: http
            - containerPort: 9500
              protocol: TCP
              name: grpc
            - containerPort: 8002
              protocol: TCP
              name: nv-metrics # create nvidia-metrics port name
          livenessProbe:
            httpGet:
              path: /v2/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 5
            periodSeconds: 5
            httpGet:
              path: /v2/health/ready
              port: http                
          args: 
            - "/opt/tritonserver/bin/tritonserver"
            - "--http-port=9000" # override default http port
            - "--grpc-port=9500" # override default grpc port
            - "--metrics-port=8002" # must be different to engine port
            - "--model-repository={triton_repository}"
            - "--model-control-mode=explicit"
            - "--load-model=titanic_ensemble"
          envFrom:
            - secretRef: 
                name: seldon-triton-secret
          imagePullPolicy: IfNotPresent
          name: triton-local
          resources:
            requests:
              cpu: '0.5' 
        terminationGracePeriodSeconds: 1
    replicas: 1 
    graph:
      name: triton-local 
      type: MODEL
  protocol: v2

Cheers,
Julian.

@saeid93
Contributor

saeid93 commented Aug 12, 2022

Hi @brightsparc, thank you for your answer.
I'm trying the following yaml file, with my models stored in MinIO:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: resnet
spec:
  name: resnet
  annotations:
    seldon.io/no-storage-initializer: "true"
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://http://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all
      parameters: # specify explicit control, and load two models
      envSecretRefName: seldon-triton-secret
      name: resnet
      type: MODEL
    annotations:
      seldon.io/no-engine: "true"
    name: default
    replicas: 1
  protocol: v2

and this is my minio secret:

apiVersion: v1
kind: Secret
metadata:
    name: seldon-triton-secret
type: Opaque
stringData:
    AWS_ACCESS_KEY_ID: "minioadmin"
    AWS_SECRET_ACCESS_KEY: "minioadmin"

However, it seems that the storage initializer is still firing up:

k describe pod resnet-default-0-resnet-5895fcd8f9-xjfnb 

output:

...
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  25s               default-scheduler  Successfully assigned default/resnet-default-0-resnet-5895fcd8f9-xjfnb to k8s-cluster
  Normal   Pulled     6s (x3 over 24s)  kubelet            Container image "seldonio/rclone-storage-initializer:1.14.0" already present on machine
  Normal   Created    6s (x3 over 24s)  kubelet            Created container resnet-model-initializer
  Normal   Started    6s (x3 over 24s)  kubelet            Started container resnet-model-initializer
  Warning  BackOff    6s (x3 over 22s)  kubelet            Back-off restarting failed container

It seems that the model-initializer is still being used, because if I change the secret to the rclone secret:

apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
  RCLONE_CONFIG_S3_PROVIDER: minio
  RCLONE_CONFIG_S3_ENV_AUTH: "false"
  RCLONE_CONFIG_S3_ACCESS_KEY_ID: minioadmin
  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: minioadmin
  RCLONE_CONFIG_S3_ENDPOINT: http://minio.minio-system.svc.cluster.local:9000

The following config will work, and it is still using the rclone initializer, which we expect to be disabled:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: resnet
spec:
  name: default
  annotations:
    seldon.io/no-storage-initializer: "true"
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://triton-server-all/triton-server-all
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
      envSecretRefName: seldon-init-container-secret
      name: resnet
      type: MODEL
    annotations:
      seldon.io/no-engine: "true"
    name: default
    replicas: 1
  protocol: kfserving

and also the model control mode does not change:

| model_repository_path[0]         | /mnt/models                                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 0                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456  

It seems I'm doing something wrong, as the initializer is still coming up. Could you please confirm that the yamls are in the expected format?

@cliveseldon could this be because the change is not yet included in the latest version? Which version of the repo will be installed when we use the Helm chart installation? Does Helm always install the last commit on the main branch?

Many thanks,
Saeid

@brightsparc
Contributor Author

Hi @saeid93, to not have the model initializer load you still need to add the seldon.io/no-engine: "true" annotation.

Also for minio registry you provided:

s3://http://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all

This should not include the http, and should instead be:

s3://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all

@saeid93
Contributor

saeid93 commented Aug 13, 2022

@brightsparc thank you for your answer. In the provided yaml I had already included the no-engine annotation:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: resnet
spec:
  name: resnet
  annotations:
    seldon.io/no-storage-initializer: "true"
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all
      parameters:
      envSecretRefName:  seldon-triton-secret
      name: resnet
      type: MODEL
    annotations:
      seldon.io/no-engine: "true" # -- No engine annotation --
    name: default
    replicas: 1
  protocol: v2

Changing the http URI to the s3 form also resulted in the same problem. Is there anything else that could be causing it?
