
enh: Add support to configure PrepackedTriton with no storage initialiser #4216

Merged
4 commits merged into SeldonIO:master on Jul 21, 2022

Conversation

@brightsparc
Contributor

commented Jul 11, 2022

Added support to configure PrepackedTriton with no storage initialiser, instead passing startup args to explicitly load models.

What this PR does / why we need it:
When using the prepackaged Triton Inference Server with a cloud-based model repository, the storage initialiser will clone the full path of the registry, which may contain a large number of models. To support multiple models in an object store, Triton provides model management capabilities to be explicit about which models to load. It also has native support for downloading these models from the object store without needing the storage initializer.

This PR adds the following:

  1. A new ANNOTATION_NO_STORAGE_INITIALIZER on the deploy spec to skip creating the storage initializer and instead pass the ModelUri directly down to Triton, along with any secrets configured.
  2. Support for passing Parameters set on the PredictiveUnit as command-line args to the Triton Inference Server (see the sketch below).
  3. Tests to validate that (1) and (2) pass.
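
For illustration, this is roughly the Triton invocation that results when explicit control is configured with two models (the flags are Triton's own CLI options; the exact argument list the operator builds may differ):

tritonserver \
  --model-repository=s3://{bucket_name}/{bucket_prefix} \
  --model-control-mode=explicit \
  --load-model=model1 \
  --load-model=model2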

Which issue(s) this PR fixes:
Fixes #4203

Special notes for your reviewer:
I ran pre-commit checks on all files and noticed some changes outside the scope of the files I touched, so left those alone.

@seldondev
Collaborator

Hi @brightsparc. Thanks for your PR.

I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@brightsparc
Contributor Author

/assign @cliveseldon

@brightsparc
Contributor Author

brightsparc commented Jul 12, 2022

I have confirmed this works in my local Seldon deployment by applying this k8s spec:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
  namespace: seldon
spec:
  name: triton-nostorage
  annotations:
    seldon.io/no-storage-initializer: "true"  
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
    graph:
      implementation: TRITON_SERVER
      modelUri: s3://{bucket_name}/{bucket_prefix}
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
        - name: load_model
          value: model1
          type: STRING
        - name: load_model
          value: model2
          type: STRING
      envSecretRefName: seldon-triton-secret
#       storageInitializerImage: seldonio/rclone-storage-initializer:1.15.0-dev            
      name: titanic
      type: MODEL
    replicas: 1
  protocol: v2

It also expects a secret (which I've included as part of the unit tests):

apiVersion: v1
kind: Secret
metadata:
    name: seldon-triton-secret
type: Opaque
stringData:
    AWS_DEFAULT_REGION: "{region_name}"
    AWS_ACCESS_KEY_ID: "{aws_access_key_id}"
    AWS_SECRET_ACCESS_KEY: "{aws_secret_access_key}"

Verified the deployment contains a single pod, which loads the model artifacts on startup using the explicit control mode.
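
For anyone reproducing this, a quick check is something like the following (the label and container name here are my assumptions, based on the spec above):

kubectl -n seldon get pods -l seldon-deployment-id=triton-nostorage
kubectl -n seldon logs <pod-name> -c titanic | grep model_control_mode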

If this is useful context, I could also add it to the docs.


@ukclivecox
Contributor

/test integration

@ukclivecox
Contributor

/test notebooks

@brightsparc
Contributor Author

The test error is in the TestTimeout method. It looks like the error message has changed.

    client_test.go:341: 
        Expected
            <string>: Get "http://127.0.0.1:36681/health/status": context deadline exceeded
        to contain substring
            <string>: Client.Timeout exceeded while awaiting headers
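
If it helps to re-run just that test locally, something like this should work (the package path is an assumption on my part, based on the client_test.go filename):

go test ./executor/api/rest/ -run TestTimeout -v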

@axsaucedo
Contributor

Integration and notebook tests were successful


@brightsparc
Contributor Author

Thanks @axsaucedo, what are the next steps?

@ukclivecox
Contributor

It would be good to add docs for this feature. It could be done in a follow-up PR. @brightsparc

@axsaucedo
Contributor

Tested locally with success both with the model initialiser and without - nice one. It would be great if you could add documentation with an example, as this is quite useful functionality.
/approve

@seldondev
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: axsaucedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@axsaucedo merged commit 6d22c9f into SeldonIO:master Jul 21, 2022
@brightsparc deleted the enh-triton-initializer branch July 22, 2022 03:39
@saeid93
Contributor

saeid93 commented Jul 22, 2022

@brightsparc this is very useful for me. I also want to load/unload models during the runtime of the Triton server. As you might have a similar use case, do you know any workaround for that too? The Triton server has endpoints for this, but do you know a way to do it with Triton running under Seldon?

@brightsparc
Contributor Author

@saeid93 once you have configured explicit mode, you can also load and unload models at runtime, as long as you use the no-engine annotation, which bypasses the executor.
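
For reference, a sketch of calling Triton's model repository endpoints through the Seldon path (the ingress host and names here are placeholders):

# load/unload via Triton's v2 repository API, which the executor would otherwise sit in front of
curl -X POST http://<ingress-host>/seldon/<namespace>/<deployment>/v2/repository/models/<model-name>/load
curl -X POST http://<ingress-host>/seldon/<namespace>/<deployment>/v2/repository/models/<model-name>/unload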

@saeid93
Contributor

saeid93 commented Aug 7, 2022

@brightsparc @cliveseldon I'm trying to use this for explicit model control. I have a repo of models in my MinIO object storage, and I expect that when I use the following yaml:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
spec:
  name: triton-nostorage
  annotations:
    seldon.io/no-storage-initializer: "true"  
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
    graph:
      implementation: TRITON_SERVER
      modelUri: s3://triton-server-new/triton-server-new
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
        - name: load_model
          value: resnet
          type: STRING
      envSecretRefName: seldon-init-container-secret
#       storageInitializerImage: seldonio/rclone-storage-initializer:1.15.0-dev            
      name: titanic
      type: MODEL
    replicas: 1
  protocol: v2

it would only load the resnet model from the object storage. However, looking at the Triton logs, it seems all the models are being loaded:

+-------------------------------------------------+---------+--------+
| Model                                           | Version | Status |
+-------------------------------------------------+---------+--------+
| beit                                            | 1       | READY  |
| beit                                            | 2       | READY  |
| distilbert-base-uncased-finetuned-sst-2-english | 1       | READY  |
| inception                                       | 1       | READY  |
| inception                                       | 2       | READY  |
| regnetx                                         | 1       | READY  |
| regnetx                                         | 2       | READY  |
| regnetx                                         | 3       | READY  |
| regnetx                                         | 4       | READY  |
| regnetx                                         | 5       | READY  |
| resnet                                          | 1       | READY  |
| resnet                                          | 2       | READY  |
| resnet                                          | 3       | READY  |
| resnet                                          | 4       | READY  |
| resnet                                          | 5       | READY  |
| resnet                                          | 6       | READY  |
| resnet                                          | 7       | READY  |
| resnet                                          | 8       | READY  |
| vgg                                             | 1       | READY  |
| vgg                                             | 2       | READY  |
| vgg                                             | 3       | READY  |
| vgg                                             | 4       | READY  |
| visformer                                       | 1       | READY  |
| xception                                        | 1       | READY  |
| xception                                        | 2       | READY  |
+-------------------------------------------------+---------+--------+

Also, testing model load/unload with the following script:

import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException


deployment_name = 'triton-nostorage'
namespace = "default"

URL = f"localhost:32000/seldon/{namespace}/{deployment_name}"
try:
    triton_client = httpclient.InferenceServerClient(
        url=URL, verbose=True,
    )
except Exception as e:
    print("context creation failed: " + str(e))
model_name = "resnet"
 
 
print(20*'-' + 'active models' + 20*'-' + '\n')
print(*triton_client.get_model_repository_index(), sep='\n')
 
print(20*'-' + f'unloading model: {model_name}' + 20*'-' + '\n')
print(triton_client.unload_model(model_name))
 
print(20*'-' + 'active models after unloading' + 20*'-' + '\n')
print(*triton_client.get_model_repository_index(), sep='\n')
 
print(20*'-' + f'load model: {model_name}' + 20*'-' + '\n')
print(triton_client.load_model(model_name))
 
print(20*'-' + 'active models after loading back' + 20*'-' + '\n')
print(*triton_client.get_model_repository_index(), sep='\n')

results in:

--------------------active models--------------------

{'name': 'beit', 'version': '1', 'state': 'READY'}
{'name': 'beit', 'version': '2', 'state': 'READY'}
{'name': 'distilbert-base-uncased-finetuned-sst-2-english', 'version': '1', 'state': 'READY'}
{'name': 'inception', 'version': '1', 'state': 'READY'}
{'name': 'inception', 'version': '2', 'state': 'READY'}
{'name': 'regnetx', 'version': '1', 'state': 'READY'}
{'name': 'regnetx', 'version': '2', 'state': 'READY'}
{'name': 'regnetx', 'version': '3', 'state': 'READY'}
{'name': 'regnetx', 'version': '4', 'state': 'READY'}
{'name': 'regnetx', 'version': '5', 'state': 'READY'}
{'name': 'resnet', 'version': '1', 'state': 'READY'}
{'name': 'resnet', 'version': '2', 'state': 'READY'}
{'name': 'resnet', 'version': '3', 'state': 'READY'}
{'name': 'resnet', 'version': '4', 'state': 'READY'}
{'name': 'resnet', 'version': '5', 'state': 'READY'}
{'name': 'resnet', 'version': '6', 'state': 'READY'}
{'name': 'resnet', 'version': '7', 'state': 'READY'}
{'name': 'resnet', 'version': '8', 'state': 'READY'}
{'name': 'vgg', 'version': '1', 'state': 'READY'}
{'name': 'vgg', 'version': '2', 'state': 'READY'}
{'name': 'vgg', 'version': '3', 'state': 'READY'}
{'name': 'vgg', 'version': '4', 'state': 'READY'}
{'name': 'visformer', 'version': '1', 'state': 'READY'}
{'name': 'xception', 'version': '1', 'state': 'READY'}
{'name': 'xception', 'version': '2', 'state': 'READY'}
--------------------unloading model: resnet--------------------

Traceback (most recent call last):
  File "/home/cc/infernece-pipeline-joint-optimization/pipelines/19-outside-poc/triton-inferline/triton-client-offload.py", line 164, in <module>
    print(triton_client.unload_model(model_name))
  File "/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 721, in unload_model
    _raise_if_error(response)
  File "/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 65, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: explicit model load / unload is not allowed if polling is enabled

It seems that the server is not in explicit mode and, for some reason, this pull request is not working for me.

@saeid93
Contributor

saeid93 commented Aug 7, 2022

Also, using only the second feature of this commit (passing parameters to predictor units) does not seem to work for me.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
spec:
  name: triton-nostorage
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
    graph:
      implementation: TRITON_SERVER
      modelUri: s3://triton-server-new/triton-server-new
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
      envSecretRefName: seldon-init-container-secret
#       storageInitializerImage: seldonio/rclone-storage-initializer:1.15.0-dev            
      name: titanic
      type: MODEL
    replicas: 1
  protocol: v2

The server is not in explicit mode, as above, and load/unload is disabled.

@brightsparc
Contributor Author

brightsparc commented Aug 7, 2022

Hi @saeid93, in order to use explicit loading you need to specify the annotation seldon.io/no-storage-initializer: "true"; see below:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-nostorage
  namespace: seldon
spec:
  name: triton-nostorage
  annotations:
    seldon.io/no-storage-initializer: "true"  

You should then see output in the log from the Triton container indicating this is set:

| model_control_mode               | MODE_EXPLICIT  

Note you also have the alternative of creating your own container specification to pass explicit arguments for more control:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: triton-autoscaling
  namespace: seldon
spec:
  name: triton-autoscaling
  predictors:
  - name: default
    annotations:
      seldon.io/no-engine: "true"  
      # prometheus.io/path: "/metrics" # Annotate this 
    componentSpecs:
    - spec:
        containers:
        - image: nvcr.io/nvidia/tritonserver:{triton_version}
          ports:
            - containerPort: 9000
              protocol: TCP
              name: http
            - containerPort: 9500
              protocol: TCP
              name: grpc
            - containerPort: 8002
              protocol: TCP
              name: nv-metrics # create nvidia-metrics port name
          livenessProbe:
            httpGet:
              path: /v2/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 5
            periodSeconds: 5
            httpGet:
              path: /v2/health/ready
              port: http                
          args: 
            - "/opt/tritonserver/bin/tritonserver"
            - "--http-port=9000" # override default http port
            - "--grpc-port=9500" # override default grpc port
            - "--metrics-port=8002" # must be different to engine port
            - "--model-repository={triton_repository}"
            - "--model-control-mode=explicit"
            - "--load-model=titanic_ensemble"
          envFrom:
            - secretRef: 
                name: seldon-triton-secret
          imagePullPolicy: IfNotPresent
          name: triton-local
          resources:
            requests:
              cpu: '0.5' 
        terminationGracePeriodSeconds: 1
    replicas: 1 
    graph:
      name: triton-local 
      type: MODEL
  protocol: v2

Cheers,
Julian.

@saeid93
Contributor

saeid93 commented Aug 12, 2022

Hi @brightsparc, thank you for your answer.
I'm trying the following yaml file, with my models stored in MinIO:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: resnet
spec:
  name: resnet
  annotations:
    seldon.io/no-storage-initializer: "true"
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://http://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all
      parameters: # specify explicit control, and load two models
      envSecretRefName: seldon-triton-secret
      name: resnet
      type: MODEL
    annotations:
      seldon.io/no-engine: "true"
    name: default
    replicas: 1
  protocol: v2

and this is my minio secret:

apiVersion: v1
kind: Secret
metadata:
    name: seldon-triton-secret
type: Opaque
stringData:
    AWS_ACCESS_KEY_ID: "minioadmin"
    AWS_SECRET_ACCESS_KEY: "minioadmin"

However, it seems that the storage initializer is still firing up:

k describe pod resnet-default-0-resnet-5895fcd8f9-xjfnb 

output:

...
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  25s               default-scheduler  Successfully assigned default/resnet-default-0-resnet-5895fcd8f9-xjfnb to k8s-cluster
  Normal   Pulled     6s (x3 over 24s)  kubelet            Container image "seldonio/rclone-storage-initializer:1.14.0" already present on machine
  Normal   Created    6s (x3 over 24s)  kubelet            Created container resnet-model-initializer
  Normal   Started    6s (x3 over 24s)  kubelet            Started container resnet-model-initializer
  Warning  BackOff    6s (x3 over 22s)  kubelet            Back-off restarting failed container

It seems that the model-initializer is still being used, because if I change the secret to the rclone secret:

apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
  RCLONE_CONFIG_S3_PROVIDER: minio
  RCLONE_CONFIG_S3_ENV_AUTH: "false"
  RCLONE_CONFIG_S3_ACCESS_KEY_ID: minioadmin
  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: minioadmin
  RCLONE_CONFIG_S3_ENDPOINT: http://minio.minio-system.svc.cluster.local:9000

The following config will work, and it is still using the rclone initializer, which we expect to be disabled:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: resnet
spec:
  name: default
  annotations:
    seldon.io/no-storage-initializer: "true"
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://triton-server-all/triton-server-all
      parameters: # specify explicit control, and load two models
        - name: model_control_mode
          value: explicit
          type: STRING
      envSecretRefName: seldon-init-container-secret
      name: resnet
      type: MODEL
    annotations:
      seldon.io/no-engine: "true"
    name: default
    replicas: 1
  protocol: kfserving

and also the model control mode does not change:

| model_repository_path[0]         | /mnt/models                                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 0                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456  

It seems I'm doing something wrong, as the initializer is still coming up. Could you please confirm that the yamls are in the expected format?

@cliveseldon could this be because the change is not yet included in the latest version? Which version of the repo will be installed when we use the Helm chart installation? Does Helm always install the last commit on the main branch?

Many thanks,
Saeid

@brightsparc
Contributor Author

Hi @saeid93, to not have the model initializer load you still need to add the seldon.io/no-engine: "true" annotation.

Also for minio registry you provided:

s3://http://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all

This should not include the http, and should instead be:

s3://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all

@saeid93
Contributor

saeid93 commented Aug 13, 2022

@brightsparc thank you for your answer. In the provided yaml I had already included the no-engine annotation:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: resnet
spec:
  name: resnet
  annotations:
    seldon.io/no-storage-initializer: "true"
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://minio.minio-system.svc.cluster.local:9000/triton-server-all/triton-server-all
      parameters:
      envSecretRefName:  seldon-triton-secret
      name: resnet
      type: MODEL
    annotations:
      seldon.io/no-engine: "true" # -- No engine annotation --
    name: default
    replicas: 1
  protocol: v2

Changing the http URI to the s3 form also resulted in the same problem. Is there anything else that could be causing it?
