Enable performing inferences for gRPC (grpcurl) #183

Merged
merged 41 commits into from
Dec 21, 2023
6231615
grpc-call and http-call replaced by inference-call which takes a mand…
Dec 7, 2023
3cf84bc
In custom-manifests/caikit caikit-tgis-servingruntime-grpc.yaml and …
Dec 7, 2023
26a8a90
In custom-manifests/caikit caikit-tgis-servingruntime.yaml replaced …
Dec 7, 2023
4f6559a
One mandatory arg the protocol, either http or grpc. The isvc and the…
Dec 7, 2023
5b47666
Fix the step-by-step documentation for deploying and removing an LLM…
ymoatti Dec 7, 2023
ee7d8f7
Modified so that both kserve-demo-http and kserver-demo-grpc may be r…
ymoatti Dec 7, 2023
962751f
Modified to handle both kserve-demo-http and kserver-demo-grpc
ymoatti Dec 7, 2023
150d45e
Modified to handle both kserve-demo-http and kserver-demo-grpc
ymoatti Dec 7, 2023
fafd57b
created yaml files for caikit-tgis-isvcs are put in ./custom-manifest…
ymoatti Dec 7, 2023
24e7e10
fix the documentation for scripted deployment/removal of sample models
ymoatti Dec 7, 2023
840ea9e
3 bugs fixed
ymoatti Dec 7, 2023
a625554
Add comment
ymoatti Dec 10, 2023
c8197eb
Fix comment
ymoatti Dec 10, 2023
04af1a3
fix bug in delete-model.sh minio ns should not be deleted!
ymoatti Dec 10, 2023
c172968
fix comment in delete-model.sh!
ymoatti Dec 10, 2023
b9e249d
delete-model is now for a specific protocol
ymoatti Dec 11, 2023
428e1af
inference-call.sh replaced by grpc-call.sh and http-call.sh
ymoatti Dec 12, 2023
926b4a9
delete-model.sh reverted to original version - no change for PR 183
ymoatti Dec 12, 2023
07f3dbd
Move to single namespace: kserve-demo, HTTP becomes default
ymoatti Dec 12, 2023
5c56f1b
We have now specific http and grpc yaml files for flan5 LLM and in ad…
ymoatti Dec 12, 2023
75264f7
HTTP is default, thus caikit-tgis-servingruntime-http.yaml is rename…
ymoatti Dec 12, 2023
d50f0ba
remove PREFIX and INF_PROTO variables
ymoatti Dec 13, 2023
448920d
fix bug in comment
ymoatti Dec 13, 2023
4afbef0
consistent use of brackets around env variables
ymoatti Dec 13, 2023
6dcc663
add set -u to prevent usage of any non initialized env variable
ymoatti Dec 19, 2023
7fc47e3
modify manifests so as to directly spin the http/grpc servers
ymoatti Dec 19, 2023
1d94e5f
improve the ISVC template by removing specific sa name and adding com…
ymoatti Dec 20, 2023
8048b15
comment improvement
ymoatti Dec 20, 2023
4d6ba0a
yet another comment improve
ymoatti Dec 20, 2023
e3fc1e7
Improve documentation in deploy-remove.md for step 2.e as well as com…
ymoatti Dec 20, 2023
edebf6c
Further improve documentation in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
1e15469
Further improve documentation in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
aa0e672
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
2ea48f9
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
24b264d
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
2f70458
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
df575a7
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
a9a47cc
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
d37977a
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
47a4a64
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
7f33b24
Yet another documentation improvement in deploy-remove.md for step 2.e
ymoatti Dec 20, 2023
12 changes: 5 additions & 7 deletions demo/kserve/custom-manifests/caikit/caikit-tgis-isvc-grpc.yaml
@@ -5,18 +5,16 @@ metadata:
serving.knative.openshift.io/enablePassthrough: "true"
sidecar.istio.io/inject: "true"
sidecar.istio.io/rewriteAppHTTPProbers: "true"
name: caikit-tgis-example-isvc
name: caikit-tgis-isvc-grpc
spec:
predictor:
serviceAccountName: sa
model:
modelFormat:
name: caikit
runtime: caikit-tgis-runtime
ports:
- containerPort: 8085
name: h2c
protocol: TCP
storageUri: proto://path/to/model # single model here
runtime: caikit-tgis-runtime-grpc
storageUri: s3://modelmesh-example-models/llm/models/flan-t5-small-caikit # single model here
# storageUri: proto://path/to/model # single model here
# Example, using a pvc:
# storageUri: pvc://caikit-pvc/flan-t5-small-caikit/
# Target directory must contain a config.yml
24 changes: 24 additions & 0 deletions demo/kserve/custom-manifests/caikit/caikit-tgis-isvc-template.yaml
@@ -0,0 +1,24 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
annotations:
serving.knative.openshift.io/enablePassthrough: "true"
sidecar.istio.io/inject: "true"
sidecar.istio.io/rewriteAppHTTPProbers: "true"
# The following <caikit-tgis-isvc-name> should be set to the
# actual name of the inference service. (e.g., caikit-tgis-isvc
# for HTTP and caikit-tgis-isvc-grpc for gRPC)
name: <caikit-tgis-isvc-name>
spec:
predictor:
# Replace <NameOfAServiceAccount> below with the name of a
# ServiceAccount that has the secret for accessing the model
serviceAccountName: <NameOfAServiceAccount>
model:
modelFormat:
name: caikit
runtime: caikit-tgis-runtime
storageUri: proto://path/to/model # single model here
# Example, using a pvc:
# storageUri: pvc://caikit-pvc/flan-t5-small-caikit/
# Target directory must contain a config.yml
6 changes: 4 additions & 2 deletions demo/kserve/custom-manifests/caikit/caikit-tgis-isvc.yaml
@@ -5,14 +5,16 @@ metadata:
serving.knative.openshift.io/enablePassthrough: "true"
sidecar.istio.io/inject: "true"
sidecar.istio.io/rewriteAppHTTPProbers: "true"
name: caikit-tgis-example-isvc
name: caikit-tgis-isvc
spec:
predictor:
serviceAccountName: sa
model:
modelFormat:
name: caikit
runtime: caikit-tgis-runtime
storageUri: proto://path/to/model # single model here
storageUri: s3://modelmesh-example-models/llm/models/flan-t5-small-caikit # single model here
# storageUri: proto://path/to/model # single model here
# Example, using a pvc:
# storageUri: pvc://caikit-pvc/flan-t5-small-caikit/
# Target directory must contain a config.yml
@@ -0,0 +1,36 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
name: caikit-tgis-runtime-grpc
spec:
multiModel: false
supportedModelFormats:
# Note: this currently *only* supports caikit format models
- autoSelect: true
name: caikit
containers:
- name: kserve-container
image: quay.io/opendatahub/text-generation-inference:stable
command: ["text-generation-launcher"]
args: ["--model-name=/mnt/models/artifacts/"]
env:
- name: TRANSFORMERS_CACHE
value: /tmp/transformers_cache
# resources: # configure as required
# requests:
# cpu: 8
# memory: 16Gi
- name: transformer-container
image: quay.io/opendatahub/caikit-tgis-serving:stable
command: ["python", "-m", "caikit.runtime.grpc_server"]
env:
- name: RUNTIME_LOCAL_MODELS_DIR
value: /mnt/models
ports:
- containerPort: 8085
name: h2c
protocol: TCP
# resources: # configure as required
# requests:
# cpu: 8
# memory: 16Gi
@@ -22,6 +22,7 @@ spec:
# memory: 16Gi
- name: transformer-container
image: quay.io/opendatahub/caikit-tgis-serving:stable
command: ["python", "-m", "caikit.runtime.http_server"]
env:
- name: RUNTIME_LOCAL_MODELS_DIR
value: /mnt/models
17 changes: 11 additions & 6 deletions demo/kserve/deploy-remove-scripts.md
@@ -14,26 +14,31 @@ Note: If you prefer to deploy and remove an LLM model by using step-by-step comm

**Procedure**

1. Deploy a sample LLM model.
1. Deploy a sample LLM model

For HTTP:
~~~
./scripts/test/deploy-model.sh
~~~

2. Perform inference with a HTTP or gRPC call.
For gRPC:
~~~
./scripts/test/deploy-model.sh grpc
~~~

2-http. If using HTTP:
2. Perform inference:

For HTTP:
~~~
./scripts/test/http-call.sh
~~~


2-grpc. If using gRPC:
For gRPC:
~~~
./scripts/test/grpc-call.sh
~~~

3. Delete the sample model and the MinIO namespace.
3. Delete the sample model:

~~~
./scripts/test/delete-model.sh
86 changes: 71 additions & 15 deletions demo/kserve/deploy-remove.md
@@ -30,7 +30,9 @@ Note: The **flan-t5-small** LLM model has been containerized into an S3 MinIO bu
ACCESS_KEY_ID=admin
SECRET_ACCESS_KEY=password
MINIO_NS=minio
```

```
oc new-project ${MINIO_NS}
oc apply -f ./custom-manifests/minio/minio.yaml -n ${MINIO_NS}
sed "s/<minio_ns>/$MINIO_NS/g" ./custom-manifests/minio/minio-secret.yaml | tee ./minio-secret-current.yaml | oc -n ${MINIO_NS} apply -f -
@@ -39,46 +41,100 @@ Note: The **flan-t5-small** LLM model has been containerized into an S3 MinIO bu

2. Deploy the LLM model with Caikit+TGIS Serving runtime

a. Create a new namespace.
a. Choose protocol to be used to invoke inferences:
Default protocol is HTTP (e.g., curl commands).
To use gRPC, set INF_PROTO to "-grpc" as shown in the following command; otherwise, skip it.

```
INF_PROTO="-grpc"
```
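
The suffix convention used in the remaining steps can be sketched as a small helper. This is only an illustration, not part of the change itself: the function name `proto_suffix` is hypothetical, and the sketch assumes the manifest naming used here (no suffix for HTTP, `-grpc` for gRPC).

```bash
# Hypothetical helper: map a protocol name to the manifest suffix.
# HTTP is the default and uses no suffix; gRPC uses "-grpc".
proto_suffix() {
  case "${1:-http}" in
    http) printf '%s' '' ;;
    grpc) printf '%s' '-grpc' ;;
    *) echo "Error: protocol must be 'http' or 'grpc'." >&2; return 1 ;;
  esac
}

INF_PROTO="$(proto_suffix grpc)"
# The two manifests that the later steps apply for the chosen protocol:
echo "./custom-manifests/caikit/caikit-tgis-servingruntime${INF_PROTO}.yaml"
echo "./custom-manifests/caikit/caikit-tgis-isvc${INF_PROTO}.yaml"
```

If a script runs with `set -u`, referencing the variable as `${INF_PROTO:-}` avoids an unbound-variable error when the HTTP default is used and the variable was never set.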

b. Create a new namespace.

```bash
export TEST_NS=kserve-demo
export TEST_NS="kserve-demo"
oc new-project ${TEST_NS}
```

b. Create a caikit `ServingRuntime`. By default, it requests 4CPU and 8Gi of memory. You can adjust these values as needed.
c. Create a caikit `ServingRuntime`.

By default, it requests 4CPU and 8Gi of memory. You can adjust these values as needed.

```bash
oc apply -f ./custom-manifests/caikit/caikit-tgis-servingruntime.yaml -n ${TEST_NS}
oc apply -f ./custom-manifests/caikit/caikit-tgis-servingruntime"$INF_PROTO".yaml -n ${TEST_NS}
```

c. Deploy the MinIO data connection and service account.
d. Deploy the MinIO data connection and service account.

```bash
oc apply -f ./minio-secret-current.yaml -n ${TEST_NS}
oc create -f ./serviceaccount-minio-current.yaml -n ${TEST_NS}
```

d. Deploy the inference service. It will point to the model located in the `modelmesh-example-models/llm/models` directory.
e. Deploy the inference service.

The [ISVC template file](/demo/kserve/custom-manifests/caikit/caikit-tgis-isvc-template.yaml) shown below contains everything needed to set up the Inference Service.

```bash
oc apply -f ./custom-manifests/caikit/caikit-tgis-isvc.yaml -n ${TEST_NS}
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
annotations:
serving.knative.openshift.io/enablePassthrough: "true"
sidecar.istio.io/inject: "true"
sidecar.istio.io/rewriteAppHTTPProbers: "true"
# The following <caikit-tgis-isvc-name> should be set to the
# actual name of the inference service. (e.g., caikit-tgis-isvc
# for HTTP and caikit-tgis-isvc-grpc for gRPC)
name: <caikit-tgis-isvc-name>
spec:
predictor:
# Replace <NameOfAServiceAccount> below with the name of a
# ServiceAccount that has the secret for accessing the model
serviceAccountName: <NameOfAServiceAccount>
model:
modelFormat:
name: caikit
runtime: caikit-tgis-runtime
storageUri: proto://path/to/model # single model here
# Example, using a pvc:
# storageUri: pvc://caikit-pvc/flan-t5-small-caikit/
# Target directory must contain a config.yml
```

e. Verify that the inference service's `READY` state is `True`.
Before using it, the following details have to be added:

- `<caikit-tgis-isvc-name>` should be replaced by the name of the inference service
- `<NameOfAServiceAccount>` should be replaced by the actual name of the Service Account
- `proto://path/to/model` should be replaced by the actual path to the model that will run the inferences
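
The three replacements listed above could be scripted with `sed`. The following is a minimal sketch, not part of the change itself: the function name `render_isvc` is hypothetical, and it assumes that none of the values contain the `|` or `&` characters, which `sed` would interpret.

```bash
# Hypothetical helper: fill the three placeholders of the ISVC template read from stdin.
# Arguments: <isvc-name> <service-account-name> <storage-uri>
render_isvc() {
  sed -e "s|<caikit-tgis-isvc-name>|$1|g" \
      -e "s|<NameOfAServiceAccount>|$2|g" \
      -e "s|proto://path/to/model|$3|g"
}

# Small demonstration on a single template line:
printf 'name: <caikit-tgis-isvc-name>\n' | render_isvc caikit-tgis-isvc sa s3://example/model
# prints: name: caikit-tgis-isvc

# Possible use with the template from this guide (requires a cluster login):
# render_isvc caikit-tgis-isvc sa s3://modelmesh-example-models/llm/models/flan-t5-small-caikit \
#   < ./custom-manifests/caikit/caikit-tgis-isvc-template.yaml | oc apply -n "${TEST_NS}" -f -
```

For gRPC, the sample manifest in this change also sets `runtime` to `caikit-tgis-runtime-grpc`; the sketch does not handle that field.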

Note: If you followed all the steps to this point, the following commands
create the Inference Service using the MinIO bucket that holds the flan-t5-small
model and the service account created in the previous steps.

```bash
ISVC_NAME=caikit-tgis-isvc$INF_PROTO
oc apply -f ./custom-manifests/caikit/"$ISVC_NAME".yaml -n ${TEST_NS}
```

f. Verify that the inference service's `READY` state is `True`.

```bash
oc get isvc/caikit-example-isvc -n ${TEST_NS}
oc get isvc/$ISVC_NAME -n ${TEST_NS}
```
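
Because the inference service can take a while to become ready, the check may need to be repeated. A minimal polling sketch follows; the helper names `wait_until` and `isvc_ready` are hypothetical, and the sketch assumes the InferenceService exposes a `Ready` entry in `.status.conditions`.

```bash
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Arguments: <attempts> <delay-seconds> <command...>
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check: succeeds when the Ready condition of the isvc is "True".
# Arguments: <isvc-name> <namespace>
isvc_ready() {
  [ "$(oc get isvc/"$1" -n "$2" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" = "True" ]
}

# Possible use (requires a cluster login): up to 30 attempts, 10 seconds apart.
# wait_until 30 10 isvc_ready "${ISVC_NAME}" "${TEST_NS}"
```

`oc wait --for=condition=Ready` with a `--timeout` value may be a simpler alternative where it fits.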

3. Perform inference using HTTP (default) or gRPC
3. Perform inference using HTTP or gRPC

Compute KSVC_HOSTNAME:
```bash
export KSVC_HOSTNAME=$(oc get ksvc "$ISVC_NAME"-predictor -n ${TEST_NS} -o jsonpath='{.status.url}' | cut -d'/' -f3)
```
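
The `cut -d'/' -f3` part of the command above keeps only the host name from the URL reported by the Knative service. A small sketch of that step in isolation, where the function name `host_from_url` and the sample URL are made up for illustration:

```bash
# Hypothetical helper: keep only the host part of a URL.
# "https://host/path" splits on "/" into "https:", "", "host", "path"; field 3 is the host.
host_from_url() {
  printf '%s\n' "$1" | cut -d'/' -f3
}

host_from_url "https://caikit-tgis-isvc-predictor-kserve-demo.apps.example.com"
# prints: caikit-tgis-isvc-predictor-kserve-demo.apps.example.com
```

The resulting host name is what the `curl` and `grpcurl` calls in the following steps expect in `KSVC_HOSTNAME`.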

3-http. Perform inference with HTTP. This example uses cURL.

a. Run the following `curl` command for all tokens in a single call:

```bash
export KSVC_HOSTNAME=$(oc get ksvc caikit-example-isvc-predictor -n ${TEST_NS} -o jsonpath='{.status.url}' | cut -d'/' -f3)
curl -kL -H 'Content-Type: application/json' -d '{"model_id": "flan-t5-small-caikit", "inputs": "At what temperature does Nitrogen boil?"}' https://${KSVC_HOSTNAME}/api/v1/task/text-generation
```

@@ -156,7 +212,6 @@ Note: The **flan-t5-small** LLM model has been containerized into an S3 MinIO bu
c. Run the following `grpcurl` command for all tokens in a single call:

```bash
export KSVC_HOSTNAME=$(oc get ksvc caikit-example-isvc-predictor -n ${TEST_NS} -o jsonpath='{.status.url}' | cut -d'/' -f3)
grpcurl -insecure -d '{"text": "At what temperature does liquid Nitrogen boil?"}' -H "mm-model-id: flan-t5-small-caikit" ${KSVC_HOSTNAME}:443 caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
```

@@ -214,16 +269,17 @@ Note: The **flan-t5-small** LLM model has been containerized into an S3 MinIO bu
....
```

1. Remove the LLM model
4. Remove the LLM model

a. To remove (undeploy) the LLM model, delete the Inference Service.
a. To remove (undeploy) the LLM model, delete the Inference Service and its containing namespace:

```bash
oc delete isvc --all -n ${TEST_NS} --force --grace-period=0
oc delete ns ${TEST_NS}
```

b. Delete the MinIO resources by deleting the MinIO namespace.

```bash
oc delete ns ${TEST_NS} ${MINIO_NS}
oc delete ns ${MINIO_NS}
```
51 changes: 39 additions & 12 deletions demo/kserve/scripts/test/deploy-model.sh
@@ -4,6 +4,29 @@ set -o nounset
set -o errtrace
# set -x #Uncomment this to debug script.

# Deploys model for HTTP (default) or gRPC if "grpc" is passed as argument

# Check if at most one argument is passed
if [ "$#" -gt 1 ]; then
echo "Error: expected at most one argument ('http' or 'grpc'); the default protocol is 'http'."
exit 1
fi

# Default values that fit the default 'http' protocol:
INF_PROTO=""

# If we have an argument, check that it is either "http" or "grpc"
if [ "$#" -eq 1 ]; then
if [ "$1" = "http" ]; then
: ### nothing to be done
elif [ "$1" = "grpc" ]; then
INF_PROTO="-grpc"
else
echo "Error: Argument must be either 'http' or 'grpc'."
exit 1
fi
fi

source "$(dirname "$(realpath "$0")")/../env.sh"

# Deploy Minio
@@ -24,24 +47,28 @@ else
fi
sed "s/<minio_ns>/$MINIO_NS/g" ./custom-manifests/minio/serviceaccount-minio.yaml | tee ${BASE_DIR}/serviceaccount-minio-current.yaml

# Deploy a sample model
# Test if ${TEST_NS} namespace already exists:
oc get ns ${TEST_NS}
if [[ $? == 1 ]]
then
oc new-project ${TEST_NS}

oc apply -f ./custom-manifests/caikit/caikit-tgis-servingruntime.yaml -n ${TEST_NS}
oc new-project ${TEST_NS}

oc apply -f ./custom-manifests/caikit/caikit-tgis-servingruntime"${INF_PROTO}".yaml -n ${TEST_NS}

oc apply -f ${BASE_DIR}/minio-secret-current.yaml -n ${TEST_NS}
oc apply -f ${BASE_DIR}/serviceaccount-minio-current.yaml -n ${TEST_NS}

oc apply -f ${BASE_DIR}/minio-secret-current.yaml -n ${TEST_NS}
oc apply -f ${BASE_DIR}/serviceaccount-minio-current.yaml -n ${TEST_NS}
### create the isvc. First step: create the yaml file
ISVC_NAME=caikit-tgis-isvc"${INF_PROTO}"
oc apply -f ./custom-manifests/caikit/"$ISVC_NAME".yaml -n ${TEST_NS}

oc apply -f ./custom-manifests/caikit/caikit-tgis-isvc.yaml -n ${TEST_NS}
# Resources needed to enable metrics for the model
# The metrics service needs the correct label in the `matchLabel` field. The expected value of this label is `<isvc-name>-predictor-default`
# The metrics service in this repo is configured to work with the example model. If you are deploying a different model or using a different model name, change the label accordingly.

# Resources needed to enable metrics for the model
# The metrics service needs the correct label in the `matchLabel` field. The expected value of this label is `<isvc-name>-predictor-default`
# The metrics service in this repo is configured to work with the example model. If you are deploying a different model or using a different model name, change the label accordingly.
oc apply -f custom-manifests/metrics/caikit-metrics-service.yaml -n ${TEST_NS}
oc apply -f custom-manifests/metrics/caikit-metrics-servicemonitor.yaml -n ${TEST_NS}
### TBD: The following 2 lines should take into account the changed names
# oc apply -f custom-manifests/metrics/caikit-metrics-service.yaml -n ${TEST_NS}
# oc apply -f custom-manifests/metrics/caikit-metrics-servicemonitor.yaml -n ${TEST_NS}
else
echo
echo "* ${TEST_NS} exists. Please remove the namespace or use another namespace name"