tgis-standalone/caikit-standalone manifests #186

Merged
6 changes: 6 additions & 0 deletions demo/kserve/custom-manifests/caikit/README.md
@@ -0,0 +1,6 @@
# Caikit serving

This directory includes [InferenceService](https://kserve.github.io/website/latest/reference/api/#serving.kserve.io/v1beta1.InferenceService) and [ServingRuntime](https://kserve.github.io/website/latest/reference/api/#serving.kserve.io/v1alpha1.ServingRuntime) definitions for language model serving using [caikit](https://github.com/caikit/caikit) and [caikit-nlp](https://github.com/caikit/caikit-nlp).

- [caikit-standalone](./caikit-standalone): caikit-only solution
- [caikit-tgis](./caikit-tgis): caikit frontend with a [text-generation-inference](https://github.com/IBM/text-generation-inference) (tgis) backend solution. See [tgis](../tgis/) for a tgis-only solution.
@@ -0,0 +1,17 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    serving.knative.openshift.io/enablePassthrough: "true"
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
  name: caikit-standalone-isvc-grpc
spec:
  predictor:
    serviceAccountName: sa
    model:
      # https://github.com/kserve/modelmesh-serving/blob/main/docs/predictors/setup-storage.md#3-add-a-storage-entry-to-the-storage-config-secret
      modelFormat:
        name: caikit
      runtime: caikit-standalone-runtime-grpc
      storageUri: s3://modelmesh-example-models/llm/models/flan-t5-small-caikit # single model here: target directory must contain a config.yml
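The manifest notes that the `storageUri` target directory must contain a `config.yml`. A minimal pre-upload check, assuming the model is first prepared in a local directory before it is copied to S3 or a PVC; `has_caikit_config` is a hypothetical helper, and it checks only that the file exists, not what it contains:

```python
from pathlib import Path


def has_caikit_config(model_dir: str) -> bool:
    """Return True if model_dir is a directory that directly contains config.yml.

    Hypothetical helper mirroring the manifest comment: the storageUri target
    directory must contain a config.yml. Only presence is checked.
    """
    root = Path(model_dir)
    return root.is_dir() and (root / "config.yml").is_file()
```

For example, `has_caikit_config("flan-t5-small-caikit")` should return `True` before that directory is uploaded (the local path is illustrative).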
@@ -0,0 +1,22 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    serving.knative.openshift.io/enablePassthrough: "true"
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
  name: caikit-standalone-isvc
spec:
  predictor:
    # Replace <NameOfAServiceAccount> below with the name
    # of a ServiceAccount that has the secret for accessing the model
    serviceAccountName: <NameOfAServiceAccount>
    model:
      modelFormat:
        name: caikit
      # Replace with the actual name of the deployed ServingRuntime
      runtime: <NameOfTheServingRuntime>
      storageUri: s3://modelmesh-example-models/llm/models/flan-t5-small-caikit # single model here: target directory must contain a config.yml
      # Example, using a PVC:
      # storageUri: pvc://caikit-pvc/flan-t5-small-caikit/
      # Target directory must contain a config.yml
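This template leaves two placeholders, `<NameOfAServiceAccount>` and `<NameOfTheServingRuntime>`, to fill in before applying it. A minimal sketch of that substitution, assuming the manifest is handled as plain text; `fill_template` and the `<Name>` regex are hypothetical, and the function raises rather than silently leaving a placeholder in the output:

```python
import re

# Matches template placeholders such as <NameOfAServiceAccount>.
PLACEHOLDER = re.compile(r"<([A-Za-z]+)>")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each <Name> placeholder with values[Name]; raise KeyError if any is missing."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"no value for placeholder(s): {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

For example, `fill_template(text, {"NameOfAServiceAccount": "sa", "NameOfTheServingRuntime": "caikit-standalone-runtime"})` yields the same values as the concrete HTTP InferenceService in this PR.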
@@ -0,0 +1,17 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    serving.knative.openshift.io/enablePassthrough: "true"
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
  name: caikit-standalone-isvc
spec:
  predictor:
    serviceAccountName: sa
    model:
      # https://github.com/kserve/modelmesh-serving/blob/main/docs/predictors/setup-storage.md#3-add-a-storage-entry-to-the-storage-config-secret
      modelFormat:
        name: caikit
      runtime: caikit-standalone-runtime
      storageUri: s3://modelmesh-example-models/llm/models/flan-t5-small-caikit # single model here: target directory must contain a config.yml
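This InferenceService picks its runtime by name (`runtime: caikit-standalone-runtime`), and its model format (`caikit`) is expected to appear in that ServingRuntime's `supportedModelFormats`. A minimal stdlib-only consistency check over manifests already parsed into Python dicts (YAML parsing is left out); `runtime_matches` is a hypothetical helper, and it assumes the InferenceService names its runtime explicitly, as every InferenceService in this PR does:

```python
def runtime_matches(isvc: dict, runtime: dict) -> list[str]:
    """Return a list of mismatches between an InferenceService and a ServingRuntime (empty if none)."""
    problems = []
    model = isvc["spec"]["predictor"]["model"]

    # The InferenceService refers to the ServingRuntime by metadata.name.
    wanted = model.get("runtime")
    actual = runtime["metadata"]["name"]
    if wanted != actual:
        problems.append(f"runtime {wanted!r} does not match ServingRuntime {actual!r}")

    # The model format must be one the runtime declares support for.
    fmt = model["modelFormat"]["name"]
    supported = {f["name"] for f in runtime["spec"].get("supportedModelFormats", [])}
    if fmt not in supported:
        problems.append(f"model format {fmt!r} not in {sorted(supported)}")
    return problems
```

Pairing this InferenceService with `caikit-standalone-runtime` should return an empty list; pairing it with `caikit-standalone-runtime-grpc` reports a name mismatch.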
@@ -0,0 +1,39 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: caikit-standalone-runtime-grpc
spec:
  multiModel: false
  supportedModelFormats:
    # Note: this currently *only* supports caikit format models
    - autoSelect: true
      name: caikit
  containers:
    - name: kserve-container
      image: quay.io/opendatahub/caikit-nlp:stable

Review comment (contributor author): This image isn't available yet. Waiting on openshift/release#46641 for the workflows required to push the image to quay and for the creation of a release branch on https://github.com/opendatahub-io/caikit-nlp

      command: ["python", "-m", "caikit.runtime.grpc_server"]
      env:
        - name: RUNTIME_LOCAL_MODELS_DIR
          value: /mnt/models
      ports:
        - containerPort: 8085
          name: h2c
          protocol: TCP
      # resources: # configure as required
      #   requests:
      #     cpu: 8
      #     memory: 16Gi
      readinessProbe:
        exec:
          command:
            - python
            - -m
            - caikit_health_probe
            - readiness
      livenessProbe:
        exec:
          command:
            - python
            - -m
            - caikit_health_probe
            - liveness
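Both probes are `exec` probes: Kubernetes runs `python -m caikit_health_probe readiness` (or `liveness`) inside the container, without a shell, and treats exit status 0 as healthy and anything else, including exceeding the probe timeout, as a failure. The sketch below approximates that rule with `subprocess`, which can help when checking a probe command inside the image before deploying. `exec_probe` is a hypothetical helper; the 1-second default mirrors the Kubernetes default `timeoutSeconds`:

```python
import subprocess


def exec_probe(command: list[str], timeout_seconds: float = 1.0) -> bool:
    """Approximate an exec probe: success means exit status 0 within the timeout."""
    try:
        result = subprocess.run(
            command,  # argv list, no shell, like an exec probe
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # A command that hangs past the timeout or cannot be started counts as a failure.
        return False
    return result.returncode == 0
```

Inside the container, `exec_probe(["python", "-m", "caikit_health_probe", "readiness"])` should report the same outcome the readiness probe sees, modulo the timeout you pass.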
@@ -0,0 +1,38 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: caikit-standalone-runtime
spec:
  multiModel: false
  supportedModelFormats:
    # Note: this currently *only* supports caikit format models
    - autoSelect: true
      name: caikit
  containers:
    - name: kserve-container
      image: quay.io/opendatahub/caikit-nlp:stable
      command: ["python", "-m", "caikit.runtime.http_server"]
      env:
        - name: RUNTIME_LOCAL_MODELS_DIR
          value: /mnt/models
      ports:
        - containerPort: 8080
          protocol: TCP
      # resources: # configure as required
      #   requests:
      #     cpu: 8
      #     memory: 16Gi
      readinessProbe:
        exec:
          command:
            - python
            - -m
            - caikit_health_probe
            - readiness
      livenessProbe:
        exec:
          command:
            - python
            - -m
            - caikit_health_probe
            - liveness
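This runtime starts `caikit.runtime.http_server` on port 8080. The sketch below only builds an HTTP inference request with the standard library; it does not send it. It assumes the caikit HTTP text-generation route is `POST /api/v1/task/text-generation` with a JSON body carrying `model_id` and `inputs`; verify both against the caikit version in the image. The base URL and model ID in the usage note are placeholders, and `build_text_generation_request` is a hypothetical helper:

```python
import json
import urllib.request


def build_text_generation_request(base_url: str, model_id: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for caikit's HTTP text-generation task.

    Assumption: route /api/v1/task/text-generation with JSON keys "model_id" and "inputs".
    """
    url = base_url.rstrip("/") + "/api/v1/task/text-generation"
    body = json.dumps({"model_id": model_id, "inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(build_text_generation_request("https://<isvc-host>", "<model-id>", "Hello"))`, with the host taken from the deployed InferenceService's URL.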
@@ -0,0 +1,128 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: caikit-tgis-config
  namespace: kserve-test
data:
  caikit.yml: |
    runtime:
      library: caikit_nlp
      local_models_dir: /mnt/models/
      lazy_load_local_models: true
      grpc:
        server_thread_pool_size: 64

    model_management:
      finders:
        default:
          type: MULTI
          config:
            finder_priority:
              - tgis-auto
        tgis-auto:
          type: TGIS-AUTO
          config:
            test_connection: true
      initializers:
        default:
          type: LOCAL
          config:
            backend_priority:
              - type: TGIS
                config:
                  connection:
                    hostname: localhost:8033
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: caikit-runtime
  annotations:
    serving.knative.openshift.io/enablePassthrough: "true"
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
  namespace: kserve-test
spec:
  multiModel: false
  supportedModelFormats:
    # Note: this currently *only* supports caikit format models
    - autoSelect: true
      name: caikit
  containers:
    - name: kserve-container
      image: quay.io/opendatahub/text-generation-inference:stable-bafd218
      imagePullPolicy: IfNotPresent
      command: ["text-generation-launcher"]
      args: [
        # NOTE: --num-shard defaults to 1
        "--model-name=/mnt/models/artifacts/",
        "--max-batch-size=256",
        "--max-concurrent-requests=64",
      ]
      # ports:
      #   - containerPort: 8033
      #     name: h2c
      #     protocol: TCP
      env:
        - name: TRANSFORMERS_CACHE
          value: /tmp/transformers_cache
      # resources: # configure as required
      #   requests:
      #     cpu: 8
      #     memory: 16Gi
      # livenessProbe:
      #   grpc:
      #     port: 8083
      # readinessProbe:
      #   exec:
      #     command:
      #       - curl
      #       - --fail
      #       - http://localhost:3000/health
      #   # TODO: Add grpc endpoint
      #   # grpc:
      #   #   port: 8033
      #   # httpGet:
      #   #   path: /health
      #   #   port: 3000
    - name: transformer-container
      image: quay.io/opendatahub/caikit-tgis-serving:fast
      imagePullPolicy: IfNotPresent
      env:
        # Optional values (env values must be strings, so quote "true"):
        # - name: PT2_COMPILE # Slows down model loading, but provides a speedup in inference
        #   value: "true"
        # - name: FLASH_ATTENTION # Optimizes certain models, see https://github.com/IBM/text-generation-inference#converting-weights-to-safetensors-format
        #   value: "true"
      volumeMounts:
        - name: config-volume
          mountPath: /caikit/config/
          readOnly: true
      ports:
        - containerPort: 8085
          protocol: TCP
        # - containerPort: 8080 # http
        #   protocol: TCP
      # grpc:
      #   port: 8085
      # livenessProbe:
      #   grpc:
      #     port: 8085
      #   initialDelaySeconds: 10
      # readinessProbe: # http readiness
      #   httpGet:
      #     path: /health
      #     port: 8080
      #   initialDelaySeconds: 10
      # livenessProbe: # http liveness
      #   httpGet:
      #     path: /health
      #     port: 8080

      # resources: # configure as required
      #   requests:
      #     cpu: 8
      #     memory: 16Gi
  volumes:
    - name: config-volume
      configMap:
        name: caikit-tgis-config
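Two details in this file are easy to break when editing it. Kubernetes `env[].value` must be a string, so the optional `PT2_COMPILE`/`FLASH_ATTENTION` flags need `"true"` rather than a bare YAML `true`. And the TGIS `connection.hostname` in `caikit.yml` (`localhost:8033`) must point at the port the TGIS container serves gRPC on; `localhost` works because containers in one pod share a network namespace. A minimal stdlib-only sketch of both checks on values already parsed from YAML; the helper names are hypothetical, and 8033 comes from the commented-out TGIS `ports` entry in this file, not from TGIS documentation:

```python
def non_string_env_values(env: list[dict]) -> list[str]:
    """Return the names of env entries whose 'value' is present but not a string."""
    return [e["name"] for e in env if "value" in e and not isinstance(e["value"], str)]


def hostname_port(hostname: str) -> int:
    """Extract the port from a 'host:port' string such as 'localhost:8033'."""
    host, sep, port = hostname.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {hostname!r}")
    return int(port)
```

With the values in this file, `hostname_port("localhost:8033")` gives 8033, which should equal the TGIS gRPC `containerPort` if that block is uncommented.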