Commit
* Add lm-eval-service controller (#258)
* feat: Initial database support (#246)
* Initial database support
  - Add status checking
  - Add better storage flags
  - Add spec.storage.format validation
  - Add DDL
  - Add HIBERNATE format to DB (test)
  - Update service image
  - Revert identifier to DATABASE
  - Update CR options (remove mandatory data)
* Remove default DDL generation env var
* Update service image to latest tag
* Add migration awareness
* Add updating pods for migration
* Change JDBC url from mysql to mariadb
* Fix TLS mount
* Revert images
* Remove redundant logic
* Fix comments
* feat: Add TLS certificate mount on ModelMesh (#255)
* feat: Add TLS certificate mount on ModelMesh
* Revert from http to https until kserve/modelmesh#147 is merged
* Add lm-eval-service controller
refactor the existing TrustyAIService controller and add LMEvalService controller
Signed-off-by: Yihong Wang <[email protected]>
---------
Signed-off-by: Yihong Wang <[email protected]>
Co-authored-by: Rui Vieira <[email protected]>
* fix: Fix typo in operator's arguments (#261)
Operator's arguments changed from `--eanble-services` to `--enable-services`. trustyai.opendatahub.io_lmevaljobs.yaml and zz_generated.deepcopy.go regenerated.
* feat: Add LMES driver build to GHA (#272)
* sync: sync dev/lm-eval with main branch (#271)
* feat: Initial database support (#246)
* Initial database support
  - Add status checking
  - Add better storage flags
  - Add spec.storage.format validation
  - Add DDL
  - Add HIBERNATE format to DB (test)
  - Update service image
  - Revert identifier to DATABASE
  - Update CR options (remove mandatory data)
* Remove default DDL generation env var
* Update service image to latest tag
* Add migration awareness
* Add updating pods for migration
* Change JDBC url from mysql to mariadb
* Fix TLS mount
* Revert images
* Remove redundant logic
* Fix comments
* feat: Add TLS certificate mount on ModelMesh (#255)
* feat: Add TLS certificate mount on ModelMesh
* Revert from http to https until kserve/modelmesh#147 is merged
* Pin oc version, ubi version (#263)
* Restore checkout of trustyai-exp (#265)
* Add operator installation robustness (#266)
* fix: Skip InferenceService patching for KServe RawDeployment (#262)
* feat: ConfigMap key to disable KServe Serverless configuration (#267)
* feat: Add support for custom certificates in database connection (#259)
* Add TLS endpoint for ModelMesh payload processors.
(#268)
Keep non-TLS endpoint for KServe Serverless (disabled by default)
---------
Signed-off-by: Yihong Wang <[email protected]>
Co-authored-by: Rui Vieira <[email protected]>
Co-authored-by: Rob Geada <[email protected]>
* Weekly sync up of dev/lm-eval branch (#278)
* feat: Initial database support (#246)
* Initial database support
  - Add status checking
  - Add better storage flags
  - Add spec.storage.format validation
  - Add DDL
  - Add HIBERNATE format to DB (test)
  - Update service image
  - Revert identifier to DATABASE
  - Update CR options (remove mandatory data)
* Remove default DDL generation env var
* Update service image to latest tag
* Add migration awareness
* Add updating pods for migration
* Change JDBC url from mysql to mariadb
* Fix TLS mount
* Revert images
* Remove redundant logic
* Fix comments
* feat: Add TLS certificate mount on ModelMesh (#255)
* feat: Add TLS certificate mount on ModelMesh
* Revert from http to https until kserve/modelmesh#147 is merged
* Pin oc version, ubi version (#263)
* Restore checkout of trustyai-exp (#265)
* Add operator installation robustness (#266)
* fix: Skip InferenceService patching for KServe RawDeployment (#262)
* feat: ConfigMap key to disable KServe Serverless configuration (#267)
* feat: Add support for custom certificates in database connection (#259)
* Add TLS endpoint for ModelMesh payload processors. (#268)
Keep non-TLS endpoint for KServe Serverless (disabled by default)
* fix: Correct maxSurge and maxUnavailable (#275)
* feat: Add support for custom DB names (#257)
* feat: Add support for custom DB names
* fix: Correct custom DB name
---------
Signed-off-by: Yihong Wang <[email protected]>
Co-authored-by: Rui Vieira <[email protected]>
Co-authored-by: Rob Geada <[email protected]>
* Driver updates job's status periodically (#280)
The driver periodically updates the LMEvalJob.Status.Message field with the output from lm-eval. The message pattern the driver captures looks like `Running text generation: 81%|`.
Then users can use this information to check the progress of the job.
Signed-off-by: Yihong Wang <[email protected]>
* Add Dockerfile for LMES job image (#276)
Add Dockerfile for LMES job image and the needed files
Signed-off-by: Yihong Wang <[email protected]>
* feat: Add overlays (#283)
* feat: Add overlays
* Remove redundant lmes-tas overlay. Change job image name.
* Add job image build (#284)
* Change job image use midstream lm-evaluation-harness (#285)
* feat: support batch size (#290)
Add batch size support in the LMEvalJob, which leverages the `--batch_size` option of `lm-evaluation-harness`. This only affects local models. The `--batch_size` option doesn't work for remote inference APIs.
Signed-off-by: Yihong Wang <[email protected]>
* Add the `openai` package into the lmes job image (#292)
Update the LMES job's Dockerfile to include the `openai` package.
Signed-off-by: Yihong Wang <[email protected]>
* fix: fix dependency error in the job image (#296)
Split up the unitxt and openai dependencies to avoid the conflict.
Signed-off-by: Yihong Wang <[email protected]>
* feat: add device detection in lmes driver (#298)
Added a new feature in the LMES driver to detect the available devices by using the PyTorch API. This feature can be disabled by passing the `--detect-device false` option.
Signed-off-by: Yihong Wang <[email protected]>
* feat: support unitxt recipes (#301)
Add new fields in the CRD to support unitxt recipes and leverage the driver to create the corresponding yaml files of the unitxt recipes.
Signed-off-by: Yihong Wang <[email protected]>
* feat: support custom dataset (#309)
Updated the CRD data struct to allow users to specify a custom Unitxt card in JSON format. The custom Unitxt card is equivalent to a custom dataset definition. Also restructured and updated the CRD to support Volumes, VolumeMounts, Env, Resources, Labels, and Annotations.
Signed-off-by: Yihong Wang <[email protected]>
* feat: new pulling mechanism for job statuses (#314)
Update the driver to keep running even after the user program finishes. The driver provides two APIs:
  - GetStatus(): retrieve job status
  - Shutdown(): properly tear down the driver
On the controller side, the `pod/exec` resource is used to run the driver command, which invokes the driver APIs to retrieve the job status and to shut down the driver when the job is done.
Signed-off-by: Yihong Wang <[email protected]>
* Move operator's cmd/operator/main.go to cmd/main.go to keep operator-sdk compatibility (#295)
* Remove hardcoded job's user ID (#322)
* Fix mkdir command in Job dockerfile (#330)
* Refactor some lmesreconcile methods (#323)
* Refactor lmes reconcile optoins
Signed-off-by: ted chang <[email protected]>
* Update controllers/lmes/lmevaljob_controller.go
Co-authored-by: Yihong Wang <[email protected]>
* Update controllers/lmes/lmevaljob_controller.go
Co-authored-by: Yihong Wang <[email protected]>
Signed-off-by: ted chang <[email protected]>
---------
Signed-off-by: ted chang <[email protected]>
Co-authored-by: Yihong Wang <[email protected]>
* tidy: clean up lmes-job image (#333)
Remove BAM related packages and patch.
Signed-off-by: Yihong Wang <[email protected]>
* Enable job suspend for Kueue (#317)
* Refactor lmes reconcile optoins
Signed-off-by: ted chang <[email protected]>
* Update controllers/lmes/lmevaljob_controller.go
Co-authored-by: Yihong Wang <[email protected]>
* Update controllers/lmes/lmevaljob_controller.go
Co-authored-by: Yihong Wang <[email protected]>
Signed-off-by: ted chang <[email protected]>
* Enable job suspend for Kueue
Signed-off-by: ted chang <[email protected]>
---------
Signed-off-by: ted chang <[email protected]>
Co-authored-by: Yihong Wang <[email protected]>
* Add overlay placeholders for main merge (#334)
* sync: sync up dev/lm-eval branch with main branch (#336)
* [CI] Run tests from trustyai-tests (#279)
* Change Dockerfile to clone trustyai-tests
* Add PYTEST_MARKERS env and remove TESTS_REGEX
* RHOAIENG-12274: Update operator's overlays (#287)
* Update operator's overlays
* Update kustomization.yaml
* Add devflag printout to GH Action comment (#289)
* Add timeout loop to DSC install (#305)
* RHOAIENG-13625: Add DBAvailable status to CR (#304)
* Add DBAvailable status to CR
* Remove probes
* Add KServe destination rule for Inference Services in the ServiceMesh (#315)
* Add DestinationRule creation for KServe serverless
* Add permissions for destination rules
* Add role for destination rules
* Add missing role for creating destination rules
* Fix spacing in DestinationRule template
* Add check if DestinationRule CRD is present before creating it (#316)
* Add check for DestinationRule CRD
* Add API extensions to operator's scheme
* Add permission for CRD resource
* Fix operator metrics service target port (#320)
* Add readiness probes (#312)
* Enable KServe serverless in the rhoai overlay (#321)
* Update overlay images (#331)
* Add correct CA cert to JDBC (#324)
* Add correct CA cert to JDBC
* Add require SSL
* Support for VirtualServices for InferenceLogger traffic (#332)
* Generate KServe Inference Logger in conformance with DestinationRule
and VirtualService
* Add VirtualService creation for models in the mesh
* Add permissions for VirtualServices
* Update manifests for VirtualServices
* Fix VirtualServiceName variable
* fix yaml linter after the sync
Signed-off-by: Yihong Wang <[email protected]>
* tidy the go.mod and go.sum as well
Signed-off-by: Yihong Wang <[email protected]>
---------
Signed-off-by: Yihong Wang <[email protected]>
Co-authored-by: Adolfo Aguirrezabal <[email protected]>
Co-authored-by: Rui Vieira <[email protected]>
Co-authored-by: Rob Geada <[email protected]>
Co-authored-by: Rui Vieira <[email protected]>
---------
Signed-off-by: Yihong Wang <[email protected]>
Signed-off-by: ted chang <[email protected]>
Co-authored-by: Yihong Wang <[email protected]>
Co-authored-by: Rob Geada <[email protected]>
Co-authored-by: ted chang <[email protected]>
Co-authored-by: Adolfo Aguirrezabal <[email protected]>