[RHOAIENG-1051] - update opendatahub folder to kustomize #280
Conversation
@spolti I followed the description and hit errors:
@spolti Is the basic folder the only one to test, or are there other folders?
@Jooho It should be working now. Not sure why it happened, but I was able to run it locally before.
Now it removed the modelmesh-serving folder itself. Please double check.
chore: the opendatahub folder now has quickstart, docs, and manifests/scripts to support FVT. After the transition manifests are merged, the way to deploy modelmesh-serving changes, so we should update everything related. Signed-off-by: Spolti <[email protected]>
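For context, a minimal sketch of the kustomize-based deploy this change moves to, assuming the manifests live under the opendatahub folder and expose a default overlay (both paths are assumptions, not confirmed by this PR):

# Hedged sketch: deploy modelmesh-serving from the kustomize manifests.
# MANIFESTS_DIR and the "default" overlay name are assumptions.
MANIFESTS_DIR=opendatahub
oc new-project modelmesh-serving 2>/dev/null || oc project modelmesh-serving
kustomize build "${MANIFESTS_DIR}"/default | oc apply -f -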
/retest
@spolti With this PR, what should I test? Anything else?
Only these 3 should be fine.
kustomize build "${MANIFESTS_DIR}"/runtimes/ | oc delete -f -
oc delete pvc,pod --all --force -n modelmesh-serving
oc delete ns $namespace
This one should be moved after line 35 because of the CRD creation.
basic/hpa worked, but pvc failed. modelmesh-controller failed.
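To illustrate the dependency the reviewer is pointing at, a hedged sketch (the crd path is an assumption): ServingRuntime resources can only be applied or deleted while their CRD exists, so the runtimes step has to be ordered relative to the CRD step.

# Hedged sketch of the CRD dependency: the ServingRuntime CRD must exist
# before ServingRuntime resources can be applied or deleted.
kustomize build "${MANIFESTS_DIR}"/crd | oc apply -f -        # CRDs first (path is an assumption)
kustomize build "${MANIFESTS_DIR}"/runtimes | oc apply -f -   # then the runtime CRs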
The latest version uses go1.22 while we use go1.20. Signed-off-by: Spolti <[email protected]>
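As a hedged illustration of the version constraint (this check is a sketch, not part of the PR):

# Hedged sketch: fail fast if the local Go toolchain does not match the
# go1.20 the project currently targets.
go version | grep -q 'go1\.20' || { echo "expected go1.20, got: $(go version)"; exit 1; }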
Is this quickstart still deploying model-mesh?
FVT is working fine; just one test is failing, which seems to be an issue with the test itself: Scaling of runtime deployments with HPA Autoscaler when there are no predictors Scale all runtimes down after a created test predictor is deleted
/Users/fspolti/data/dev/sources/modelmesh-serving/fvt/hpa/hpa_test.go:149
2024-04-05T17:45:14-03:00 INFO Delete all predictors ...
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-mlserver-1.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:false modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-ovms-1.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-torchserve-0.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:false]
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-triton-2.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
2024-04-05T17:45:17-03:00 INFO All deployments are ready: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
2024-04-05T17:45:27-03:00 INFO Timed out after 10s without events
STEP: Creating a test predictor for one Runtime @ 04/05/24 17:45:27.361
STEP: Creating predictor mlserver-sklearn-mnist-svm-gpqgm @ 04/05/24 17:45:27.361
STEP: Waiting for predictor mlserver-sklearn-mnist-svm-gpqgm to be 'Loaded' @ 04/05/24 17:45:27.665
2024-04-05T17:45:27-03:00 INFO Watcher got event with object {"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": null}
2024-04-05T17:45:27-03:00 INFO Watcher got event with object {"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": {"message":"Waiting for runtime Pod to become available","modelId":"mlserver-sklearn-mnist-svm-gpqgm__ksp-b20a0c5aca","reason":"RuntimeUnhealthy"}}
[FAILED] in [It] - /Users/fspolti/data/dev/sources/modelmesh-serving/fvt/helpers.go:355 @ 04/05/24 17:47:27.664
2024-04-05T17:47:27-03:00 INFO Running command {"args": "kubectl get predictors -n model-serving"}
=====================================================================================================================================
NAME                               TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
mlserver-sklearn-mnist-svm-gpqgm   sklearn   false       Pending                     UpToDate     2m3s
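For anyone reproducing this, a hedged sketch of commands to inspect the RuntimeUnhealthy failure (the namespace and deployment name are taken from the log above; the rest is an assumption):

# Hedged sketch: inspect why the MLServer runtime pod never became available.
oc describe deployment modelmesh-serving-mlserver-1.x -n modelmesh-serving
oc get pods -n modelmesh-serving | grep mlserver   # exact pod names are assumptions
oc get events -n modelmesh-serving --sort-by=.lastTimestamp | tail -20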
Note that I haven't reviewed the code and I don't have full context, so I'm just trying to answer. In most projects I have played with, quickstarts assume a clean environment and install everything to quickly give you a working setup. Also, quickstarts are usually for trying the project out (i.e. non-production, demos). Because of this, a quickstart doesn't let you customize the setup (that's left to the official installer).

I would agree that deploying a sample model should be left to a different script than the quickstart setup, although it could be part of the same doc page (probably arranged like a tutorial).

That said, I also didn't use the quickstarts much, because what I personally remember is that, rather than providing a setup you can play with, they prepared the environment more like a demo that is also suited for running FVTs/CI, and I had to spend time "cleaning" the env. This is different from your case: you just want to deploy a sample model on an existing setup, while I wanted to quickly get a base setup (without additional stuff) to try my own models. ...but don't trust me on this (I may be remembering incorrectly why I didn't use the quickstarts that much).
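As a hedged illustration of splitting the sample model out of the quickstart (the example path and filename below are assumptions, not from this PR):

# Hedged sketch: deploy only a sample model onto an existing install,
# separate from the quickstart setup. Path and filename are assumptions.
oc apply -n modelmesh-serving -f "${MANIFESTS_DIR}"/examples/sklearn-mnist-isvc.yaml
oc get isvc -n modelmesh-serving   # "isvc" is the KServe InferenceService shortname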
/lgtm
I ran the e2e test and it works:
CONTROLLERNAMESPACE=opendatahub NAMESPACE=modelmesh-serving NAMESPACESCOPEMODE=true make e2e-test-for-odh
...
Ginkgo ran 1 suite in 3m56.85221035s
Test Suite Passed
Passed fvt/hpa. Move on the next test
[SUCCESS] FVT Test Passed!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Jooho, spolti. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Motivation
Support for TorchServe was added in opendatahub-io#250 and kserve/modelmesh-runtime-adapter#34. A test should be added for it as well.
Modifications
- Adds a basic FVT for load/inference with a TorchServe MAR model using the native TorchServe gRPC API
- Disables the OVMS runtime and tests to allow TorchServe to be tested due to resource constraints
Result
Closes opendatahub-io#280
Signed-off-by: Rafael Vasquez <[email protected]>
chore: the opendatahub folder now has:
- quickstart
- docs
- manifests/scripts to support FVT
After the transition manifests are merged, the way to deploy modelmesh-serving changes, so we should update everything related.
How to test:
PR checklist
Checklist items below are applicable for development targeted to both fast and stable branches/tags