[Sync - 20240909] Upstream master to ODH master #403
* propagate trc bool across vllm init Signed-off-by: Calvin Woo <[email protected]> Signed-off-by: calvin d. woo <[email protected]> * use args directly to avoid undefined var Signed-off-by: Calvin Woo <[email protected]> Signed-off-by: calvin d. woo <[email protected]> * Remove trailing space Signed-off-by: Dan Sun <[email protected]> Signed-off-by: calvin d. woo <[email protected]> * move params to newline Signed-off-by: calvin d. woo <[email protected]> --------- Signed-off-by: Calvin Woo <[email protected]> Signed-off-by: calvin d. woo <[email protected]> Signed-off-by: Dan Sun <[email protected]> Co-authored-by: Dan Sun <[email protected]>
The KServe Python SDK README.md uses relative URLs that work well on GitHub but return a 404 error when visited on PyPI. This change updates the README.md to use absolute URLs that work well on both GitHub and PyPI. Signed-off-by: kevinbazira <[email protected]>
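The kind of rewrite described in this commit can be sketched as below. This is a minimal illustration, not the change actually made to the README: the base URL, the function name `absolutize_links`, and the regular expression are all assumptions introduced here.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL; the real absolute URLs used in the README are not
# shown in this PR.
BASE_URL = "https://example.com/repo/blob/main/python/pkg/"

# Matches the "](target)" part of an inline Markdown link.
_LINK = re.compile(r"(\]\()([^)\s]+)(\))")


def absolutize_links(markdown: str, base_url: str = BASE_URL) -> str:
    """Rewrite relative Markdown link targets against base_url."""

    def _fix(match: re.Match) -> str:
        target = match.group(2)
        if target.startswith("#"):
            # In-page anchors resolve on both GitHub and PyPI; leave them.
            return match.group(0)
        # urljoin returns targets that are already absolute unchanged.
        return f"{match.group(1)}{urljoin(base_url, target)}{match.group(3)}"

    return _LINK.sub(_fix, markdown)
```

For example, `absolutize_links("[docs](docs/x.md)", "https://example.com/r/")` yields a link to `https://example.com/r/docs/x.md`, while links that already carry a scheme are left untouched.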
check empty model final. Signed-off-by: HAO <[email protected]> Co-authored-by: koshino17 <[email protected]>
* Fix No model ready error in multi model serving - Fixes the regression introduced by kserve#3275 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Mark transformer model ready in init method Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Initial implementation of inference client Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Use Inference client for e2e tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Upgrade pytest-asyncio to 0.23.4 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix mutable object initialization in default parameters Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix graph e2e tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix pmml test Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add explain, support dict response, use inference client for internal requests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix inference graph test and grpc headers Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Remove v1 datamodels Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Introduce protocol in client config Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Support inference graph Signed-off-by: Sivanantham Chinnaiyan <[email protected]> remove logging configs Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Update default timeout to 60 seconds Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add retry config for grpc client Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix infer model_name parameter Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add tests for graph endpoint Signed-off-by: Sivanantham Chinnaiyan <[email protected]> debug Signed-off-by: Sivanantham Chinnaiyan <[email protected]> fix http client param mismatch Signed-off-by: Sivanantham Chinnaiyan <[email protected]> skip graph test Signed-off-by: Sivanantham Chinnaiyan <[email protected]> fix timeout in grpc client Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix url construction Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix explain Signed-off-by: Sivanantham 
Chinnaiyan <[email protected]> * configure logger for e2e tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix grpc retry config Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Increase request timeout Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * configure logger for e2e tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Fix grpc retry config Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Increase request timeout Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Use fixtures for rest client Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Fix model name not properly parsed by inference graph Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Handle single string arg with excess whitespace Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Handle duplicate arguments Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Signed-off-by: Dan Sun <[email protected]> Co-authored-by: Dan Sun <[email protected]>
empty commit Signed-off-by: Spolti <[email protected]>
Use add_generation_prompt for chat template Signed-off-by: Dattu Sharma <[email protected]>
* Deduplicate the names for the additional domain names Signed-off-by: Vincent Hou <[email protected]> * Refactoring the functions Signed-off-by: Vincent Hou <[email protected]> --------- Signed-off-by: Vincent Hou <[email protected]>
virtual service case insensitive Signed-off-by: Andrews Arokiam <[email protected]>
* Install packages needed for model load Signed-off-by: Gavrish Prabhu <[email protected]> * make all apt get into a single line Signed-off-by: Gavrish Prabhu <[email protected]> --------- Signed-off-by: Gavrish Prabhu <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
…serve#3789) * Add readiness probe for mlserver in CI Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Increase memory limit for pmml test to prevent OOMKilled and read timeout error Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Fix logprobs Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix a scenario where stream completion fails if echo is true and logprobs is nil Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix a scenario where completion fails if the prompt is token_ids and echo is set to true Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Respect tokenizer revision Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add workaround for adding None to token_logprobs and top_logprobs Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
agent watcher unit test is always flaky so increase timeout to make it stable Signed-off-by: jooho lee <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Add tests for vLLM Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * resolve comments Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Uncomment tests for fixed bugs Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
….3 (kserve#3812) * Upgrade serving runtime python version to 3.11 and debian to bookworm Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Upgrade poetry to 1.8.3 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Upgrade openjdk to 17 for pmml Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix 'AS' casing warning Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix pmml server Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Bump vLLM to 0.5.3.post1 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update makefile Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * approx probability comparison Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Set multiprocessing method to spawn Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
…se 'spawn' for multiprocessing (kserve#3757) * Refactor model server to let uvicorn handle multiple workers - Refactored the ModelServer to let uvicorn handle multiple workers. This will remove the bottleneck of using 'fork' for multiprocessing - Make FastAPI app instance easily accessible across the project so that users can easily add middleware and custom exception handlers for custom models. - Use uvloop event policy Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add middleware example Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add e2e test Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove nest_asyncio in art explainer Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove uvloop Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix python tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * revert art explainer Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove monkeypatch Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove redundant future exception logging Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Spolti <[email protected]>
* Make ray serve an optional dependency Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Unify the log configuration using kserve logger (kserve#3577) * Configure logging for serving runtimes Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add pyyaml dependency Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * black format Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * fix pyproject.toml Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * cleanup logger for e2e Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Modify logger format to include func name Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Log model download time. Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Allow disabling logger configuration and deprecate logger related arg in model server Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase master Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Resolve comments Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * pyyaml=^6.0.0 to fix build failure Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove logger related parameters from model server Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * import model_server Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix lint Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix linting Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase, minor fixes and add e2e test Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: Sivanantham 
Chinnaiyan <[email protected]> Co-authored-by: Curtis Maddalozzo <[email protected]> Co-authored-by: Dan Sun <[email protected]>
* Update aif example chore: Update aif explainer example. - Bump KServe to 0.13.0; it brings some library updates and fixes a few security alerts in this example. - update the scikit-learn package name Signed-off-by: Spolti <[email protected]> * move the local instructions to the README Signed-off-by: Spolti <[email protected]> * empty commit Signed-off-by: Spolti <[email protected]> --------- Signed-off-by: Spolti <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
…ve#3737) These changes introduce the possibility to configure KServe with its own Istio local gateway, to partially decouple KServe from the Knative local gateway.

Typically, it is OK to re-use the already configured Knative local gateway for KServe (as long as configs do not conflict). However, there are cases where having a dedicated local gateway for KServe is beneficial. Some examples:
* To have the ability to use strict mTLS in Istio
* To reduce some pressure on the Knative local gateway by having a dedicated gateway deployment (it still would hit the Knative gateway, but only once, rather than twice)
* To be able to configure TLS on cluster-local hostnames (Knative support is still experimental)

To have a dedicated Gateway in KServe, configurations similar to Knative's need to be done. At the very least, and if not having a dedicated gateway deployment, a v1/Service and an Istio Gateway resource need to be created for KServe. Such resources would need to be configured in _localGateway_ and _localGatewayService_.

KServe still needs to rely on Knative routing for the KSVCs it creates. Thus, after handling an incoming request and resolving its target, the request needs to be forwarded to Knative. This is the reason for introducing a new `knativeLocalGatewayService` in the ConfigMap.

The removed `ingressService` seems to be unused. Apparently, it became unused when the v1alpha1 API of the InferenceServices was deprecated and removed. Signed-off-by: Edgar Hernández <[email protected]>
* Add support for Azure DNS zone endpoints Signed-off-by: tjandy98 <[email protected]> * Add test cases for Azure Blob and File Share URI pattern matching Signed-off-by: tjandy98 <[email protected]> * flake8 Signed-off-by: tjandy98 <[email protected]> * black Signed-off-by: tjandy98 <[email protected]> --------- Signed-off-by: tjandy98 <[email protected]>
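As a rough illustration of what URI pattern matching for both endpoint styles might look like, here is a minimal sketch. It is not the code from this commit: the function name `parse_azure_uri` and both regular expressions are assumptions, and the endpoint shapes are assumed to be `<account>.<service>.core.windows.net` for classic endpoints and `<account>.z<NN>.<service>.storage.azure.net` for DNS zone endpoints.

```python
import re
from typing import Optional, Tuple

# Assumed endpoint shapes (hypothetical patterns, not taken from KServe):
#   classic:  https://<account>.<blob|file>.core.windows.net/<path>
#   DNS zone: https://<account>.z<NN>.<blob|file>.storage.azure.net/<path>
_PATTERNS = (
    re.compile(
        r"^https://(?P<account>[a-z0-9]{3,24})"
        r"\.(?P<service>blob|file)\.core\.windows\.net/(?P<path>.+)$"
    ),
    re.compile(
        r"^https://(?P<account>[a-z0-9]{3,24})\.z\d{2}"
        r"\.(?P<service>blob|file)\.storage\.azure\.net/(?P<path>.+)$"
    ),
)


def parse_azure_uri(uri: str) -> Optional[Tuple[str, str, str]]:
    """Return (account, service, path) if uri matches a known shape, else None."""
    for pattern in _PATTERNS:
        match = pattern.match(uri)
        if match:
            return match.group("account"), match.group("service"), match.group("path")
    return None
```

Keeping one pattern per endpoint style, rather than a single combined expression, makes it harder to accidentally accept a mixed form such as a zone label on a `core.windows.net` host.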
Signed-off-by: Dan Sun <[email protected]>
* Add logging request feature for vLLM Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add log request feature for huggingface Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Update 0.14.0-rc0 release Signed-off-by: Dan Sun <[email protected]> * Add security context Signed-off-by: Dan Sun <[email protected]> * Update helm doc Signed-off-by: Dan Sun <[email protected]> * Update crd Signed-off-by: Dan Sun <[email protected]> --------- Signed-off-by: Dan Sun <[email protected]>
Use API tokens for publishing package PyPI Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* Fix sdlc Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add option for only installing deps in quick install Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Increase cpu & memory request for controller Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix 0.14.0-rc0 release Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* add security context and resources to rbac proxy container Signed-off-by: Gavin Li <[email protected]> * feedback Signed-off-by: Gavin Li <[email protected]> --------- Signed-off-by: Gavin Li <[email protected]>
Remove unwanted secret permissions Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
* bump to vllm 0.5.5 Signed-off-by: Lize Cai <[email protected]> * fix parse_and_batch_prompt import Signed-off-by: Lize Cai <[email protected]> --------- Signed-off-by: Lize Cai <[email protected]>
Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: jooho lee <[email protected]>
* Implement health endpoint for vLLM backend Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add openai health endpoint Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
…vingruntimes (kserve#3917) * Add security context for runtimes Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add security context for runtimes helm Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add security best practices for ig Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Disable service account secret auto mount Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * fmt Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update test cases Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add user id for tensorflow, triton and torchserve Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: jooho lee <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Jooho
Signed-off-by: jooho lee <[email protected]>
Signed-off-by: jooho lee <[email protected]>
Signed-off-by: jooho lee <[email protected]>
Signed-off-by: jooho lee <[email protected]>
Signed-off-by: jooho lee <[email protected]>
…lux/component-updates/kserve-storage-initializer-211 Update kserve-storage-initializer-211 to a42e1b0
Can it be related to the tests needing more time to complete? I didn't spot any tests failing. The other failure does not look related to tests either.
/retest
@Jooho: The following tests failed, say
I used this for working on https://issues.redhat.com/browse/RHOAIENG-10277, since this sync contains changes required on the ODH setup: the support for dedicated KServe gateways (related to opendatahub-io/opendatahub-operator#1056). Without those changes, odh/master is broken in the ODH setup. So, it made more sense to use this sync.
The new markers that are relevant for ODH E2Es are:
- predictor
- path_based_routing
The other 7 markers are for cases not well supported in ODH. I was able to figure out the configs and codebase changes needed to run these markers successfully, with 100% of E2Es passing, excluding GRPC-related tests, which do not work in ODH.
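Selecting only the ODH-relevant markers can be sketched as follows. This is a minimal illustration assuming the E2Es are run with pytest's `-m` marker expressions; the helper names and the `test/e2e` directory are hypothetical, not taken from the repository.

```python
from typing import List

# Markers named in this comment as relevant for ODH E2Es.
ODH_MARKERS = ["predictor", "path_based_routing"]


def marker_expression(markers: List[str]) -> str:
    """Build a pytest -m expression selecting tests carrying any given marker."""
    return " or ".join(markers)


def pytest_args(markers: List[str], test_dir: str = "test/e2e") -> List[str]:
    """Argument list for pytest; test_dir is a hypothetical default."""
    return ["-m", marker_expression(markers), test_dir]
```

The resulting list can be passed to `pytest.main(...)`, or the expression can be used directly on the command line as `pytest -m "predictor or path_based_routing"`.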
The above gives confidence that this code sync is working. However, code review shows that several poetry lockfiles differ too much from upstream. IMHO, such a large difference is not OK, and I would have rejected this PR because of it.
Side findings:
- A review is needed of ODH overlays, as they look outdated. I had to fix the overlays to have a working setup.
- OWNER files need to be updated.
- We need to revert the large resource requests on the `manager` that were applied in "Increase memory limit of kserve-controller pod #261", since upstream improved memory usage in "Remove cluster level list/watch for configmaps, serviceaccounts, secrets kserve/kserve#3469".
- We need to contribute some changes to the E2Es code to make them compatible with openshift-ci.
In any case, it was already decided to drop this PR in favor of creating a more up-to-date one. So, closing this PR...
Regular Sync with upstream kserve master branch to odh master branch.