Releases · kserve/kserve

16 Oct 11:16

yuzisun

v0.14.0

7e43642

v0.14.0 Latest

What's Changed

Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
Extract openai predict logic into smaller methods by @grandbora in #3716
Bump MLServer to 1.5.0 by @sivanantha321 in #3740
Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
Fix dead links on PyPI by @kevinbazira in #3754
Fix model is ready even if there is no model by @HAO2167 in #3275
Fix No model ready error in multi model serving by @sivanantha321 in #3758
Initial implementation of Inference client by @sivanantha321 in #3401
Fix logprobs for vLLM by @sivanantha321 in #3738
Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
pillow - Buffer Overflow by @spolti in #3598
Use add_generation_prompt while creating chat template by @Datta0 in #3775
Deduplicate the names for the additional domain names by @houshengbo in #3773
Make Virtual Service case-insensitive by @andyi2it in #3779
Install packages needed for vllm model load by @gavrissh in #3802
Make gRPC max message length configurable by @sivanantha321 in #3741
Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
Increase timeout to make unit test stable by @Jooho in #3808
Upgrade CI deps by @sivanantha321 in #3822
Add tests for vLLM by @sivanantha321 in #3771
Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
Make ray an optional dependency by @sivanantha321 in #3834
Update aif example by @spolti in #3765
Use helm for quick installation by @sivanantha321 in #3813
Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
Add support for Azure DNS zone endpoints by @tjandy98 in #3819
Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
Add logging request feature for vLLM backend by @sivanantha321 in #3849
Bump vLLM to 0.5.4 by @sivanantha321 in #3874
Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
Update KServe 2024-2025 Roadmap by @yuzisun in #3810
Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
Fix issue with rolling update behavior by @andyi2it in #3786
Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
Protobuf version upgrade 4.25.4 by @andyi2it in #3881
Adds optional labels and annotations to the controller by @guitouni in #3366
Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
support text embedding task in hugging face server by @kevinmingtarja in #3743
Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
[Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
adding metadata on requests by @gcemaj in #3635
Publish 0.14.0-rc0 release by @yuzisun in #3867
Use API token for publishing package to PyPI by @sivanantha321 in #3896
Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
bump to vllm 0.5.5 by @lizzzcai in #3911
pin gosec to 2.20.0 by @greenmoon55 in #3921
add a new doc 'common issues and solutions' by @Jooho in #3878
Implement health endpoint for vLLM backend by @sivanantha321 in #3850
Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
Bump Go to 1.22 by @sivanantha321 in #3912
bump to vllm 0.6.0 by @hustxiayang in #3934
Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
Fix permission error in snyk scan by @sivanantha321 in #3889
Cluster Local Model CR by @greenmoon55 in #3839
added http headers to inbound request by @andyi2it in #3895
Add prow-github-action by @sivanantha321 in #3888
Add TLS support for Inference Loggers by @ruivieira in #3863
Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
bump to vLLM0.6.1post2 by @hustxiayang in #3948
Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
Implement Huggingface model download in storage initializer by @andyi2it in #3584
Update OWNERS file by @yuzisun in #3966
Cluster local model controller by @greenmoon55 in #3860
Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
add a new API for multi-node/multi-gpu by @Jooho in #3871
Fix update-openapigen.sh that can be executed from kserve dir by @Jooho in #3924
Add python 3.12 support and remove python 3.8 support by @sivanantha321 in #3645
Fix openssl vulnerability CWE-1395 by @sivanantha321 in #3975
Fix Kubernetes Doc Links by @jyono in #3670
Fix kserve local testing env by @yuzisun in #3981
Fix streaming response not working properly with logger by @sivanantha321 in #3847
Add a flag for automount serviceaccount token by @greenmoon55 in https://github.com/kserve/ks...

Contributors

ruivieira, greenmoon55, and 29 other contributors

03 Oct 08:44

yuzisun

v0.14.0-rc1

a50fdc9

v0.14.0-rc1 Pre-release

What's Changed

Publish 0.14.0-rc0 release by @yuzisun in #3867
Use API token for publishing package to PyPI by @sivanantha321 in #3896
Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
bump to vllm 0.5.5 by @lizzzcai in #3911
pin gosec to 2.20.0 by @greenmoon55 in #3921
add a new doc 'common issues and solutions' by @Jooho in #3878
Implement health endpoint for vLLM backend by @sivanantha321 in #3850
Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
Bump Go to 1.22 by @sivanantha321 in #3912
bump to vllm 0.6.0 by @hustxiayang in #3934
Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
Fix permission error in snyk scan by @sivanantha321 in #3889
Cluster Local Model CR by @greenmoon55 in #3839
added http headers to inbound request by @andyi2it in #3895
Add prow-github-action by @sivanantha321 in #3888
Add TLS support for Inference Loggers by @ruivieira in #3863
Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
bump to vLLM0.6.1post2 by @hustxiayang in #3948
Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
Implement Huggingface model download in storage initializer by @andyi2it in #3584
Update OWNERS file by @yuzisun in #3966
Cluster local model controller by @greenmoon55 in #3860
Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970

New Contributors

@HotsauceLee made their first contribution in #3898
@hustxiayang made their first contribution in #3934
@hdefazio made their first contribution in #3885
@ruivieira made their first contribution in #3863
@gfkeith made their first contribution in #3954

Full Changelog: v0.14.0-rc0...v0.14.0-rc1

Contributors

ruivieira, greenmoon55, and 9 other contributors

27 Aug 03:20

yuzisun

v0.14.0-rc0

0ad935e

v0.14.0-rc0 Pre-release

What's Changed

Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
Extract openai predict logic into smaller methods by @grandbora in #3716
Bump MLServer to 1.5.0 by @sivanantha321 in #3740
Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
Fix dead links on PyPI by @kevinbazira in #3754
Fix model is ready even if there is no model by @HAO2167 in #3275
Fix No model ready error in multi model serving by @sivanantha321 in #3758
Initial implementation of Inference client by @sivanantha321 in #3401
Fix logprobs for vLLM by @sivanantha321 in #3738
Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
pillow - Buffer Overflow by @spolti in #3598
Use add_generation_prompt while creating chat template by @Datta0 in #3775
Deduplicate the names for the additional domain names by @houshengbo in #3773
Make Virtual Service case-insensitive by @andyi2it in #3779
Install packages needed for vllm model load by @gavrissh in #3802
Make gRPC max message length configurable by @sivanantha321 in #3741
Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
Increase timeout to make unit test stable by @Jooho in #3808
Upgrade CI deps by @sivanantha321 in #3822
Add tests for vLLM by @sivanantha321 in #3771
Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
Make ray an optional dependency by @sivanantha321 in #3834
Update aif example by @spolti in #3765
Use helm for quick installation by @sivanantha321 in #3813
Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
Add support for Azure DNS zone endpoints by @tjandy98 in #3819
Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
Add logging request feature for vLLM backend by @sivanantha321 in #3849
Bump vLLM to 0.5.4 by @sivanantha321 in #3874
Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
Update KServe 2024-2025 Roadmap by @yuzisun in #3810
Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
Fix issue with rolling update behavior by @andyi2it in #3786
Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
Protobuf version upgrade 4.25.4 by @andyi2it in #3881
Adds optional labels and annotations to the controller by @guitouni in #3366
Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
support text embedding task in hugging face server by @kevinmingtarja in #3743
Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
[Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
adding metadata on requests by @gcemaj in #3635

New Contributors

@calwoo made their first contribution in #3729
@guitouni made their first contribution in #3366
@zwong91 made their first contribution in #3830
@mholder6 made their first contribution in #3825
@asdqwe123zxc made their first contribution in #3303
@gcemaj made their first contribution in #3635

Full Changelog: v0.13.0...v0.14.0-rc0

Contributors

grandbora, houshengbo, and 20 other contributors

28 Jul 17:22

yuzisun

v0.13.1

e7d9ac8

v0.13.1

What's Changed

Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 (#3723)
Propagate trust_remote_code flag throughout vLLM startup by @calwoo (#3729)
Use add_generation_prompt while creating chat template by @Datta0 (#3775)
Fix logprobs for vLLM by @sivanantha321 (#3738)
Install packages needed for vllm model load by @gavrissh (#3802)
Publish 0.13.1 Release by @johnugeorge in #3824

Full Changelog: v0.13.0...v0.13.1

Contributors

johnugeorge, calwoo, and 3 other contributors

05 Jun 13:38

yuzisun

v0.13.0

1c51eee

v0.13.0

🌈 What's New?

add support for async streaming in predict by @alexagriffith in #3475
Fix: Support model parallelism in HF transformer by @gavrishp in #3459
Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
OpenAI schema by @tessapham in #3477
Support OpenAIModel in ModelRepository by @grandbora in #3590
updated xgboost to support json and ubj models by @andyi2it in #3551
Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
Add a user friendly error message for http exceptions by @grandbora in #3581
feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
Enabled the multiple domains support on an inference service by @houshengbo in #3615
Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
Add headers to predictor exception logging by @grandbora in #3658
Enhance controller setup based on available CRDs by @israel-hdez in #3472
Add openai models endpoint by @cmaddalozzo in #3666
feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
Enable dtype support for huggingface server by @Datta0 in #3613
Add method for checking model health/readiness by @cmaddalozzo in #3673
Unify the log configuration using kserve logger by @sivanantha321 in #3577
Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
Add FP16 datatype support for OIP grpc by @sivanantha321 in #3695
Add option for returning probabilities in huggingface server by @andyi2it in #3607

⚠️ What's Changed

Remove conversion webhook from manifests by @Jooho in #3476
Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
docs: Move Alibi explainer to docs by @terrytangyuan in #3579
Remove generate endpoints by @cmaddalozzo in #3654
Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700

🐛 What's Fixed

Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
Make the modelcar injection idempotent by @rhuss in #3517
Only pad left for decode-only architecture models. by @sivanantha321 in #3534
fix lint typo on Makefile by @spolti in #3569
fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
Fix model unload in server stop method by @sivanantha321 in #3587
Fix golint errors by @andyi2it in #3552
Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
Fix Pydantic 2 warnings by @cmaddalozzo in #3622
build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
Fix quick install does not clean up Istio installer by @sivanantha321 in #3660
fix for extract zip from gcs by @andyi2it in #3510
fix: HPA equality check should include annotations by @terrytangyuan in #3650
Fix: model id and model dir check order by @yuzisun in #3680
Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
Fix kserve version is not updated properly by python-release.sh by @sivanantha321 in #3707
Add precaution against running v1 endpoints on openai models by @grandbora in #3694
Typos and minor fixes by @alpe in #3429
Fix model_id and model_dir precedence for vLLM by @yuzisun in #3718
Fixup max_length for HF and model info for vLLM by @Datta0 in #3715
Fix prompt token count and provide completion usage in OpenAI response by @sivanantha321 in #3712

⬆️ Version Upgrade

Upgrade orjson to version 3.9.15 by @spolti in #3488
feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
Update cert manager version in quick install script by @shauryagoel in #3496
ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
upgrade knative to 1.13 by @andyi2it in #3457
Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
upgrade vllm/transformers version by @johnugeorge in #3671

🔨 Project SDLC

Enhance CI environment by @sivanantha321 in #3440
Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
chore: Update list of reviewers by @ckadner in #3484
build: Add helm docs update to make generate command by @terrytangyuan in #3437
Added v2 infer test for supported model frameworks. by @andyi2it in #3349
fix the quote format same with others and docstrings by @leyao-daily in #3490
remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
Remove GOARCH by @mkumatag in #3523
GH Alert: Potential file inclusion via variable by @spolti in #3520
Update codeQL to v3 by @spolti in #3548
switch e2e test inference graph to raw mode by @andyi2it in #3511
Black lint by @cmaddalozzo in #3568
Fix python linter by @sivanantha321 in #3571
build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
Allow rerunning failed workflows by comment by @andyi2it in #3550
add re-run info in the PR templates by @spolti in #3633
Add e2e tests for huggingface by @sivanantha321 in #3600
Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
workflow file for cherry-pick on comment by @andyi2it in #3653
Fix: huggingface runtime in helm chart by @yuzisun in #3679
Copy generated CRDs by kustomize to Helm by @Jooho in #3392
...

Contributors

alpe, rhuss, and 27 other contributors

21 May 09:58

yuzisun

v0.13.0-rc1

6c37dce

v0.13.0-rc1 Pre-release

What's Changed

upgrade vllm/transformers version by @johnugeorge in #3671
Add openai models endpoint by @cmaddalozzo in #3666
feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
Enable dtype support for huggingface server by @Datta0 in #3613
Add method for checking model health/readiness by @cmaddalozzo in #3673
fix for extract zip from gcs by @andyi2it in #3510
Update Dockerfile and Readme by @gavrishp in #3676
Update huggingface readme by @alexagriffith in #3678
fix: HPA equality check should include annotations by @terrytangyuan in #3650
Fix: huggingface runtime in helm chart by @yuzisun in #3679
Fix: model id and model dir check order by @yuzisun in #3680
Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
Unify the log configuration using kserve logger by @sivanantha321 in #3577
Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705

New Contributors

@Datta0 made their first contribution in #3613

Full Changelog: v0.13.0-rc0...v0.13.0-rc1

Contributors

cmaddalozzo, houshengbo, and 8 other contributors

07 May 10:11

yuzisun

v0.13.0-rc0

bfc2e21

v0.13.0-rc0 Pre-release

🌈 What's New?

add support for async streaming in predict by @alexagriffith in #3475
Fix: Support model parallelism in HF transformer by @gavrishp in #3459
Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
OpenAI schema by @tessapham in #3477
Support OpenAIModel in ModelRepository by @grandbora in #3590
updated xgboost to support json and ubj models by @andyi2it in #3551
Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
Add a user friendly error message for http exceptions by @grandbora in #3581
feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
Enabled the multiple domains support on an inference service by @houshengbo in #3615
Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
Add headers to predictor exception logging by @grandbora in #3658
Enhance controller setup based on available CRDs by @israel-hdez in #3472

⚠️ What's Changed

Remove conversion webhook from manifests by @Jooho in #3476
Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
docs: Move Alibi explainer to docs by @terrytangyuan in #3579
Remove generate endpoints by @cmaddalozzo in #3654

🐛 What's Fixed

Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
Make the modelcar injection idempotent by @rhuss in #3517
Only pad left for decode-only architecture models. by @sivanantha321 in #3534
fix lint typo on Makefile by @spolti in #3569
fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
Fix model unload in server stop method by @sivanantha321 in #3587
Fix golint errors by @andyi2it in #3552
Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
Fix Pydantic 2 warnings by @cmaddalozzo in #3622
build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
Fix quick install does not clean up Istio installer by @sivanantha321 in #3660

⬆️ Version Upgrade

Upgrade orjson to version 3.9.15 by @spolti in #3488
feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
Update cert manager version in quick install script by @shauryagoel in #3496
ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
upgrade knative to 1.13 by @andyi2it in #3457
Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642

🔨 Project SDLC

Enhance CI environment by @sivanantha321 in #3440
Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
chore: Update list of reviewers by @ckadner in #3484
build: Add helm docs update to make generate command by @terrytangyuan in #3437
Added v2 infer test for supported model frameworks. by @andyi2it in #3349
fix the quote format same with others and docstrings by @leyao-daily in #3490
remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
Remove GOARCH by @mkumatag in #3523
GH Alert: Potential file inclusion via variable by @spolti in #3520
Update codeQL to v3 by @spolti in #3548
switch e2e test inference graph to raw mode by @andyi2it in #3511
Black lint by @cmaddalozzo in #3568
Fix python linter by @sivanantha321 in #3571
build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
Allow rerunning failed workflows by comment by @andyi2it in #3550
add re-run info in the PR templates by @spolti in #3633
Add e2e tests for huggingface by @sivanantha321 in #3600
Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
workflow file for cherry-pick on comment by @andyi2it in #3653

CVE patches

CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
Security fix - CVE 2024 24786 by @andyi2it in #3585

📝 Documentation Update

qpext: fix a typo in qpext doc by @daixiang0 in #3491
Update KServe project description by @yuzisun in #3524
Update kserve cake diagram by @yuzisun in #3530
Remove white background for the kserve diagram by @yuzisun in #3531
fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
Fix typo in README.md by @terrytangyuan in #3575

New Contributors

@leyao-daily made their first contribution in #3490
@peterj made their first contribution in #3493
@timothyjlaurent made their first contribution in #3374
@shauryagoel made their first contribution in #3496
@mkumatag made their first contribution in #3523
@marek-veber made their first contribution in #3544
@trojaond made their first contribution in #3481
@grandbora made their first contribution in #3590
@saileshd1402 made their first contribution in #3657

Full Changelog: v0.12.1...v0.13.0-rc0

Contributors

rhuss, cmaddalozzo, and 24 other contributors

23 Apr 12:20

yuzisun

v0.12.1

d94ca25

v0.12.1

What's Changed

[release-0.12] Update fastapi to 0.109.1 and Support ray 2.10 by @sivanantha321 in #3609
[release-0.12] Pydantic 2 support by @cmaddalozzo in #3614
[release-0.12] Make the modelcar injection idempotent by @sivanantha321 in #3612
Prepare for release 0.12.1 by @sivanantha321 in #3610
release-0.12 pin back ray to 2.10 by @yuzisun in #3616
[release-0.12] Fix docker build failure for ARM64 by @sivanantha321 in #3627

Full Changelog: v0.12.0...v0.12.1

Contributors

cmaddalozzo, yuzisun, and sivanantha321

25 Feb 17:17

yuzisun

v0.12.0

c9570d6

v0.12.0

🌈 What's New?

Core Inference & Serving Runtimes

Implement HuggingFace model server by @yuzisun in #3334
feat: Add HuggingFace runtime out-of-the-box support by @terrytangyuan in #3395
Implement support for vllm as alternative backend by @gavrishp in #3415
Torchserve grpc v2 by @andyi2it in #3247
feat: CA bundle mount options for storage initializer by @Jooho in #3250
Add support for modelcars by @rhuss in #3110
Add compatibility for Istio CNI plugin by @israel-hdez in #3316
feat: Allow to disable ingress creation for raw deployment mode by @terrytangyuan in #3436

Advanced Inference

RawDeployment support for Inference Graph by @bmopuri in #3199, @bmopuri in #3194
Added custom request timeout for inferencegraph. by @andyi2it in #3173
Add regex support for propagating IG headers by @sivanantha321 in #3178

KServe Python SDK, Storage

Unpack archive files for hdfs by @sivanantha321 in #3093
feat: Support S3 transfer acceleration by @terrytangyuan in #3305

⚠️ What's Changed

Change the default value for enableDirectPvcVolumeMount to true by @Jooho in #3371
Add model arguments to API and update BERT inference example by @yuzisun in #3332

The arguments --model_name, --predictor_host, --predictor_use_ssl, and --predictor_request_timeout_seconds are now built into the kserve model server and no longer need to be defined in a custom predictor or transformer. --protocol is deprecated and superseded by --predictor_protocol. More details can be found in the API reference doc.
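
To make the flag change concrete, here is a minimal, self-contained sketch of how a server might accept these arguments while still honoring the deprecated --protocol. It is not KServe's actual implementation: the function names (build_parser, parse_server_args), the default values, and the fallback protocol "v1" are assumptions chosen for illustration; only the flag names come from the note above.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; flag names follow the release note, defaults are assumed."""
    parser = argparse.ArgumentParser(description="Illustrative model server flags")
    parser.add_argument("--model_name", default="model")
    parser.add_argument("--predictor_host", default=None)
    parser.add_argument("--predictor_use_ssl", action="store_true")
    parser.add_argument("--predictor_request_timeout_seconds", type=int, default=600)
    parser.add_argument("--predictor_protocol", default=None)
    # Deprecated alias, kept so existing launch commands do not break.
    parser.add_argument("--protocol", default=None, help="deprecated, use --predictor_protocol")
    return parser


def parse_server_args(argv):
    """Parse argv and fold the deprecated --protocol into --predictor_protocol."""
    args = build_parser().parse_args(argv)
    if args.protocol is not None:
        warnings.warn(
            "--protocol is deprecated; use --predictor_protocol",
            DeprecationWarning,
        )
        # The new flag wins if both are supplied.
        if args.predictor_protocol is None:
            args.predictor_protocol = args.protocol
    if args.predictor_protocol is None:
        args.predictor_protocol = "v1"  # assumed default, for illustration only
    return args


if __name__ == "__main__":
    parsed = parse_server_args(
        ["--model_name", "bert", "--predictor_host", "localhost:8080", "--protocol", "v2"]
    )
    print(parsed.model_name, parsed.predictor_host, parsed.predictor_protocol)
```

With this shape, a custom transformer would not need to declare --predictor_host or --model_name itself; it would read them from the parsed arguments. Check the API reference for the real defaults and behavior.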

🐛 What's Fixed

Removing update op from pod-mutator webhook by @rachitchauhan43 in #3163
Fix quick install script by @dtrifiro in #3164
Fix self-signed-ca installation by @sivanantha321 in #3165
Add S3_VERIFY_SSL to storage.py for S3 by @Jooho in #3172
Fix runtime not found for triton due to wrong default protocolVersion by @sivanantha321 in #3177
Make ModelServer to stop correctly when using more than 1 worker by @andyi2it in #3174
Fix serving runtime webhook cert namespace for kubeflow installation by @sivanantha321 in #3188
Fix knative config-defaults values overridden by kserve by @sivanantha321 in #3130
Fix qpext metrics port by @yuzisun in #3209
Added async with postprocess method. by @andyi2it in #3204
Fix lightgbm model input conversion when input is list of lists by @sivanantha321 in #3226
Validation added for ensuring same model format has same priority for runtime by @andyi2it in #3181
Fix: Unexpected Panic in Inference graph when it fails to create http request by @HAO2167 in #3079
Support verify variable with storage-config json style (fix-3263) by @Jooho in #3267
s3 storage initializer: only set environment variables if variables are set in storage secret json by @dtrifiro in #3259
Fix tensorflow e2e test fails due to OOM error by @sivanantha321 in #3293
fix: Properly handle the creation and closure of success file in DownloadModel() by @terrytangyuan in #3295
fix: Surface errors when writing graphHandler response by @terrytangyuan in #3308
Fix qpext hangs during shutdown by @sivanantha321 in #3268
fix: Check if HPA has the same scaleTargetRef and behavior by @terrytangyuan in #3294
Updated quick_install script to temporarily fix 0.11.2 release install by @andyi2it in #3311
image_patch_dev.sh: set pipefail by @dtrifiro in #3274
Move pmml worker validation to runtime by @sivanantha321 in #3182
Introduce retry on resource conflict by @sivanantha321 in #3240
Fix inference request fails when sending with less number of features than the total model features on lightgbm by @sivanantha321 in #3313
Fix raw deployment service points to predictor container port instead of transformer container port in transformer collocation by @sivanantha321 in #3318
Restrict storage uri to predictor only in collocation of transformer and predictor by @sivanantha321 in #3280
feat: Expose defaults for several batcher handler parameters by @terrytangyuan in #3301
fix: Properly close resources and handle errors in agent and storage. Fixes #3323 by @terrytangyuan in #3321
Handles s3 download for object name starts with folder name. by @andyi2it in #3205
chore: Remove unused timeout annotation and flag in batcher by @terrytangyuan in #3341
Pass missing infer parameters during conversion by @sivanantha321 in #3368
Add exception handler for model server and Add ability to specify custom handler by @sivanantha321 in #3405
fix: Add missing volume mount to transformer container when using modelcars by @rhuss in #3384
fix: Add 'model_version' to InferResponse in python library by @ajstewart in #3466
Fix v2 model ready url in kserve client by @sivanantha321 in #3403
Fix parameters value type conversion by pydantic by @sivanantha321 in #3430
Fix Raw Logger E2E by @israel-hdez in #3434
Expose qpext aggregate metrics port on container by @sivanantha321 in #3291
Fix dup metrics aggr port by @yuzisun in #3447
fix: HuggingFace predictor should not be recognized as multi-model server by @terrytangyuan in #3449
Fix: bugs for huggingface runtime template by @yuzisun in #3448
Fix: Add padding and truncation in huggingface tokenizer by @kevinmingtarja in #3450
Fix: vllm backend does not work with model_dir for huggingface runtime by @yuzisun in #3456
Fix azure workload identity federation by excluding azure client secret by @robbertvdg in #3390
Change certificate to ca_bundle in json style of s3 storageSecret by @Jooho in #3463

⬆️ Version Upgrade

Upgrade istio Api and migrate to v1beta1 Api version by @sivanantha321 in #3150
Bump torchserve version to 0.9.0 by @gavrishp in #3217
Allow ray >=2.7,<3 by @ddelange in #3075
Bump istio version to 1.19.4 by @sivanantha321 in #3258
Updated ray to 2.8.0 and removed detached flag to avoid deprecation error in future by @andyi2it in #3272
chore: Upgrade to XGBoost v2.0.2. Fixes #3310 by @terrytangyuan in #3309
chore: Upgrade Go to v1.21 by @terrytangyuan in #3296
Added 3.11 support for paddle in workflow. by @andyi2it in #3246
Upgraded poetry version to 1.7.1 by @andyi2it in #3271
Upgrade cloudevent to v2 by @homily707 in #3255
Update knative-serving by @spolti in #3362
Update google-cloud-storage dependency to >=2.3.0,<3.0.0 and ray dependency to >=2.8.1, <3.0.0 by @sivanantha321 in #3389

🔨 Project SDLC

chore: Add design doc template links to feature request template by @ckadner in #3155
Make storage initializer image configurable by @yuzisun in #3145
Increase pytest workers for kourier e2e test by @sivanantha321 in #3151
Restrict workflow concurrency by @vignesh-murugani2i in #3167
Generate client-go for StorageContainer CR by @sivanantha321 in #3152
Refactor v1 vs. v2 endpoint unit tests in kserve/test/test_server.py… by @guohaoyu110 in #3158
Verify codegen in CI by @sivanantha321 in...

Contributors

rhuss, spolti, and 22 other contributors

27 Jan 14:10

yuzisun

v0.12.0-rc1

6fee880

v0.12.0-rc1 Pre-release

What's Changed

docs: Corrections and edits on release process document by @terrytangyuan in #3326
build: Switch to use kustomize in kubectl to simplify build process. Fixes #3314 by @terrytangyuan in #3315
feat: Expose defaults for several batcher handler parameters by @terrytangyuan in #3301
fix: Properly close resources and handle errors in agent and storage. Fixes #3323 by @terrytangyuan in #3321
Add model arguments to API and update BERT inference example by @yuzisun in #3332
chore: Update generated APIs and check generated manifests by @terrytangyuan in #3335
Update python model serving runtime API docstring by @yuzisun in #3338
Handles s3 download for object name starts with folder name. by @andyi2it in #3205
chore: Remove unused timeout annotation and flag in batcher by @terrytangyuan in #3341
ci: Automate release process by @terrytangyuan in #3345
fixes critical vulnerabilities on ray by @spolti in #3285
chore: Bump versions to prepare v0.12.0-rc1 release by @terrytangyuan in #3352
Change version for helm charts in README by @gawsoftpl in #3353
Fixes CVE-2023-48795 by @spolti in #3354
Fix Stack-based Buffer Overflow on protobuf by @spolti in #3358
Update knative-serving by @spolti in #3362
Fixes vulnerabilities on the otelhttp dependency by @spolti in #3361
Change the default value for enableDirectPvcVolumeMount to true by @Jooho in #3371
feat: Automatically generate Helm Chart docs. Fixes #3356 by @terrytangyuan in #3363
Modified script to include all kserve poetry projects. by @andyi2it in #3350
RawDeployment support for Inference Graph by @bmopuri in #3199
Add compatibility for Istio CNI plugin by @israel-hdez in #3316
Pass missing infer parameters during conversion by @sivanantha321 in #3368
feat: Support S3 transfer acceleration by @terrytangyuan in #3305
Implement HuggingFace model server by @yuzisun in #3334
fix: Add missing volume mount to transformer container when using modelcars by @rhuss in #3384
align cloudevents/sdk-go dependency by @spolti in #3387

New Contributors

@gawsoftpl made their first contribution in #3353

Full Changelog: v0.12.0-rc0...v0.12.0-rc1

Contributors

rhuss, spolti, and 8 other contributors
