From 11aca592c13361369e2b59f9695446d98298946d Mon Sep 17 00:00:00 2001
From: Andrew Sy Kim
Date: Tue, 5 Nov 2024 13:52:03 -0500
Subject: [PATCH] [Docs][KubeRay] Update KubeRay + Kueue guides to use newer versions of Kueue (#48564)

## Why are these changes needed?

Update KubeRay + Kueue guides to use newer versions of Kueue

## Related issue number

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [X] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Andrew Sy Kim
---
 .../examples/rayjob-kueue-gang-scheduling.md     | 16 ++++++----------
 .../examples/rayjob-kueue-priority-scheduling.md |  2 +-
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/doc/source/cluster/kubernetes/examples/rayjob-kueue-gang-scheduling.md b/doc/source/cluster/kubernetes/examples/rayjob-kueue-gang-scheduling.md
index 18c919bfc661..86e43a73d64f 100644
--- a/doc/source/cluster/kubernetes/examples/rayjob-kueue-gang-scheduling.md
+++ b/doc/source/cluster/kubernetes/examples/rayjob-kueue-gang-scheduling.md
@@ -37,16 +37,16 @@ Create a GKE cluster with the `enable-autoscaling` option:
 ```bash
 gcloud container clusters create kuberay-gpu-cluster \
     --num-nodes=1 --min-nodes 0 --max-nodes 1 --enable-autoscaling \
-    --zone=us-west1-b --machine-type e2-standard-4 --cluster-version 1.29
+    --zone=us-east4-c --machine-type e2-standard-4
 ```
 
 Create a GPU node pool with the `enable-queued-provisioning` option enabled:
 ```bash
-gcloud beta container node-pools create gpu-node-pool \
+gcloud container node-pools create gpu-node-pool \
   --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
   --enable-queued-provisioning \
   --reservation-affinity=none \
-  --zone us-west1-b \
+  --zone us-east4-c \
   --cluster kuberay-gpu-cluster \
   --num-nodes 0 \
   --min-nodes 0 \
@@ -55,14 +55,10 @@ gcloud beta container node-pools create gpu-node-pool \
   --machine-type g2-standard-4
 ```
 
-This command creates a node pool which initially has zero nodes. Use the `gcloud beta` command because some of the flags have beta status.
+This command creates a node pool which initially has zero nodes.
 The `--enable-queued-provisioning` flag enables "queued provisioning" in the Kubernetes node autoscaler using the ProvisioningRequest API. More details are below.
 You need to use the `--reservation-affinity=none` flag because GKE doesn't support Node Reservations with ProvisioningRequest.
 
-:::{note}
-"enable-queued-provisioning" is only available on versions 1.28+ with the `gcloud beta` command
-:::
-
 
 ## Install the KubeRay operator
 
@@ -71,9 +67,9 @@ The KubeRay operator Pod must be on the CPU node if you set up the taint for the
 
 ## Install Kueue
 
-Install Kueue with the ProvisioningRequest API enabled.
+Install the latest released version of Kueue.
 ```
-kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.6.0/manifests-alpha-enabled.yaml
+kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.8.2/manifests.yaml
 ```
 
 See [Kueue Installation](https://kueue.sigs.k8s.io/docs/installation/#install-a-released-version) for more details on installing Kueue.
diff --git a/doc/source/cluster/kubernetes/examples/rayjob-kueue-priority-scheduling.md b/doc/source/cluster/kubernetes/examples/rayjob-kueue-priority-scheduling.md
index d15213cbd5a6..150cb87a239f 100644
--- a/doc/source/cluster/kubernetes/examples/rayjob-kueue-priority-scheduling.md
+++ b/doc/source/cluster/kubernetes/examples/rayjob-kueue-priority-scheduling.md
@@ -29,7 +29,7 @@ The KubeRay operator Pod must be on the CPU node if you set up the taint for the
 ## Step 2: Install Kueue
 
 ```bash
-VERSION=v0.6.0
+VERSION=v0.8.2
 kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
 ```
 