[YUNIKORN-2894] Update KubeRay operator documentation for YuniKorn integration #489

Closed · wants to merge 3 commits
`docs/user_guide/workloads/kuberay/_ray_operator.mdx` (2 changes: 1 addition & 1 deletion)
```
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.2.2 --set batchScheduler.name=yunikorn
```
- The result should be as shown below
![ray_cluster_operator](../../../assets/ray_cluster_operator.png)
`docs/user_guide/workloads/run_ray_cluster.md` (133 changes: 125 additions & 8 deletions)

:::note
This example demonstrates how to set up [KubeRay](https://docs.ray.io/en/master/cluster/kubernetes/getting-started.html) and run a [RayCluster](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/raycluster-quick-start.html) with the YuniKorn scheduler. Here are the prerequisites:
- This tutorial assumes YuniKorn is [installed](../../get_started/get_started.md) under the namespace `yunikorn`.
- Use KubeRay version >= 1.2.2 to enable support for YuniKorn gang scheduling.
:::
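The KubeRay version requirement can be checked locally. A minimal sketch using `sort -V` for semantic version comparison (the `INSTALLED` value is a placeholder for what `helm list` would report on your cluster):

```shell
# Hypothetical version gate: INSTALLED would normally come from `helm list`.
INSTALLED="1.2.2"
REQUIRED="1.2.2"
# sort -V orders versions semantically; if REQUIRED sorts first (or equal),
# the installed chart is new enough for YuniKorn gang scheduling.
if [ "$(printf '%s\n' "$REQUIRED" "$INSTALLED" | sort -V | head -n1)" = "$REQUIRED" ]; then
  echo "KubeRay $INSTALLED supports YuniKorn gang scheduling"
else
  echo "KubeRay $INSTALLED is too old; upgrade to $REQUIRED or later"
fi
```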

## Install YuniKorn

The following script installs YuniKorn under the namespace `yunikorn`; refer to [Get Started](../../get_started/get_started.md) for more details.

```shell script
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update
helm install yunikorn yunikorn/yunikorn --create-namespace --namespace yunikorn
```
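Before moving on, it is worth confirming the scheduler came up. A quick check, assuming the helm chart's default deployment name `yunikorn-scheduler`:

```shell
# Wait until the YuniKorn scheduler deployment reports Available
# (the deployment name is the chart default; adjust if you overrode it).
kubectl wait --for=condition=Available deployment/yunikorn-scheduler \
  -n yunikorn --timeout=300s
kubectl get pods -n yunikorn
```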

<RayOperator/>

## Create RayCluster with YuniKorn

In the example, we set the `ray.io/gang-scheduling-enabled` label to `true` to enable gang scheduling.
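With gang scheduling enabled, YuniKorn only starts the cluster once it can reserve capacity for the whole gang. As a rough arithmetic sketch (not YuniKorn code; the numbers mirror the one-head, two-worker spec used in this example):

```shell
# Total resources the gang needs before any Ray pod is scheduled:
HEAD_CPU=1;   HEAD_MEM_GI=2
WORKER_CPU=1; WORKER_MEM_GI=1
WORKER_REPLICAS=2
TOTAL_CPU=$((HEAD_CPU + WORKER_CPU * WORKER_REPLICAS))
TOTAL_MEM_GI=$((HEAD_MEM_GI + WORKER_MEM_GI * WORKER_REPLICAS))
echo "gang requires ${TOTAL_CPU} CPUs and ${TOTAL_MEM_GI}Gi memory in root.default"
```

With this spec the gang asks for 3 CPUs and 4Gi of memory in total; if the queue cannot accommodate that, none of the Ray pods start.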

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs>
<TabItem value="amd64" label="x86-64 (Intel/Linux)">

```shell
cat <<EOF | kubectl apply -f -
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: test-yunikorn-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
    yunikorn.apache.org/app-id: test-yunikorn-0
    yunikorn.apache.org/queue: root.default
spec:
  rayVersion: "2.9.0"
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "1"
                memory: "2Gi"
              requests:
                cpu: "1"
                memory: "2Gi"
  workerGroupSpecs:
    - groupName: worker
      rayStartParams: {}
      replicas: 2
      minReplicas: 2
      maxReplicas: 2
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  cpu: "1"
                  memory: "1Gi"
                requests:
                  cpu: "1"
                  memory: "1Gi"
EOF
```

</TabItem>
<TabItem value="aarch64" label="Apple Silicon (arm64)">

```shell
cat <<EOF | kubectl apply -f -
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: test-yunikorn-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
    yunikorn.apache.org/app-id: test-yunikorn-0
    yunikorn.apache.org/queue: root.default
spec:
  rayVersion: "2.9.0"
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0-aarch64
            resources:
              limits:
                cpu: "1"
                memory: "2Gi"
              requests:
                cpu: "1"
                memory: "2Gi"
  workerGroupSpecs:
    - groupName: worker
      rayStartParams: {}
      replicas: 2
      minReplicas: 2
      maxReplicas: 2
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0-aarch64
              resources:
                limits:
                  cpu: "1"
                  memory: "1Gi"
                requests:
                  cpu: "1"
                  memory: "1Gi"
EOF
```

</TabItem>
</Tabs>
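While YuniKorn reserves capacity for the gang, it creates short-lived placeholder pods before the real Ray pods start. One way to watch this, assuming default naming conventions (placeholder pods are typically prefixed `tg-`, and KubeRay labels pods with `ray.io/cluster`):

```shell
# Placeholder pods (prefix tg-) appear first, then are replaced by Ray pods.
kubectl get pods -w
# Inspect the cluster's pods and confirm the YuniKorn app-id/queue metadata.
kubectl get pods -l ray.io/cluster=test-yunikorn-0 -o yaml | grep -i yunikorn
```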

- RayCluster result
![ray_cluster_cluster](../../assets/ray_cluster_cluster.png)
- YuniKorn UI
![ray_cluster_on_ui](../../assets/ray_cluster_on_ui.png)

<RayCRDYunikornConfig />

## Submit a RayJob to RayCluster
```
kubectl exec -it $HEAD_POD -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
```

Services in Kubernetes aren't directly accessible by default. However, you can use port-forwarding to connect to them locally.
```
kubectl port-forward service/test-yunikorn-0-head-svc 8265:8265
```
After the port-forward is set up, you can access the Ray dashboard by going to `http://localhost:8265` in your web browser.
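With the port-forward active, the same endpoint also serves the Ray Job API, so you can submit work without exec-ing into the head pod. A sketch, assuming the `ray` CLI is installed locally:

```shell
# Submit a trivial job through the dashboard address exposed by the port-forward.
ray job submit --address http://localhost:8265 -- \
  python -c "import ray; ray.init(); print(ray.cluster_resources())"
```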

- Ray Dashboard
![ray_cluster_ray_dashborad](../../assets/ray_cluster_ray_dashborad.png)

Have doubts? Check out the official [KubeRay integration with Apache YuniKorn](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/yunikorn.html) documentation.