
KubeRay integration with MCAD (Multi-Cluster-App-Dispatcher)

The multi-cluster-app-dispatcher (MCAD) is a Kubernetes controller that provides mechanisms for applications to manage batch jobs in single or multi-cluster environments. For more details, please refer here.

Use case

MCAD allows you to deploy a Ray cluster with a guarantee that sufficient resources are available in the cluster prior to actual pod creation. It supports features such as:

  • Integration with the upstream Kubernetes scheduling stack for features such as co-scheduling and packing on the GPU dimension.
  • Ability to wrap any Kubernetes object.
  • Increased control-plane stability through JIT (just-in-time) object creation.
  • Queuing with policies.
  • Quota management across namespaces.
  • Support for multiple Kubernetes clusters, dispatching jobs to any one of a number of clusters.

To queue Ray cluster(s) and gang-dispatch them when aggregated resources are available, create a KinD cluster using the instructions below, then refer to the setup of the KubeRay-MCAD integration on a Kubernetes Cluster or an OpenShift Cluster.

On OpenShift, MCAD and KubeRay are already part of the Open Data Hub Distributed Workload Stack. The stack provides a simple, user-friendly abstraction for scaling, queuing, and resource management of distributed AI/ML and Python workloads. Please follow the Quick Start in Distributed Workloads for installation.

Create KinD cluster

We need a KinD cluster with the specified resources to consistently observe the expected behavior described in the demo below. This can be done by running KinD with Podman.

Note: Without Podman, a KinD worker node can see all of the CPU/memory resources on the host. This environment exists only to run the tutorial in a resource-constrained local Kubernetes setting; it is not recommended for real workloads or production.

podman machine init --cpus 8 --memory 8196
podman machine start
podman machine list

Expect the Podman machine to be running with the following CPU and MEMORY resources:

NAME                     VM TYPE     CREATED        LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        2 minutes ago  Currently running  8           8.594GB     107.4GB

Create KinD cluster on the Podman Machine:

KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster

Creating a KinD cluster should take less than a minute. Expect output similar to:

using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋

Describe the single-node cluster:

kubectl describe node kind-control-plane

Expect the cpu and memory in the Allocatable section to be similar to:

Allocatable:
  cpu:            8
  hugepages-1Gi:  0
  hugepages-2Mi:  0
  memory:         8118372Ki
  pods:           110

Submitting KubeRay cluster to MCAD

After the KinD cluster is created using the instructions above, make sure to install the KubeRay-MCAD integration Prerequisites for the KinD cluster.
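As a quick sanity check that the prerequisites are in place, you can confirm that the MCAD AppWrapper CRD is registered. The command below uses only standard kubectl; the exact CRD name may vary by MCAD version:

kubectl get crd | grep -i appwrapper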

Let's create two RayClusters using the AppWrapper custom resource (CR) on the same Kubernetes cluster. The AppWrapper is the custom resource definition provided by MCAD to dispatch resources and manage batch jobs on Kubernetes clusters.
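In abridged form, an AppWrapper declares the pod resources MCAD should reserve in custompodresources and wraps the actual RayCluster CR in generictemplate. The sketch below illustrates the shape only; the apiVersion and the resource values are assumptions that may differ by MCAD and KubeRay release, so treat the linked aw-raycluster.yaml as the source of truth:

apiVersion: workload.codeflare.dev/v1beta1  # older MCAD releases use mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: raycluster-complete
  namespace: default
spec:
  resources:
    GenericItems:
    - replicas: 1
      custompodresources:            # resources MCAD reserves before creating any pods
      - replicas: 1                  # head pod (illustrative values)
        requests:
          cpu: 2
          memory: 8G
        limits:
          cpu: 2
          memory: 8G
      - replicas: 1                  # worker pod (illustrative values)
        requests:
          cpu: 2
          memory: 8G
        limits:
          cpu: 2
          memory: 8G
      generictemplate:               # the wrapped RayCluster CR
        apiVersion: ray.io/v1alpha1  # or ray.io/v1, depending on KubeRay version
        kind: RayCluster
        metadata:
          name: raycluster-complete
        spec:
          # headGroupSpec and workerGroupSpecs omitted; see the linked example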

  • We submit the first RayCluster with the AppWrapper CR aw-raycluster.yaml:

    kubectl create -f https://raw.githubusercontent.com/project-codeflare/multi-cluster-app-dispatcher/main/doc/usage/examples/kuberay/config/aw-raycluster.yaml

    In the above AppWrapper CR, we wrap an example RayCluster CR in the generictemplate. We also specify matching resources for the RayCluster head node and worker node in the custompodresources. MCAD uses the custompodresources to reserve the required resources to run the RayCluster without creating pending Pods.

    Note: Within the same AppWrapper, you may also wrap other individual Kubernetes resources (e.g., a ConfigMap or Secret) associated with this job as additional generictemplates to be dispatched together with the RayCluster; a sketch appears at the end of this page.

    Check AppWrapper status by describing the job.

    kubectl describe appwrapper raycluster-complete -n default
    

    The Status: stanza should show a State of Running once the wrapped RayCluster has been deployed. The two Pods associated with the RayCluster are also created.

    Status:
      Canrun:  true
      Conditions:
        Last Transition Micro Time:  2023-08-29T02:50:18.829462Z
        Last Update Micro Time:      2023-08-29T02:50:18.829462Z
        Status:                      True
        Type:                        Init
        Last Transition Micro Time:  2023-08-29T02:50:18.829496Z
        Last Update Micro Time:      2023-08-29T02:50:18.829496Z
        Reason:                      AwaitingHeadOfLine
        Status:                      True
        Type:                        Queueing
        Last Transition Micro Time:  2023-08-29T02:50:18.842010Z
        Last Update Micro Time:      2023-08-29T02:50:18.842010Z
        Reason:                      FrontOfQueue.
        Status:                      True
        Type:                        HeadOfLine
        Last Transition Micro Time:  2023-08-29T02:50:18.902379Z
        Last Update Micro Time:      2023-08-29T02:50:18.902379Z
        Reason:                      AppWrapperRunnable
        Status:                      True
        Type:                        Dispatched
      Controllerfirsttimestamp:      2023-08-29T02:50:18.829462Z
      Filterignore:                  true
      Queuejobstate:                 Dispatched
      Sender:                        before manageQueueJob - afterEtcdDispatching
      State:                         Running
    Events:                          <none>
    kubectl get pod -n default
    NAME                                           READY   STATUS    RESTARTS   AGE
    raycluster-complete-head-9s4x5                 1/1     Running   0          47s
    raycluster-complete-worker-small-group-4s6jv   1/1     Running   0          47s
    
  • Let's submit another RayCluster with the AppWrapper CR and see it queued, without creating pending Pods, using the command:

    kubectl create -f https://raw.githubusercontent.com/project-codeflare/multi-cluster-app-dispatcher/main/doc/usage/examples/kuberay/config/aw-raycluster-1.yaml

    Check the raycluster-complete-1 AppWrapper:

    kubectl describe appwrapper raycluster-complete-1 -n default
    

    The Status: stanza should show a State of Pending when the wrapped RayCluster has been queued. No pods from the second AppWrapper are created, due to Insufficient resources to dispatch AppWrapper (see the Message field below).

    Status:
      Conditions:
        Last Transition Micro Time:  2023-08-29T17:39:08.406401Z
        Last Update Micro Time:      2023-08-29T17:39:08.406401Z
        Status:                      True
        Type:                        Init
        Last Transition Micro Time:  2023-08-29T17:39:08.406452Z
        Last Update Micro Time:      2023-08-29T17:39:08.406451Z
        Reason:                      AwaitingHeadOfLine
        Status:                      True
        Type:                        Queueing
        Last Transition Micro Time:  2023-08-29T17:39:08.423208Z
        Last Update Micro Time:      2023-08-29T17:39:08.423208Z
        Reason:                      FrontOfQueue.
        Status:                      True
        Type:                        HeadOfLine
        Last Transition Micro Time:  2023-08-29T17:39:08.439753Z
        Last Update Micro Time:      2023-08-29T17:39:08.439753Z
        Message:                     Insufficient resources to dispatch AppWrapper.
        Reason:                      AppWrapperNotRunnable.
        Status:                      True
        Type:                        Backoff
      Controllerfirsttimestamp:      2023-08-29T17:39:08.406399Z
      Filterignore:                  true
      Queuejobstate:                 Backoff
      Sender:                        before ScheduleNext - setHOL
      State:                         Pending
    Events:                          <none>
    

You can manually check the allocated resources:

kubectl describe node kind-control-plane

The Allocated resources section shows cpu Requests at 6050m (75%); with 8 CPUs allocatable, the remaining ~1950m cannot satisfy the second AppWrapper's requests.

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                6050m (75%)      5200m (65%)
  memory             6824650Ki (84%)  6927050Ki (85%)
  ephemeral-storage  0 (0%)           0 (0%)
  hugepages-1Gi      0 (0%)           0 (0%)
  hugepages-2Mi      0 (0%)           0 (0%)

The out-of-the-box dispatching policy is FIFO, and it can be customized to user needs. The second RayCluster will be dispatched when additional aggregated resources become available in the cluster or when the first AppWrapper is deleted.

For example, observe the other RayCluster being created after deleting the first AppWrapper:

kubectl delete appwrapper raycluster-complete -n default
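To watch the second AppWrapper leave the queue and its Pods come up after the delete, the standard kubectl watch flag is enough; the singular resource name appwrapper assumes the naming defined by the MCAD CRD:

kubectl get appwrapper raycluster-complete-1 -n default -w
kubectl get pods -n default -w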

Note: This also simultaneously removes any Kubernetes resources you may have wrapped as generictemplates within this AppWrapper.
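For illustration, an extra wrapped resource is just another entry under GenericItems in the AppWrapper spec. The ConfigMap name and data below are hypothetical; such a resource is created when the AppWrapper is dispatched and removed when the AppWrapper is deleted:

    - replicas: 1
      generictemplate:                    # hypothetical ConfigMap dispatched with the RayCluster
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: ray-job-config            # hypothetical name
          namespace: default
        data:
          entrypoint: python my_script.py # hypothetical payload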