
[Feature] Support Volcano for batch scheduling #755

Merged · 30 commits merged into master on Dec 1, 2022

Conversation

@tgaddair (Contributor) commented Nov 22, 2022

This PR is based on the branch created by @loleek and the Spark Operator integration with Volcano.

Why are these changes needed?

To enable advanced multi-tenancy features for running multiple Ray clusters in a shared environment.

Related issue number

Closes #697

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 (Member)

Thanks @tgaddair for this contribution!

In my opinion, KubeRay shouldn't be too opinionated about external tools. Maybe the better solution is to make sure the exposed interfaces are enough for users to configure which tools they want to use. cc @DmitriGekhtman @Jeffwan

@DmitriGekhtman (Collaborator)

> Thanks @tgaddair for this contribution!
>
> In my opinion, KubeRay shouldn't be too opinionated about external tools. Maybe the better solution is to make sure the exposed interfaces are enough for users to configure which tools they want to use.

I generally agree.

Counterarguments in this case:

  • From what I can tell, Volcano is pretty well established at this point as the go-to thing for batch scheduling. (The spark operator has built-in support.)
  • The change is not too heavy-weight and is guarded by a flag. (So users still have to opt-in.)
  • This might be the most minimal change for natural support.

> make sure the exposed interfaces are enough for users to configure which tools they want to use

@kevin85421 do you have an idea for a less invasive interface change that could work here?

@DmitriGekhtman (Collaborator)

It does appear that the same functionality could be achieved with some boilerplate -- namely, adding the relevant annotations, configuring the podgroup yourself, and kubectl-applying a yaml file structured as

```yaml
podgroup
---
raycluster
```
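A hedged sketch of what that combined file might look like. All names, sizes, and the annotation wiring below are illustrative assumptions, not taken from this PR:

```yaml
# Hypothetical combined manifest: a Volcano PodGroup applied alongside a
# RayCluster in one kubectl apply. Names and values are assumptions.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: ray-cluster-pg              # assumed name
spec:
  minMember: 3                      # gang size: head pod + 2 workers (assumed)
---
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-example          # assumed name
spec:
  headGroupSpec:
    template:
      metadata:
        annotations:
          scheduling.k8s.io/group-name: ray-cluster-pg  # ties the pods to the PodGroup
      spec:
        schedulerName: volcano
  # workerGroupSpecs with the same annotation and schedulerName omitted for brevity
```

The point of the boilerplate version is that the user, not the operator, is responsible for keeping `minMember` and the group-name annotation in sync with the cluster spec.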

@kevin85421 (Member)

> From what I can tell, Volcano is pretty well established at this point as the go-to thing for batch scheduling. (The spark operator has built-in support.)

I know it is a well-established tool, but it also increases the complexity of the KubeRay operator. We need to write down the comparison between different solutions for Volcano integration and choose the best one.

In addition, the Spark operator behaves oddly from a Spark perspective (at least on the Spark side). For example, it creates a driver for each Spark job, whereas in most production cases a cluster has only one driver. Hence, the Spark operator seems unusual to me, so I am not sure it is a good example to follow.

> The change is not too heavy-weight and is guarded by a flag. (So users still have to opt-in.)

Currently, it is not heavy-weight. However, if we integrate a lot of external tools into the KubeRay operator, it will be heavy.

> This might be the most minimal change for natural support.
>
> @kevin85421 do you have an idea for a less invasive interface change that could work here?

No, I don't have much context on Volcano; that's why I want to see a comparison between the different solutions.

I don't have a strong opinion about how to integrate Volcano, but I would like to see the comparison so that we can choose the best option.

@DmitriGekhtman (Collaborator) commented Nov 22, 2022

These are great points!

> it also increases the complexity of the KubeRay operator

The operator does have enough issues as it is :)

Yeah, if it's possible for us to document how to do it oneself that would be great.

I imagine a developer building infrastructure on top of KubeRay could write their own operator to manage a FancyProprietaryRayCluster custom resource. This operator could create a RayCluster and some extra stuff (ingresses, volcano podgroups, whatever) for each FancyProprietaryRayCluster.

@Jeffwan (Collaborator) commented Nov 23, 2022

I will help review this as well. Please wait for my comments.

@DmitriGekhtman (Collaborator) commented Nov 23, 2022

As @tgaddair explained yesterday, the reason "native" integration is required is that Volcano needs creation of a PodGroup with an owner reference to the RayCluster CR before the RayCluster's pods are reconciled.

I think all of this is reasonable. We should probably take some steps to mitigate the risk from increased code complexity.
Some things that would need to be done to merge this:

  1. Use an annotation to toggle batch scheduling support, rather than an operator flag.
  2. Express this feature in a modular way that would allow plugging in other batch schedulers.
  3. Write some CI tests.
  4. Do some manual testing and record the testing steps in this PR's comment thread.
  5. Document this support in the "features" section of the docs. In that section, identify yourself as the maintainer of this integration (see the note here for an example.)
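For illustration, the PodGroup-with-owner-reference arrangement described above might look roughly like this; every field value here is an assumption for the sketch, not taken from the PR:

```yaml
# Hypothetical PodGroup as the operator might create it: owned by the
# RayCluster CR (so Kubernetes garbage-collects it with the cluster) and
# created before the RayCluster's pods are reconciled.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: raycluster-sample-pg        # assumed name
  ownerReferences:
    - apiVersion: ray.io/v1alpha1
      kind: RayCluster
      name: raycluster-sample       # the owning RayCluster CR (assumed)
      uid: <filled in from the live CR at creation time>
spec:
  minMember: 3                      # total pods that must be schedulable together
```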

@Jeffwan (Collaborator) left a comment

Overall this looks good to me. I just left some minor comments.

I know some other users use solutions other than Volcano, but sig-scheduling has not been able to unify the PodGroup definition, and it's normally hard to support multiple ones. Volcano looks good to me since it's already a CNCF project.

```go
if EnableBatchScheduler {
	var minMember int32
	var totalResource corev1.ResourceList
	if instance.Spec.EnableInTreeAutoscaling == nil || !*instance.Spec.EnableInTreeAutoscaling {
```
Collaborator:

Should we use minMember to indicate the minimum gang size, or use a separate annotation?

Contributor Author:

Volcano considers minMember to be the minimum gang size, which we set to the total replica count (if not using autoscaling) or the total minReplicas count (when using autoscaling).

Did you want to make this more configurable so that users can specify their own min gang size?
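The rule described above can be sketched in Go with simplified, hypothetical types; the real KubeRay spec types differ, and the head-pod accounting here is an assumption about the PR's exact arithmetic:

```go
package main

import "fmt"

// workerGroup and clusterSpec are illustrative stand-ins for the real
// KubeRay types in ray-operator/apis; field names are assumptions.
type workerGroup struct {
	Replicas    int32
	MinReplicas int32
}

type clusterSpec struct {
	EnableInTreeAutoscaling bool
	WorkerGroups            []workerGroup
}

// computeMinMember mirrors the behavior described in the thread: the
// PodGroup's minMember counts the head pod plus either the total replica
// count (no autoscaling) or the total minReplicas (with autoscaling).
func computeMinMember(spec clusterSpec) int32 {
	var minMember int32 = 1 // head pod (assumed to be included)
	for _, g := range spec.WorkerGroups {
		if spec.EnableInTreeAutoscaling {
			minMember += g.MinReplicas
		} else {
			minMember += g.Replicas
		}
	}
	return minMember
}

func main() {
	static := clusterSpec{WorkerGroups: []workerGroup{{Replicas: 4, MinReplicas: 1}}}
	autoscaled := clusterSpec{EnableInTreeAutoscaling: true, WorkerGroups: []workerGroup{{Replicas: 4, MinReplicas: 1}}}
	fmt.Println(computeMinMember(static))     // 5: head + 4 replicas
	fmt.Println(computeMinMember(autoscaled)) // 2: head + 1 minReplica
}
```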

ray-operator/controllers/ray/utils/util.go (outdated, resolved)
@tgaddair tgaddair marked this pull request as ready for review November 30, 2022 01:07
@tgaddair (Contributor Author)

@Jeffwan thanks for the review! I made some updates to make the implementation more general, so we can add in additional schedulers in the future. This also makes the implementation consistent with the Spark Operator, which follows a similar convention. I'll take some time to address your comments and ensure tests are passing.

Co-authored-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Travis Addair <[email protected]>
@DmitriGekhtman (Collaborator) commented Nov 30, 2022

Last request from me:
For the doc page, could you document a simple example workflow that demonstrates the gang scheduling functionality?
That would help people understand the usefulness of this integration. It'd also serve as a manual (later, automated) e2e test of the integration.

@tgaddair (Contributor Author) commented Dec 1, 2022

Hey @DmitriGekhtman, thanks for the great feedback. I added a pretty complete end-to-end example in the docs, and verified it worked as written. Let me know what you think!

I believe this PR should be good to go at this point.

@DmitriGekhtman (Collaborator)

Looks great, one last suggestion concerning memory allocation for the Ray head pod.

@tgaddair (Contributor Author) commented Dec 1, 2022

Thanks @DmitriGekhtman, updated the example to use 2Gi of RAM for the head node.

@DmitriGekhtman (Collaborator) left a comment

Thanks for the excellent contribution.
This comes just in time for the KubeRay 0.4.0 branch cut!

@DmitriGekhtman DmitriGekhtman merged commit d6aef8b into master Dec 1, 2022
```yaml
metadata:
  name: kuberay-test-queue
spec:
  weight: 1
```
Contributor:

Could you explain weight in the doc?

```yaml
  name: kuberay-test-queue
spec:
  weight: 1
  capability:
```
Contributor:

Is it possible to update the queue's capability when a new pod group is waiting to be scheduled?

Contributor Author:

Yes, will update docs to clarify this.
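For context, the two Queue fields under discussion work roughly as follows in Volcano: `weight` sets the queue's proportional (soft) share of cluster resources relative to other queues, while `capability` sets a hard upper bound. A hedged sketch with illustrative values (the resource numbers are assumptions, not from this PR's docs):

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: kuberay-test-queue
spec:
  weight: 1            # proportional share relative to other queues (soft)
  capability:          # hard cap on what this queue may consume (values assumed)
    cpu: 4
    memory: 6Gi
```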

"sigs.k8s.io/controller-runtime/pkg/builder"
)

type BatchScheduler interface {
Contributor:

Add some comments to explain the interface and all of its functions.

```go
	AddMetadataToPod(app *rayiov1alpha1.RayCluster, pod *v1.Pod)
}

type BatchSchedulerFactory interface {
```
Contributor:

ditto

@sihanwang41 (Contributor)

Oops, looks like I left my comments late. @tgaddair feel free to open a follow-up PR to address them. Thank you for the contribution!

@DmitriGekhtman (Collaborator)

Sorry, @sihanwang41, I was a bit trigger happy!

@tgaddair feel free to follow-up on Sihan's documentation comments in another PR!

@tgaddair tgaddair deleted the volcano-int branch December 1, 2022 19:57
@tgaddair (Contributor Author) commented Dec 1, 2022

Thanks @sihanwang41, will update in a follow-up that also fixes a typo in the example.

@tgaddair (Contributor Author) commented Dec 1, 2022

@sihanwang41 addressed comments in #776.

@kevin85421 (Member) commented Dec 1, 2022

The document is very detailed! Thanks @tgaddair for this high-quality contribution!

It feels risky to merge this big feature into master at the last minute before the v0.4.0 branch cut. Should we include this PR in release v0.4.0 or move it to the next release? cc @DmitriGekhtman @sihanwang41

Risk:

  • Although this feature is disabled by default, we will not have a chance to change the exposed interfaces once we release them in v0.4.0.
  • We do not know the Kubernetes compatibility for Volcano. ([Feature] Support batch scheduling and queueing #213 (comment) => I am not sure about the accuracy of this comment, but we need to be serious about it.)

Any thoughts? Thanks!

Updated:

  • The exposed interface is simple (only 1 flag --enable-batch-scheduler).
  • Release managers will test Kubernetes compatibility before v0.4.0 release.

It is okay to include this PR in release v0.4.0 after we test it.

@DmitriGekhtman (Collaborator) commented Dec 1, 2022

Thanks @kevin85421, these are reasonable concerns!
I think the key priority for the release will be testing compatibility of core, stable functionality (the operator and especially the RayCluster controller) against a range of Kubernetes versions. This integration is an alpha feature.

Also, eventually, we need automated tools to validate a Kubernetes compatibility matrix -- I think that's tracked somewhere.

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Adds Volcano integration for KubeRay.

Signed-off-by: Travis Addair <[email protected]>
Co-authored-by: dengkai02 <[email protected]>
Co-authored-by: Dmitri Gekhtman <[email protected]>

## Run Ray Cluster with Volcano scheduler

Add the `ray.io/scheduler-name: volcano` label to your RayCluster CR to submit the cluster pods to Volcano for scheduling.
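A minimal sketch of that label on a RayCluster CR; the cluster name is a placeholder and the rest of the spec is elided:

```yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: test-cluster                 # assumed name
  labels:
    ray.io/scheduler-name: volcano   # submit this cluster's pods via Volcano
spec:
  # head and worker group specs unchanged; omitted for brevity
```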
Contributor:

Can KubeRay support integrating the RayJob CRD with Volcano?


Successfully merging this pull request may close these issues.

[Feature] Investigate integration of KubeRay and Volcano, add to docs
6 participants