[Feature] Investigate integration of KubeRay and Volcano, add to docs #697

Closed
1 of 2 tasks
DmitriGekhtman opened this issue Nov 7, 2022 · 6 comments · Fixed by #755
Labels
docs (Improvements or additions to documentation), enhancement (New feature or request), P1 (Issue that should be fixed within a few weeks)

Comments

@DmitriGekhtman
Collaborator

DmitriGekhtman commented Nov 7, 2022

Search before asking

  • I searched the issues and found no similar feature request.

Description

Investigate use of Volcano with KubeRay for gang scheduling.
Write up a user guide.

cc @kevin85421 @sihanwang41

Use case

Gang scheduling of KubeRay resources.

Related issues

This is a more specific version of #213

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
DmitriGekhtman added the docs, enhancement, and P1 labels on Nov 7, 2022
DmitriGekhtman added this to the v0.5.0 release milestone on Nov 7, 2022
@Jeffwan
Collaborator

Jeffwan commented Nov 8, 2022

Is this the same as #213?

@DmitriGekhtman
Collaborator Author

Re-ordered the text of the issue description to make it clearer:
I think it would be interesting specifically to target Volcano, since it's a popular tool.

Hopefully, KubeRay already exposes the necessary configuration and no changes are required.

Opened this issue on the basis of some discussion with @kevin85421.

@tgaddair
Contributor

We can look at how the Spark Operator handles the integration as a reference:

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/pkg/batchscheduler/volcano/volcano_scheduler.go

It's not yet clear to me if this requires modifying KubeRay to work, or if native integration just saves some boilerplate.

@DmitriGekhtman
Collaborator Author

Well-documented boilerplate is the preferable route when possible.

IMO, KubeRay shouldn't be too opinionated about external tools.

@tgaddair
Contributor

From what I can gather, the only real requirement is to create a PodGroup for the Ray Cluster and tie it to the RayCluster CR via the ownerReferences metadata. Assuming this doesn't require some knowledge of RayClusters on the Volcano side, I think it should be doable to just have users create PodGroups manually.
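
As a rough illustration of the manual approach, here is a minimal Python sketch that builds a PodGroup manifest tied to a RayCluster through ownerReferences. The `build_pod_group` helper and the `ray-<name>-pg` naming scheme are hypothetical, and the `ray.io/v1alpha1` API version is an assumption about the installed KubeRay CRD. The `uid` has to be copied from the live RayCluster object. Whether this alone is enough for Volcano to gang-schedule the Ray pods is exactly what this issue sets out to investigate.

```python
import json


def build_pod_group(cluster_name, cluster_uid, namespace, min_member):
    """Return a Volcano PodGroup manifest (as a dict) owned by a RayCluster.

    Hypothetical helper for illustration; not part of KubeRay or Volcano.
    """
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "PodGroup",
        "metadata": {
            # Hypothetical naming scheme; any unique name would do.
            "name": f"ray-{cluster_name}-pg",
            "namespace": namespace,
            # Tie the PodGroup's lifetime to the RayCluster CR so that it is
            # garbage-collected when the cluster is deleted.
            "ownerReferences": [
                {
                    # Assumed API version of the RayCluster CRD.
                    "apiVersion": "ray.io/v1alpha1",
                    "kind": "RayCluster",
                    "name": cluster_name,
                    # Must match metadata.uid of the existing RayCluster.
                    "uid": cluster_uid,
                }
            ],
        },
        # minMember: number of pods that must be schedulable together.
        "spec": {"minMember": min_member},
    }


if __name__ == "__main__":
    manifest = build_pod_group("example", "<uid-of-raycluster>", "default", 3)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -` once the placeholder UID is replaced with the real one.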

@tgaddair
Contributor

The only other element to this is handling updates, like how the autoscaler can adjust the number of replicas. This could also benefit from native integration in KubeRay, as otherwise it could be difficult to intercept these changes to update the PodGroup. But that could be a "buyer beware" thing to put into the docs.
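
To make the update problem concrete, here is a small Python sketch of the bookkeeping that something outside KubeRay would have to do: recompute the gang size from the RayCluster spec and work out whether the PodGroup is stale. The helper names are hypothetical, and counting one head pod plus the sum of each worker group's `replicas` is an assumption; using `minReplicas` instead may be a better fit when the autoscaler is enabled.

```python
def desired_min_member(ray_cluster):
    """Gang size implied by a RayCluster dict: one head pod plus all workers.

    Hypothetical helper; assumes workers are listed under
    spec.workerGroupSpecs[].replicas.
    """
    spec = ray_cluster.get("spec", {})
    workers = sum(
        int(group.get("replicas") or 0)
        for group in spec.get("workerGroupSpecs", [])
    )
    return 1 + workers


def pod_group_patch(ray_cluster, pod_group):
    """Return a merge-patch body for the PodGroup, or None if it is current."""
    want = desired_min_member(ray_cluster)
    have = pod_group.get("spec", {}).get("minMember")
    if have == want:
        return None
    return {"spec": {"minMember": want}}


if __name__ == "__main__":
    cluster = {"spec": {"workerGroupSpecs": [{"replicas": 2}, {"replicas": 3}]}}
    stale_group = {"spec": {"minMember": 3}}
    print(pod_group_patch(cluster, stale_group))
```

Without native integration, a user would need to run logic like this whenever the autoscaler changes the replica counts, which is the "buyer beware" part.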

3 participants