Support Stateful JobSet #572
Comments
cc: @ahg-g @andreyvelich
/kind feature
Couldn't the shared volume containing the pretrained model be provided with a regular PV, by specifying a PVC in the JobTemplate? Or is the purpose of this just to provide some automation and lifecycle management for the PV, so that the user doesn't need to specify the PV manifest separately?
That's right, we want to manage the lifecycle of the storage on the controller side, not on the client side.
I like this feature; we should track it under the 0.7 release.
@tenzen-y I'm tentatively labeling the issue as part of the v0.7.0 release, under the assumption that you plan to work on this. Let me know if that's not the case.
@danielvegamyhre I can try to take this feature sometime between mid-Q4 and Q1.
Sounds good. We have a release cycle of roughly 3 months, with 0.6 planned to release any day now, so 0.7 will be around October 1st and we can plan 0.8 for around January 1st. For now I've removed the 0.7 label from this issue, and we can tentatively plan on including it in 0.8. I'll follow up once we get closer to that time of year.
Sorry for the delay. From the Kubeflow v2 perspective, we may first need to prioritize the additional JobSet features.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
What would you like to be added:
I would like JobSet to support creating a single PVC and mounting the resulting PV into selected replicatedJobs.
For example, JobSet would create a PVC named "pretrained-model", and the resulting PV would then be mounted into the replicatedJobs listed in `.spec.volumePolicy.replicatedJobs`.
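A minimal sketch of what such a JobSet manifest might look like. Only the PVC name "pretrained-model" and the `.spec.volumePolicy.replicatedJobs` path come from the description above. The `volumePolicy` field itself, its `volumeClaimTemplates` subfield (modeled here on the StatefulSet field of the same name), the mount-by-name behavior, and all names, sizes, and images are assumptions for illustration, not an existing or agreed API.

```yaml
# Hypothetical sketch: `volumePolicy` and its subfields are a proposal,
# not an existing JobSet API. Names, sizes, and the image are placeholders.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: fine-tuning                      # illustrative name
spec:
  volumePolicy:                          # assumed field proposed by this issue
    volumeClaimTemplates:                # assumed, modeled on StatefulSet
      - metadata:
          name: pretrained-model         # PVC name from the issue text
        spec:
          accessModes: ["ReadWriteMany"] # illustrative: shared by all workers
          resources:
            requests:
              storage: 100Gi             # illustrative size
    replicatedJobs:                      # replicatedJobs that get the volume
      - workers
  replicatedJobs:
    - name: workers
      replicas: 4
      template:
        spec:
          parallelism: 1
          completions: 1
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: trainer
                  image: example.com/trainer:latest  # placeholder image
                  volumeMounts:
                    - name: pretrained-model         # assumed to match the claim name
                      mountPath: /models
```

In this sketch the controller, rather than the user, would own the PVC: it would create the claim once per JobSet and inject the volume into every listed replicatedJob.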
This feature is similar to kubernetes/kubernetes#115066
Why is this needed:
In large-scale distributed training, we often store a pre-trained base model once and want to share it with all workers, so that the workers do not each have to download it separately.