training-operator set scheduler error #1447
Comments
/assign @Jeffwan |
Hmm. I agree the logic here is kind of confusing. However, the current patch solves the problem of the missing scheduler name, but it doesn't make sense to me. We need to define the correct behavior here. With gang enabled,
|
I think it should be: if the scheduler name is empty or null |
@Jeffwan I think that is a better idea. IsGangSchedulerSet has two pieces of logic
We can remove logic 2:
if r.Config.EnableGangScheduling {
    schedulerNameFromRequest := util.parseGangScheduler(replicas)
    if len(schedulerNameFromRequest) > 0 {
        errMsg := fmt.Sprintf("Another scheduler = %s is specified when gang-scheduling is enabled and it will not be overwritten", schedulerNameFromRequest)
        logger.Warning(errMsg)
        r.Recorder.Event(tfjob, v1.EventTypeWarning, podTemplateSchedulerNameReason, errMsg)
    } else {
        podTemplate.Spec.SchedulerName = gangSchedulerName
    }
}
or just simply:
if r.Config.EnableGangScheduling {
    if util.IsGangSchedulerSet(replicas) {
        errMsg := "Another scheduler is specified when gang-scheduling is enabled and it will not be overwritten"
        logger.Warning(errMsg)
        r.Recorder.Event(tfjob, v1.EventTypeWarning, podTemplateSchedulerNameReason, errMsg)
    } else {
        podTemplate.Spec.SchedulerName = gangSchedulerName
    }
} |
Personally, I prefer the latter. WDYT @Jeffwan |
Agreed. The only thing I find a little misleading is the method name. Let's move forward with the latter. |
@Jeffwan How about the name used below? The code is:
const gangSchedulerName = "volcano"
if r.Config.EnableGangScheduling {
    if util.IsNonDefaultGangSchedulerSet(replicas, gangSchedulerName) {
        // SchedulerName is a string, so test for the empty string rather than nil.
        if podTemplate.Spec.SchedulerName == "" {
            podTemplate.Spec.SchedulerName = gangSchedulerName
        } else {
            errMsg := "Another scheduler is specified when gang-scheduling is enabled and it will not be overwritten"
            logger.Warning(errMsg)
            r.Recorder.Event(tfjob, v1.EventTypeWarning, podTemplateSchedulerNameReason, errMsg)
        }
    }
} |
The same error here.
.org/v1","resourceVersion":"20490546"}, "reason": "SettedPodTemplateSchedulerName", "message": "Another scheduler is specified when gang-scheduling is enabled and it will not be overwritten"}
time="2021-11-04T06:14:09Z" level=info msg="Controller ctr-multi-train-3r1h2 created pod ctr-multi-train-3r1h2-worker-0" job=.ctr-multi-train-3r1h2 pod=.ctr-multi-train-3r1h2-worker-0 uid=
time="2021-11-04T06:14:09Z" level=info msg="Need to create new pod: worker-1" job=ai-ctr.ctr-multi-train-3r1h2 uid=aca068ca-09a3-416e-8d3b-d3f3905b1a6a
time="2021-11-04T06:14:09Z" level=warning msg="Another scheduler is specified when gang-scheduling is enabled and it will not be overwritten" job=ai-ctr.ctr-multi-train-3r1h2 replica-type=worker uid=aca068ca-09a3-416e-8d3b-d3f3905b1a6a
So, does this mean that volcano is not working? |
@Jeffwan Can you please help review PR #1448? @berlinsaint Personally, I think it will not work until this bug is fixed. |
@gaocegege Sounds good. I pinged @qiankunli about the renaming. |
I added a GetSchedulerName method; it may be simpler to read:
if r.Config.EnableGangScheduling {
    podSchedulerName := util.GetSchedulerName(replicas)
    if len(podSchedulerName) == 0 {
        // No scheduler was requested, so default to the gang scheduler.
        podTemplate.Spec.SchedulerName = gangSchedulerName
    } else if podSchedulerName != gangSchedulerName {
        errMsg := "Another scheduler is specified when gang-scheduling is enabled and it will not be overwritten"
        logger.Warning(errMsg)
        r.Recorder.Event(tfjob, v1.EventTypeWarning, podTemplateSchedulerNameReason, errMsg)
    }
} |
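For readers following the thread, the decision described in the comment above can be sketched as a small standalone program. This is only an illustration under stated assumptions: `replicaSpec`, `getSchedulerName`, and `resolveSchedulerName` are hypothetical stand-ins, not the operator's actual types or the real `util.GetSchedulerName`, and a plain slice replaces the Kubernetes replica specs.

```go
package main

import "fmt"

// gangSchedulerName mirrors the constant quoted in the thread.
const gangSchedulerName = "volcano"

// replicaSpec is a hypothetical, simplified stand-in for a replica spec;
// the real operator reads the name from a Kubernetes pod template.
type replicaSpec struct {
	SchedulerName string
}

// getSchedulerName returns the first non-empty scheduler name among the
// replicas, or "" when none is set (hypothetical analogue of GetSchedulerName).
func getSchedulerName(replicas []replicaSpec) string {
	for _, spec := range replicas {
		if spec.SchedulerName != "" {
			return spec.SchedulerName
		}
	}
	return ""
}

// resolveSchedulerName returns the scheduler name the pod template should
// carry and whether a "another scheduler is specified" warning is due.
func resolveSchedulerName(gangEnabled bool, replicas []replicaSpec, current string) (string, bool) {
	if !gangEnabled {
		return current, false
	}
	requested := getSchedulerName(replicas)
	switch {
	case requested == "":
		// Nothing requested: default to the gang scheduler.
		return gangSchedulerName, false
	case requested != gangSchedulerName:
		// A different scheduler was requested: keep it and warn.
		return current, true
	default:
		// The gang scheduler was requested explicitly: nothing to do.
		return current, false
	}
}

func main() {
	cases := [][]replicaSpec{
		nil,
		{{SchedulerName: "custom"}},
		{{SchedulerName: gangSchedulerName}},
	}
	for _, replicas := range cases {
		name, warn := resolveSchedulerName(true, replicas, getSchedulerName(replicas))
		fmt.Printf("requested=%q resolved=%q warn=%v\n", getSchedulerName(replicas), name, warn)
	}
}
```

Returning the warning flag instead of logging inside the function keeps the decision testable without a logger or event recorder; the real controller would log and record the event at that point.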
@berlinsaint The patch is merged; you can use the latest code. The bug should be fixed. |
Thanks. In fact, I compiled it myself this afternoon. |
Awesome. Sorry for the late fix. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
training-operator v1.3.0
If spec.Template.Spec.SchedulerName is null, IsGangSchedulerSet will return false, and podTemplate.Spec.SchedulerName will not be set to gangSchedulerName.
The same code is in tf-operator v1.2.1.