fix: not remove podgroup uid will cause topology annotation to be useless #3711
Welcome @JesseStutler!
Please add some unit tests (UT).
We should also consider the scenario where only the scheduler is used, without vc-controller.
@lowang-bh So when there is no vc-controller, the podgroup may be created manually by the user? Any suggestions on how to distinguish that case?
I mean a manually created podgroup that is assigned in the pod's annotations. I think we can distinguish it by whether the pg name exists in the pod's annotations.
Volcano versions before v1.5 use the vcjob name as the pg name, so we should also consider compatibility when upgrading from an old version. Details: #3652
@@ -247,7 +252,7 @@ func affinityCheck(job *api.JobInfo, affinity [][]string) error {

 	var taskNumber = len(job.Tasks)
 	var taskRef = make(map[string]bool, taskNumber)
-	var jobNamePrefix = job.Name + "-"
+	var jobNamePrefix = job.Name[:len(job.Name)-uidLength]
When upgrading from a version before v1.5, the old-format pg name has no uid, so the slice index can go out of range and panic.
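The out-of-range risk described in this comment can be illustrated with a small sketch. It is not the code from the PR: trimUIDSuffix is a hypothetical helper, and the suffix length of 36 (a standard UID string) is an assumption, since the value of uidLength is not shown in the diff.

```go
package main

import "fmt"

// trimUIDSuffix is a hypothetical helper (not from the PR). It drops the last
// n bytes of name, but returns name unchanged when it is too short to carry
// such a suffix, which avoids the slice-bounds panic.
func trimUIDSuffix(name string, n int) string {
	if n <= 0 || len(name) <= n {
		return name
	}
	return name[:len(name)-n]
}

func main() {
	// Assumption: the new-style pg name is "<job>-<uid>" with a 36-character uid.
	newStyle := "my-job-123e4567-e89b-12d3-a456-426614174000"
	// Old-style pg name (before v1.5): just the vcjob name, no uid.
	oldStyle := "my-job"

	fmt.Println(trimUIDSuffix(newStyle, 36))
	fmt.Println(trimUIDSuffix(oldStyle, 36))
	// An unguarded oldStyle[:len(oldStyle)-36] would panic at runtime here.
}
```

A length guard only prevents the panic. An old-style name longer than the suffix length would still be truncated incorrectly, which is one reason to avoid prefix and suffix manipulation altogether.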
Fixed. The taskName is now task.Taskrole, which is set from the volcano.sh/task-spec annotation, so there is no longer any need to strip the prefix and suffix to get the taskName. This stays compatible when upgrading from older Volcano versions.
That's good.
5573260 to 39fbfcd
@hwdef UTs have been added; please review again, thanks~
@lowang-bh After discussing with @Monokaix, we agree that we should use the task-spec annotation (added by vc-controller) to get the task name directly from the pod, rather than removing a prefix or suffix, since removing them introduces compatibility issues. In your scenario, where there is no vc-controller, users should add the task-spec annotation themselves.
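The annotation-based lookup agreed on here could look roughly like the following. This is a minimal sketch, not the scheduler's actual code: taskNameFromAnnotations is a hypothetical helper, and a plain map stands in for the pod's annotations so the example has no Kubernetes dependencies. Only the annotation key volcano.sh/task-spec comes from the discussion.

```go
package main

import "fmt"

// taskSpecKey is the annotation named in the discussion above.
const taskSpecKey = "volcano.sh/task-spec"

// taskNameFromAnnotations is a hypothetical helper: it reads the task name
// straight from the annotations instead of parsing it out of the pod name.
// The boolean result is false when the annotation is missing or empty.
func taskNameFromAnnotations(annotations map[string]string) (string, bool) {
	name, ok := annotations[taskSpecKey]
	if !ok || name == "" {
		return "", false
	}
	return name, true
}

func main() {
	// Annotations as vc-controller (or a user, when there is no controller)
	// would set them; the value "ps" is illustrative.
	annotated := map[string]string{taskSpecKey: "ps"}
	bare := map[string]string{}

	fmt.Println(taskNameFromAnnotations(annotated))
	fmt.Println(taskNameFromAnnotations(bare))
}
```

Because the lookup never touches the pod or podgroup name, it behaves the same whether the podgroup name carries a uid or not.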
/lgtm
/ok-to-test
/approve
…less Signed-off-by: jessestutler <[email protected]>
39fbfcd to 9a7ee2c
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: william-wang
Signed-off-by: jessestutler <[email protected]>
/lgtm
Background
fix #3469.
In the topology plugin's affinityCheck, jobNamePrefix is podGroup.Name + '-'. The name of a podgroup created by vc-controller contains the uid, but the created pod's name is {vcjob.Name}+{pod.Name}+{index}, and the vcjob's name does not contain the uid, so the topology annotation is currently useless.
volcano/pkg/scheduler/plugins/task-topology/topology.go
Lines 243 to 279 in c8eb453
volcano/pkg/controllers/job/job_controller_actions.go
Line 665 in c8eb453
volcano/pkg/controllers/job/job_controller_actions.go
Line 355 in c8eb453
Solution
Use var jobNamePrefix = job.Name[:len(job.Name)-uidLength] instead, to remove the uid.
Verification
Create a vcjob with 2 replicas of ps-task and 1 worker-task; ps-task and worker-task have affinity, while ps-task and ps-task have anti-affinity.
The worker-task and one ps-task are scheduled onto the same node, while the other ps-task is on a different node.
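The expected placement can also be checked mechanically. The following is a rough sketch under stated assumptions: podPlacement, affinityHolds and antiAffinityHolds are hypothetical names, the node names are made up, and the placement data would in practice come from the cluster rather than being hard-coded.

```go
package main

import "fmt"

// podPlacement records which node a task's pod landed on (hypothetical type).
type podPlacement struct {
	Task string
	Node string
}

// nodesOf returns the set of nodes hosting pods of task, plus the pod count.
func nodesOf(pods []podPlacement, task string) (map[string]bool, int) {
	nodes := make(map[string]bool)
	count := 0
	for _, p := range pods {
		if p.Task == task {
			nodes[p.Node] = true
			count++
		}
	}
	return nodes, count
}

// affinityHolds reports whether every pod of task a shares a node with
// at least one pod of task b.
func affinityHolds(pods []podPlacement, a, b string) bool {
	bNodes, _ := nodesOf(pods, b)
	for _, p := range pods {
		if p.Task == a && !bNodes[p.Node] {
			return false
		}
	}
	return true
}

// antiAffinityHolds reports whether no two pods of task share a node.
func antiAffinityHolds(pods []podPlacement, task string) bool {
	nodes, count := nodesOf(pods, task)
	return len(nodes) == count
}

func main() {
	// Placement matching the verification result above (node names are made up).
	pods := []podPlacement{
		{Task: "ps", Node: "node-1"},
		{Task: "ps", Node: "node-2"},
		{Task: "worker", Node: "node-1"},
	}
	fmt.Println(affinityHolds(pods, "worker", "ps"))
	fmt.Println(antiAffinityHolds(pods, "ps"))
}
```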