add nodegroup plugin #3132
Welcome @wuyueandrew!
5651c3f to 366e9cf
/assign @Thor-wl
Thanks for your contribution and testing. We will review it as soon as possible.
Lint errors fixed; please restart the CI pipeline.
Thanks for your contribution! Please add a user guide for this plugin, referring to #3149.
Hi, please take a look at the review and squash your commits :)
@@ -0,0 +1,229 @@ | |||
/* | |||
Copyright 2019 The Kubernetes Authors. |
Please update the copyright header to the correct one.
fixed
@@ -0,0 +1,229 @@ | |||
/* | |||
Copyright 2019 The Kubernetes Authors. |
please change 2019 to 2023.
// - name: predicates | ||
// - name: proportion | ||
// - name: nodegroup | ||
// enablePredicate: true |
The `enablePredicate` argument does not seem to be used, but it appears in the example.
I will remove this argument.
queueGroupAffinityPreferred: make(map[string][]string, 0), | ||
} | ||
} | ||
func (q queueGroupAffinity) predicate(queue, group string) error { |
If a node has no nodeGroupLabel, the predicate will not fit. Is this expected?
In our scenario, nodeGroupLabel is strictly required. If a node has no nodeGroupLabel, pods cannot be assigned to it.
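To make the behavior discussed in this thread concrete, here is a minimal, self-contained sketch of a predicate that rejects unlabeled nodes. It is not the plugin's code: the `groupSet` type, the field names, and the rule that a queue without required affinity accepts any labeled group are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// groupSet is a minimal string set; a stand-in used only for this sketch.
type groupSet map[string]struct{}

// queueGroupAffinity is a simplified, hypothetical model that keeps only
// the hard (required) constraints, keyed by queue name.
type queueGroupAffinity struct {
	affinityRequired     map[string]groupSet
	antiAffinityRequired map[string]groupSet
}

// predicate returns nil when a task from queue may be placed on a node
// that belongs to group, and an error describing the rejection otherwise.
func (q queueGroupAffinity) predicate(queue, group string) error {
	// Assumption: the nodegroup label is strictly required, so a node
	// without one (empty group) is rejected for every queue.
	if group == "" {
		return errors.New("node has no nodegroup label")
	}
	// Required anti-affinity is checked first and always rejects.
	if _, banned := q.antiAffinityRequired[queue][group]; banned {
		return fmt.Errorf("queue %q must not use nodegroup %q", queue, group)
	}
	// Assumption: a queue with no required affinity accepts any labeled group.
	if allowed, ok := q.affinityRequired[queue]; ok {
		if _, found := allowed[group]; !found {
			return fmt.Errorf("queue %q is not allowed on nodegroup %q", queue, group)
		}
	}
	return nil
}

func main() {
	q := queueGroupAffinity{
		affinityRequired: map[string]groupSet{"nlp": {"nodegroup1": {}, "nodegroup2": {}}},
	}
	for _, group := range []string{"nodegroup1", "nodegroup3", ""} {
		fmt.Printf("group %q -> %v\n", group, q.predicate("nlp", group))
	}
}
```

Checking the empty label first keeps the "label is strictly required" rule independent of whatever affinity a queue declares.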
} | ||
if q.queueGroupAffinityPreferred != nil { | ||
if groups, ok := q.queueGroupAffinityPreferred[queue]; ok { | ||
if !contains(groups, group) { |
This should be `contains`.
if q.queueGroupAffinityPreferred != nil { | ||
if groups, ok := q.queueGroupAffinityPreferred[queue]; ok { | ||
if !contains(groups, group) { | ||
return 1.0 |
Both affinity and anti-affinity should be considered when scoring, and combined into a final weighted value.
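One possible reading of this suggestion, sketched under stated assumptions: each preferred rule contributes a base score multiplied by a signed weight, and the contributions are summed. The constant names, the weights, and the `preferences` type are hypothetical and are not taken from the plugin.

```go
package main

import "fmt"

// Hypothetical constants for illustration only.
const (
	baseScore          = 100.0 // base value contributed by one matching rule
	affinityWeight     = 1.0   // preferred affinity raises the score
	antiAffinityWeight = -1.0  // preferred anti-affinity lowers the score
)

// groupSet is a minimal string set; a stand-in used only for this sketch.
type groupSet map[string]struct{}

// preferences holds the soft (preferred) constraints, keyed by queue name.
type preferences struct {
	affinity     map[string]groupSet
	antiAffinity map[string]groupSet
}

// score sums the weighted contributions of both preferred rule kinds.
// Required rules are assumed to be handled earlier, in the predicate.
func (p preferences) score(queue, group string) float64 {
	total := 0.0
	if _, ok := p.affinity[queue][group]; ok {
		total += affinityWeight * baseScore
	}
	if _, ok := p.antiAffinity[queue][group]; ok {
		total += antiAffinityWeight * baseScore
	}
	return total
}

func main() {
	p := preferences{
		affinity:     map[string]groupSet{"default": {"groupname1": {}}},
		antiAffinity: map[string]groupSet{"default": {"groupname3": {}}},
	}
	for _, group := range []string{"groupname1", "groupname2", "groupname3"} {
		fmt.Printf("%s: %v\n", group, p.score("default", group))
	}
}
```

Summing signed contributions means a group listed under both preferred rules scores the same as an unlisted one; whether that is the desired outcome is a design decision for the plugin authors.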
return &nodeGroupPlugin{pluginArguments: arguments} | ||
} | ||
|
||
func (pp *nodeGroupPlugin) Name() string { |
Better to rename this receiver to `np` or something similar, to be consistent with the other receivers.
`np` is better.
"volcano.sh/volcano/pkg/scheduler/util" | ||
) | ||
|
||
func TestScore(t *testing.T) { |
This unit test is meant to test the `score` function, but it seems that only `predicate` is tested, because the queue spec contains only a `RequiredDuringSchedulingIgnoredDuringExecution` item. Please test both the `score` and `predicate` functions.
@wuyueandrew Please resolve the conflicts in the PR and fix the CI failure :)
f4681c3 to f161062
b8c5755 to 7635571
5b5c450 to 51320d2
queueGroupAffinity := NewQueueGroupAffinity() | ||
for _, queue := range ssn.Queues { | ||
affinity := queue.Queue.Spec.Affinity | ||
if nil == affinity { |
For a uniform code style, please move `nil` to the front.
} | ||
|
||
type queueGroupAffinity struct { | ||
queueGroupAntiAffinityRequired map[string][]string |
Using an array type as the map value is not ideal: if a user misconfigures many duplicate labels, all the duplicates are kept, which is unacceptable. A set is better, because it de-duplicates the entries and is more efficient.
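The difference can be sketched with a small generic set built on a Go map. The `Set` type below is a local stand-in written for this example (it requires Go 1.18 or later) and only mimics the `Has`-style lookup; it is not the set library used by the project.

```go
package main

import "fmt"

// Set is a minimal generic set backed by a map; a stand-in for illustration.
type Set[T comparable] map[T]struct{}

// NewSet builds a set from the given items, silently dropping duplicates.
func NewSet[T comparable](items ...T) Set[T] {
	s := make(Set[T], len(items))
	for _, item := range items {
		s[item] = struct{}{}
	}
	return s
}

// Has reports whether item is in the set; a single map lookup.
func (s Set[T]) Has(item T) bool {
	_, ok := s[item]
	return ok
}

// containsSlice is the slice-based alternative: a linear scan over every
// entry, including any duplicates the user configured by mistake.
func containsSlice(groups []string, group string) bool {
	for _, g := range groups {
		if g == group {
			return true
		}
	}
	return false
}

func main() {
	// A misconfigured queue that repeats the same nodegroup label.
	configured := []string{"groupname1", "groupname1", "groupname1", "groupname2"}

	groups := NewSet(configured...)
	fmt.Println("slice entries:", len(configured))
	fmt.Println("set entries:  ", len(groups))
	fmt.Println("set has groupname2:  ", groups.Has("groupname2"))
	fmt.Println("slice has groupname2:", containsSlice(configured, "groupname2"))
}
```

Both lookups give the same answer, but the set stores each label once and its lookup cost does not grow with the number of duplicates.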
if q.queueGroupAffinityRequired != nil { | ||
if groups, ok := q.queueGroupAffinityRequired[queue]; ok { | ||
if contains(groups, group) { | ||
flag = flag || true |
`||` is necessary here, and `preferred` items should not be considered in the predicate.
} | ||
} | ||
|
||
if q.queueGroupAntiAffinityRequired != nil { |
`AntiAffinityRequired` should be filtered in the predicate, so it should not appear in the score function.
if q.queueGroupAntiAffinityPreferred != nil { | ||
if groups, ok := q.queueGroupAntiAffinityPreferred[queue]; ok { | ||
if contains(groups, group) { | ||
nodeScore = 0.25 |
This value should be -1.
|
||
func NewQueueGroupAffinity() queueGroupAffinity { | ||
return queueGroupAffinity{ | ||
queueGroupAntiAffinityRequired: make(map[string]sets.Set[string], 0), |
0 can be omitted.
if q.queueGroupAffinityPreferred != nil { | ||
if groups, ok := q.queueGroupAffinityPreferred[queue]; ok { | ||
if groups.Has(group) { | ||
nodeScore = 100 |
Define a base constant for the value 100.
} | ||
} | ||
if q.queueGroupAffinityPreferred != nil { | ||
if groups, ok := q.queueGroupAffinityPreferred[queue]; ok { |
Do preferred nodes get a higher score than required ones?
090bbc3 to 18123be
docs/design/node-group.md
Outdated
@@ -39,7 +39,7 @@ case2: recommend queue can use private cloud nodes or public cloud nodes, but tt | |||
|
|||
affinity configure: | |||
1. affinity.nodeGroupAffinity.requiredDuringSchedulingIgnoredDuringExecution, hard constraints, such as `nlp = nodegroup1,nodegroup2`, it means that task in queue=nlp can ony run on the nodes in nodegroup1 or nodegroup2. | |||
2. affinity.nodeGroupAffinity.preferredDuringSchedulingIgnoredDuringExecution, soft constraints, such as `nlp = nodegroup1`, it means that task in queue=nlp runs on nodegroup1 first, but if the resources of nodegroup1 is insufficient, it can also run on other nodegroups. | |||
2. affinity.nodeGroupAffinity.preferredDuringSchedulingIgnoredDuringExecution, soft constraints, such as `nlp = nodegroup1`, it means that task in queue=nlp runs on other nodegroups first, but if the resources is insufficient, it can also run on nodegroup1. |
"other nodegroups" is misleading; "other nodegroups specified in affinity.nodeGroupAffinity.requiredDuringSchedulingIgnoredDuringExecution" would be better.
|
||
## Introduction | ||
|
||
**Nodegroup plugin** is designed to isolate resources by assigning labels to nodes. |
This sentence should also add: "and setting node label affinity on the Queue."
preferredDuringSchedulingIgnoredDuringExecution: | ||
- groupname3 | ||
``` | ||
This implies that tasks in the "default" queue will be executed on "groupname1" and "groupname2", with a preference for "groupname1" to run first. Tasks are restricted from running on "groupname3" and "groupname4". However, if the resources in other node groups are insufficient, the task can run on "nodegroup3". |
executed -> scheduled
|
||
The nodegroup design document provides the most detailed information about the node group. There are some tips to help avoid certain issues. | ||
|
||
1. Soft constraints are a subset of hard constraints, including both affinity and anti-affinity. Consider a queue setup as follows: |
The following steps are both about the queue's configuration. Please also add the other steps: setting the node label, submitting jobs to a queue, and verifying that the jobs are scheduled to the affinity nodes and fail to schedule on the anti-affinity nodes.
} | ||
|
||
status, _ := ssn.PredicateFn(task, node) | ||
// if err != nil { |
Please remove the commented-out code.
"volcano.sh/volcano/pkg/scheduler/util" | ||
) | ||
|
||
func TestNormal(t *testing.T) { |
Please replace it with a more descriptive name.
expectedStatus map[string]map[string]int | ||
}{ | ||
{ | ||
name: "normal queue test", |
Same as above.
}, | ||
}, | ||
{ | ||
name: "special queue job", |
Same as above.
988b0c5 to a1e11b4
|
||
### validate queue affinity and antiAffinity rules is effected | ||
|
||
Query pod information and verify whether the pod has been assigned to the correct node. |
Please illustrate which node is correct and which node is filtered.
3c19f47 to cbaf82d
group := node.Node.Labels[NodeGroupNameKey] | ||
queue := GetPodQueue(task) | ||
score := queueGroupAffinity.score(queue, group) | ||
klog.V(3).Infof("task <%s>/<%s> queue %s on node %s of nodegroup %s, score %v", task.Namespace, task.Name, queue, node.Name, group, score) |
Use `klog.V(4)` here.
ec37291 to 89c1cbb
Signed-off-by: wuyue <[email protected]>
Signed-off-by: wuyue <[email protected]>
how to use nodegroup doc
Signed-off-by: wuyue <[email protected]>
89c1cbb to a0f257e
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: william-wang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
#3131 add nodegroup plugin with ut