Pods are preferentially scheduled to machines that meet the current session resources #2815
Conversation
Force-pushed from a15a408 to 73b0116
/priority important-soon
Force-pushed from 73b0116 to 2dcbdd8
Close the current PR; the preemption problem is fixed through #2916. /close
@wangyang0616: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The current PR was closed by mistake; reopen it.
@wangyang0616: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from c841e5e to 041dfeb
}
}

var node *api.NodeInfo
No need to define a new node variable.
done
if bestNode != nil {
	node = bestNode
} else {
	klog.Errorf("task %s/%s allocate failed, bestNode is nil", task.Namespace, task.Name)
The code between lines 249 and 255 is redundant.
The code after line 250 uses the node information for resource checks, so it is necessary to ensure that the node pointer is not nil.
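For illustration, here is a minimal, self-contained Go sketch of the nil guard being discussed; NodeInfo, useNode, and allocateOne are hypothetical stand-ins for the real allocate code, not volcano's actual API.

package main

import "fmt"

// NodeInfo is a simplified stand-in for volcano's api.NodeInfo; only Name matters here.
type NodeInfo struct{ Name string }

// useNode represents the resource checks after line 250 that dereference the node pointer.
func useNode(n *NodeInfo) {
	fmt.Println("checking resources on", n.Name)
}

// allocateOne shows why the guard is kept: without it, useNode would panic on a nil node.
func allocateOne(bestNode *NodeInfo, taskNamespace, taskName string) {
	if bestNode == nil {
		fmt.Printf("task %s/%s allocate failed, bestNode is nil\n", taskNamespace, taskName)
		return
	}
	useNode(bestNode)
}

func main() {
	allocateOne(nil, "default", "demo-task")                     // reported, no panic
	allocateOne(&NodeInfo{Name: "node-1"}, "default", "demo-task") // normal path
}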
}
switch {
case len(nodes) == 0:
	klog.V(3).Infof("Task: %v, no matching node is found in the nodes list."+
Log level 3 is not a good choice here; it will produce too much log output.
done
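For reference, a minimal klog sketch of the verbosity change discussed above; the choice of V(5) is an assumption, and the point is only that the per-task message moves to a higher verbosity tier than V(3) so it stays out of the default log stream.

package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil)
	flag.Parse()
	defer klog.Flush()

	taskName := "default/demo-task"
	// V(5) keeps this per-task message quiet unless the scheduler runs with --v=5 or higher.
	klog.V(5).Infof("Task: %v, no matching node is found in the nodes list.", taskName)
}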
@@ -195,28 +195,64 @@ func (alloc *Action) Execute(ssn *framework.Session) {
	break
}

var candidateNodes []*api.NodeInfo
// When scheduling pods, gradient scoring is performed on all nodes that are successfully filtered. |
// Candidate nodes are divided into two gradients:
// - the first gradient: nodes whose idle resources satisfy the task's resource request;
// - the second gradient: nodes whose idle resources plus future idle resources satisfy the task's resource request.
// Score the first gradient first. If a node in the first gradient meets the requirements, ignore the second gradient; otherwise, score the second gradient and select a suitable node.
done
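For illustration, a minimal, self-contained Go sketch of the two-gradient split described in the suggested comment above; Resource, NodeInfo, TaskInfo, and splitCandidates are simplified stand-ins for volcano's api types, not the real implementation.

package main

import "fmt"

// Resource is a simplified stand-in for volcano's resource type.
type Resource struct{ MilliCPU, Memory float64 }

func (r Resource) LessEqual(o Resource) bool {
	return r.MilliCPU <= o.MilliCPU && r.Memory <= o.Memory
}

func (r Resource) Add(o Resource) Resource {
	return Resource{r.MilliCPU + o.MilliCPU, r.Memory + o.Memory}
}

type NodeInfo struct {
	Name       string
	Idle       Resource // resources free right now
	FutureIdle Resource // resources expected to be released by terminating pods
}

type TaskInfo struct {
	Name   string
	Resreq Resource
}

// splitCandidates divides the filtered nodes into the two gradients: nodes whose
// idle resources already satisfy the request, and nodes that only satisfy it once
// future idle resources are counted.
func splitCandidates(task TaskInfo, nodes []NodeInfo) (first, second []NodeInfo) {
	for _, n := range nodes {
		switch {
		case task.Resreq.LessEqual(n.Idle):
			first = append(first, n)
		case task.Resreq.LessEqual(n.Idle.Add(n.FutureIdle)):
			second = append(second, n)
		}
	}
	return first, second
}

func main() {
	task := TaskInfo{Name: "demo", Resreq: Resource{MilliCPU: 2000, Memory: 4096}}
	nodes := []NodeInfo{
		{Name: "n1", Idle: Resource{MilliCPU: 4000, Memory: 8192}},
		{Name: "n2", Idle: Resource{MilliCPU: 1000, Memory: 2048}, FutureIdle: Resource{MilliCPU: 2000, Memory: 4096}},
	}
	first, second := splitCandidates(task, nodes)
	fmt.Printf("first gradient: %d node(s), second gradient: %d node(s)\n", len(first), len(second))
}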
Force-pushed from 041dfeb to 0bfd8d1
…ources, and then consider machines that are satisfied with future resources
Signed-off-by: wangyang <[email protected]>
Force-pushed from 0bfd8d1 to 5e48157
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: william-wang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
fix: #2782
Besides the binpack plugin, which has this problem, I understand that other algorithm plugins may encounter similar problems, such as task-topology, nodeorder, etc.
I was wondering whether this generic problem could be solved as follows:
When allocate scores nodes, it divides them into two groups: the first group is the machines whose idle resources meet the task's resource request, and the second group is the machines whose future idle resources meet the task's resource demand.
First, score the first group of machines; if a suitable machine is found, schedule the task to that node. If no machine in the first group meets the resource request, then score the second group of machines and select a suitable node from it.
In this way, the pod is preferentially dispatched to a machine that meets the resource requirements in the current session, so it will not stay pending for a long time. If none of the machines in the current session meet the requirements, the pod can still be scheduled to wait on a machine whose future idle resources do.
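For illustration, a minimal, self-contained Go sketch of the "score the first group, fall back to the second group" selection described above; ScoredNode, pickBest, and selectNode are hypothetical names, and Score stands in for whatever the node-order plugins (binpack, nodeorder, task-topology, ...) would produce.

package main

import "fmt"

// ScoredNode pairs a candidate node with its plugin score (hypothetical).
type ScoredNode struct {
	Name  string
	Score float64
}

// pickBest returns the highest-scoring node in a group, or nil if the group is empty.
func pickBest(group []ScoredNode) *ScoredNode {
	var best *ScoredNode
	for i := range group {
		if best == nil || group[i].Score > best.Score {
			best = &group[i]
		}
	}
	return best
}

// selectNode prefers the first group (nodes whose current idle resources fit the task)
// and only falls back to the second group (fit only with future idle resources),
// so a pod is not left pending while a currently free node exists.
func selectNode(firstGroup, secondGroup []ScoredNode) *ScoredNode {
	if best := pickBest(firstGroup); best != nil {
		return best
	}
	return pickBest(secondGroup)
}

func main() {
	firstGroup := []ScoredNode{} // no node currently satisfies the request
	secondGroup := []ScoredNode{{Name: "n2", Score: 7.5}}
	if n := selectNode(firstGroup, secondGroup); n != nil {
		fmt.Println("schedule to", n.Name) // falls back to n2 and waits for future idle resources
	}
}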