-
Notifications
You must be signed in to change notification settings - Fork 264
Add e2e test for different task resource requests #548
Conversation
test/e2e/job.go
Outdated
rep = rep + 1 | ||
} | ||
|
||
replicaset := createReplicaSet(context, "rs-1", rep/2, "nginx", slot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case in my mind for this test is start RS in rep-1
, and start a new job with two tasks: task_1 require two cpu, task_2 requires 0.5 cpu; and expects task_2 can be scheduled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@k82cn Yeah, this works. Just want to confirm validation part. There're two ways to verify results and want to get advice.
-
Set PodGroup min = 1. The task_2 is scheduled and running, task_1 is still in pending. Since running task number >= podGroup.min. It meets waitPodGroupReady(). (Not sure if changing job min to 1 makes sense to gang... )
-
PodGroup min = 2, the task_2 can get resource and bind to node, task_1 now is allocated and task_2 is pending.
Deleting ReplicaSet will release resource for task_1 and task_2 to be scheduled. (Don't need code change from line 371-381)
For second option, do we need to capture the internal intermediate state? I am not sure how to get this information. For example, PodGroup just has this event message like "1/2 tasks in gang unschedulable: 0/2 nodes are available, 2 insufficient cpu" but we don't have a field to indicate tasks can be scheduled.
Another concern is using example you give, if cluster just has 2 cpus. Even RS is deleted, tasks can not be scheduled. (request 2.5 cpu in total)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking to use option 1 to verify result :)
Not sure if changing job min to 1 makes sense to gang...
1 is default value to gang
; gang
also support minMember < total
, the pod beyand minMember will be shared by DRF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Update the test case. Please have a look
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jeffwan, k82cn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add e2e test for different task resource requests
Add e2e test for different task resource requests
What this PR does / why we need it:
We talked about adding one task case for different task resource requests.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Mentioned in #540
Special notes for your reviewer:
I use milliCPU here and should be fine to cover different node groups.
Release note: