feat(provisioning): enable pod rescheduling due to insufficient node capacity #23
Conversation
Force-pushed from 029596b to 6e6368c
"strconv" | ||
|
||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
|
why this change?
	}
	return err
The status is not updated here; won't it block the controller forever?
Yes, it is intentional. As of now, we set the failed state only in the case of insufficient capacity, where we don't want to retry the volume provisioning. In other cases, the controller loop will keep retrying the volume provisioning.
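For illustration, here is a self-contained Go sketch of the behaviour described above (the state values, struct fields, and error matching are assumptions for the example, not the actual lvm-localpv code): only an insufficient-capacity error marks the volume as failed; any other error is returned so the controller work queue keeps retrying.

package main

import (
	"errors"
	"fmt"
	"strings"
)

// Illustrative stand-in for the LVMVolume status; the real fields differ.
type volume struct {
	State   string
	Message string
}

// reconcileCreate sketches the node-agent decision: mark the volume Failed
// only when the volume group has insufficient capacity, otherwise return the
// error so the controller loop requeues and retries provisioning.
func reconcileCreate(vol *volume, createErr error) error {
	switch {
	case createErr == nil:
		vol.State = "Ready"
		return nil
	case strings.Contains(createErr.Error(), "insufficient free space"):
		// Terminal on this node: record the failure so the controller can
		// surface it and let the volume be rescheduled elsewhere.
		vol.State = "Failed"
		vol.Message = createErr.Error()
		return nil
	default:
		// Possibly intermittent error: keep the state unchanged and retry.
		return createErr
	}
}

func main() {
	vol := &volume{State: "Pending"}
	err := reconcileCreate(vol, errors.New(`Volume group "lvmvg" has insufficient free space`))
	fmt.Println(vol.State, err) // prints: Failed <nil>
}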
pkg/driver/params.go (Outdated)
	// parse bool params
	boolParams := map[string]*bool{
		"wait": &params.WaitForProvision,
do we need this parameter? we will always wait for the provisioning.
Yep, removed.
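For context, a minimal self-contained sketch of the boolean-parameter parsing pattern used in the diff above (the parameter keys and struct fields here are illustrative assumptions, not the exact driver code); this is also why the strconv import shows up:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Illustrative parameter struct; the real driver params differ.
type VolumeParams struct {
	Shared        bool
	ThinProvision bool
}

// parseBoolParams walks a map of parameter-name -> destination-field pointers
// and fills each field from the storage-class parameters via strconv.ParseBool.
func parseBoolParams(scParams map[string]string) (*VolumeParams, error) {
	params := &VolumeParams{}
	boolParams := map[string]*bool{
		"shared":        &params.Shared,
		"thinprovision": &params.ThinProvision,
	}
	for key, dst := range boolParams {
		val, ok := scParams[strings.ToLower(key)]
		if !ok {
			continue // absent parameter keeps its zero value (false)
		}
		parsed, err := strconv.ParseBool(val)
		if err != nil {
			return nil, fmt.Errorf("invalid value %q for parameter %q: %v", val, key, err)
		}
		*dst = parsed
	}
	return params, nil
}

func main() {
	p, err := parseBoolParams(map[string]string{"shared": "true"})
	fmt.Printf("%+v, err=%v\n", p, err)
}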
@@ -89,6 +112,7 @@ func CreateVolume(vol *apis.LVMVolume) error {
	out, err := cmd.CombinedOutput()

	if err != nil {
		err = newExecError(out, err)
should we use newExecError in the previous return at line 103?
It's not required. As the name suggests, newExecError only wraps the error returned by the exec pkg after running the cmd. Wrapping other errors will not provide any extra information, since the output will be empty.
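A minimal sketch of what such an exec-error wrapper could look like (the struct and its fields here are assumptions; the actual newExecError in the repo may differ). It only adds value for errors coming out of exec, because that is where the captured command output carries extra context:

package main

import (
	"fmt"
	"os/exec"
)

// execError pairs the underlying exec error with the command's combined output.
type execError struct {
	output []byte // combined stdout/stderr of the failed command
	err    error  // error returned by the exec package
}

func (e *execError) Error() string {
	return fmt.Sprintf("%v: %s", e.err, string(e.output))
}

// newExecError wraps an exec failure together with its output; for a nil
// error it returns nil so callers can wrap unconditionally.
func newExecError(output []byte, err error) error {
	if err == nil {
		return nil
	}
	return &execError{output: output, err: err}
}

func main() {
	out, err := exec.Command("lvcreate", "--badflag").CombinedOutput()
	if err != nil {
		err = newExecError(out, err)
	}
	fmt.Println(err)
}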
pkg/driver/controller.go (Outdated)
	if volErr := vol.Status.Error; volErr != nil {
		errMsg = volErr.Message
		if volErr.Code == lvmapi.InsufficientCapacity {
			reschedule = true
Can we make it reschedule every time there is a failure on the node, no matter the reason? If it fails on a particular node, we would want to try the volume creation on some other node.
I think it may not be a good idea to reschedule the volume in case of any (possibly intermittent) error. It'll result in sparse & unpredictable scheduling of storage resources. IMO, if there is any error related to a particular node, it's better to inspect & fix that error (by cordoning the node) rather than masking it with rescheduling.
Force-pushed from 6e6368c to 30b5feb
@iyashu there is a conflict in the file pkg/driver/controller.go. Can you rebase and push?
capacity. It also involves some refactoring of CSI param parsing and the volume provisioning workflow, improving timeouts, error handling & idempotency. Signed-off-by: Yashpal Choudhary <[email protected]>
Force-pushed from 30b5feb to 3fab614
@pawanpraka1 done, ptal.
Codecov Report
@@            Coverage Diff            @@
##           master     #23      +/-   ##
=========================================
- Coverage    1.20%   1.16%   -0.04%
=========================================
  Files          11      12       +1
  Lines         830     856      +26
=========================================
  Hits           10      10
- Misses        820     846      +26

Continue to review full report at Codecov.
Signed-off-by: Yashpal Choudhary <[email protected]>
@@ -123,14 +123,28 @@ spec:
  description: VolStatus string that specifies the current state of the volume
    provisioning request.
  properties:
    error:
Do we need this now? Can we directly rely on the state being failed to take the decision?
Yeah, functionally, from the rescheduling point of view, it can be removed now. I had just kept it for better inspection & debugging as a user/maintainer. As we are propagating the error message to the CSI external provisioner (along with ResourceExhausted), the same error message is being added as events under the PVC resource, e.g.
➜ lvm-localpv git:(storage-capacity) ✗ k describe pvc claim-a-test-a-4
Name: claim-a-test-a-4
Namespace: default
StorageClass: openebs-ext4pv
Status: Pending
Volume:
Labels: app=test-a
Annotations: volume.beta.kubernetes.io/storage-provisioner: local.csi.openebs.io
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: test-a-4
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 22s persistentvolume-controller waiting for first consumer to be created before binding
Normal Provisioning 5s (x4 over 22s) local.csi.openebs.io_openebs-lvm-controller-0_6f7c487b-6397-4d19-a5d4-b4d84819f831 External provisioner is provisioning volume for claim "default/claim-a-test-a-4"
Normal ExternalProvisioning 5s (x5 over 22s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "local.csi.openebs.io" or manually created by system administrator
Warning ProvisioningFailed 3s (x4 over 19s) local.csi.openebs.io_openebs-lvm-controller-0_6f7c487b-6397-4d19-a5d4-b4d84819f831 failed to provision volume with StorageClass "openebs-ext4pv": rpc error: code = ResourceExhausted desc = Volume group "lvmvg" has insufficient free space (3071 extents): 5120 required.
- exit status 5
Normal WaitForPodScheduled 3s (x5 over 19s) persistentvolume-controller waiting for pod test-a-4 to be scheduled
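For reference, a rough Go sketch of what the new error field on the volume status could look like (type, field, and constant names are assumptions based on this PR, not the exact repository definitions):

package main

import (
	"encoding/json"
	"fmt"
)

// VolumeErrorCode distinguishes a terminal insufficient-capacity failure from
// generic internal errors.
type VolumeErrorCode string

const (
	Internal             VolumeErrorCode = "Internal"
	InsufficientCapacity VolumeErrorCode = "InsufficientCapacity"
)

// VolumeError is propagated from the node agent to the controller via the
// LVMVolume status.
type VolumeError struct {
	Code    VolumeErrorCode `json:"code,omitempty"`
	Message string          `json:"message,omitempty"`
}

// VolStatus mirrors the CRD status block shown in the diff above: a state
// string plus an optional error.
type VolStatus struct {
	State string       `json:"state,omitempty"`
	Error *VolumeError `json:"error,omitempty"`
}

func main() {
	s := VolStatus{
		State: "Failed",
		Error: &VolumeError{
			Code:    InsufficientCapacity,
			Message: `Volume group "lvmvg" has insufficient free space (3071 extents): 5120 required.`,
		},
	}
	b, _ := json.MarshalIndent(s, "", "  ")
	fmt.Println(string(b))
}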
Looks good. Can you describe how you have tested it? Can you set up a cluster, request a size that only one node can serve, and see if the volume eventually gets tried/provisioned on that node?
	}
	return status.Errorf(codes.Internal,
		"lvm: destroy wait failed, not able to get the volume %s %s", volname, err.Error())

if reschedule {
We can move this block inside if volErr := vol.Status.Error; volErr != nil.
We can, since the reschedule var is only set to true in one place. I'm just keeping it here for readability, where we first take the decision on whether we want to reschedule or not, and then at the end run some cleanup actions based on whether rescheduling is required. I can move it if you think otherwise.
Exactly, see the pull request description above (If the changes in this PR are manually verified, list down the scenarios covered). In my case, I have a 4-node cluster (besides master nodes), each node having a storage capacity of 32 GiB. I've created 4 pods, each requesting storage capacity of 20 GiB. Now, I can see from the lvm node plugin logs that kube-scheduler first tries to schedule all 4 pods on the first 3 nodes, but eventually it schedules the 4th pod on the remaining node (which has sufficient capacity, since the other 3 nodes have only 12 GiB left). Another test that I've run is when there are no nodes available to satisfy the PVC claim. In that case, kube-scheduler keeps retrying the PVC across nodes until some node has enough capacity to fit the claim size.
Sounds good @iyashu. Thanks for this PR.
Why is this PR required? What issue does it fix?:
As of now, if the volume provisioning fails due to insufficient capacity on the nodes, the pod will remain in the Pending or ContainerCreating state forever (unless someone recreates it). See #18 for more details. Apart from this, the pull request also gracefully handles RPC timeouts & internal errors, ensuring idempotency in the provisioning workflow.
What this PR does?:
We introduce an additional field named VolumeError in the LVMVolume custom resource to propagate the error from the node agent to the controller. Additionally, the node agent will mark the LVM volume as failed if sufficient space is not available in the node's vg. The controller will inspect the errors populated by the node agent and accordingly return the ResourceExhausted error code, so that the external provisioner can reschedule the pod. In other cases, the controller returns status Aborted to treat the volume provisioning as in progress.

Note: The pull request currently doesn't handle rescheduling in case of implicit binding. It should be trivial & I'll create a separate PR for the same.
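To make the controller-side decision concrete, here is a hedged Go sketch of the mapping described above (type and constant names are assumptions; the actual controller code may differ): an InsufficientCapacity error becomes gRPC ResourceExhausted so the external provisioner reschedules, and everything else becomes Aborted so the same request is retried.

package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Minimal stand-in for the LVMVolume status error used in this sketch.
type volumeError struct {
	Code    string
	Message string
}

const insufficientCapacity = "InsufficientCapacity"

// provisioningStatus translates the error recorded by the node agent into a
// gRPC status for the CSI CreateVolume response.
func provisioningStatus(volErr *volumeError) error {
	if volErr != nil && volErr.Code == insufficientCapacity {
		// Terminal on the selected node: ResourceExhausted tells the external
		// provisioner to pick another topology/node and try again.
		return status.Error(codes.ResourceExhausted, volErr.Message)
	}
	// Anything else is treated as still in progress; Aborted makes the
	// provisioner retry the same request later.
	return status.Error(codes.Aborted, "volume provisioning is in progress")
}

func main() {
	err := provisioningStatus(&volumeError{
		Code:    insufficientCapacity,
		Message: `Volume group "lvmvg" has insufficient free space`,
	})
	fmt.Println(status.Code(err), err)
}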
Does this PR require any upgrade changes?:
No
If the changes in this PR are manually verified, list down the scenarios covered:
Consider a cluster where only one node has sufficient capacity to fulfill the PVC capacity request. Now, if you try to create a pod (with late binding), kube-scheduler will keep retrying scheduling whenever a node doesn't have enough capacity available in its volume group, until it finds the right node.
Any additional information for your reviewer? :
Mention if this PR is part of any design or a continuation of previous PRs
Checklist:
<type>(<scope>): <subject>