
create and delete a large number of snapshots leaving stale snapshots in the backend #446

Closed
Madhu-1 opened this issue Jun 26, 2019 · 5 comments · Fixed by #1160
Labels: bug (Something isn't working), component/rbd (Issues related to RBD)

Comments

Madhu-1 (Collaborator) commented Jun 26, 2019:

Describe the bug

Creating and deleting a large number of snapshots leaves stale snapshots in the backend.

Environment details

  • Image/version of Ceph CSI driver: canary
  • helm chart version
  • Kubernetes cluster version: 1.14.2
  • Logs

RBD plugin logs: rbd.log

Snapshotter logs: snap.log

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: '...'
  2. Deployment to trigger the issue '....'
  3. See error

Actual results

After creating 50 snapshots and deleting all 50, around 14 stale snapshots remain in the backend.

Expected behavior

Once the Kubernetes snapshots are deleted, there should be no stale snapshots left in the backend.

Madhu-1 added the bug label on Jun 26, 2019
Madhu-1 (Collaborator, Author) commented Jun 26, 2019:

@ShyamsundarR @humblec PTAL

ShyamsundarR (Contributor) commented:

There are a few things happening here, primarily because the requests are timing out.

The snapshotter sidecar does NOT retry taking snapshots; it tries once, and if the call times out, the VolumeSnapshot object gets an error. In the background the snapshot is actually created by the first RPC that requested it, but Kubernetes has no record of it (i.e., the SnapID). Thus, when the snapshot is deleted, Kubernetes only deletes the VolumeSnapshot object and never invokes the DeleteSnapshot RPC against the plugin. This leaks snapshots.
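
For context, a minimal Go sketch (not the actual ceph-csi code) of how an idempotent CSI CreateSnapshot handler keys everything on the request name; the `snapshotStore` helpers are hypothetical stand-ins for the RADOS OMAP bookkeeping and the rbd snapshot call. If the sidecar retried after a timeout, the lookup would resolve to the snapshot created by the first attempt and return its SnapshotId; because it does not retry, the CO never records the ID and never calls DeleteSnapshot.

```go
package rbddriver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// backendSnapshot and snapshotStore are hypothetical stand-ins for the
// RADOS OMAP bookkeeping and the actual rbd snapshot creation; they are
// not ceph-csi types.
type backendSnapshot struct {
	ID          string
	SourceVolID string
	SizeBytes   int64
	ReadyToUse  bool
}

type snapshotStore interface {
	LookupByName(ctx context.Context, name string) (*backendSnapshot, bool)
	Create(ctx context.Context, name, sourceVolID string) (*backendSnapshot, error)
}

type ControllerServer struct {
	store snapshotStore
}

// CreateSnapshot is keyed by req.Name, so a retried call after a timeout
// would find the snapshot created by the first attempt and return the same
// SnapshotId. The leak described above happens because the sidecar never
// retries, so the CO ends up with no record of the ID.
func (cs *ControllerServer) CreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) (*csi.CreateSnapshotResponse, error) {
	if req.GetName() == "" || req.GetSourceVolumeId() == "" {
		return nil, status.Error(codes.InvalidArgument, "snapshot name and source volume ID are required")
	}

	// Idempotency: a second call with the same name resolves to the
	// snapshot reserved/created by the earlier attempt.
	if snap, found := cs.store.LookupByName(ctx, req.GetName()); found {
		return newCreateSnapshotResponse(snap), nil
	}

	// First attempt: record the name->ID mapping, then take the snapshot.
	snap, err := cs.store.Create(ctx, req.GetName(), req.GetSourceVolumeId())
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	return newCreateSnapshotResponse(snap), nil
}

func newCreateSnapshotResponse(snap *backendSnapshot) *csi.CreateSnapshotResponse {
	return &csi.CreateSnapshotResponse{
		Snapshot: &csi.Snapshot{
			SnapshotId:     snap.ID,
			SourceVolumeId: snap.SourceVolID,
			SizeBytes:      snap.SizeBytes,
			ReadyToUse:     snap.ReadyToUse,
		},
	}
}
```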

The lock fix patch (#443) alleviates the problem, since it speeds up the operations (including snapshots), but there are still corner cases that can leak snapshots.

Similarly, there can be corner cases in PVC->PV creates and deletes that leak images, because Kubernetes never recorded a success from the plugin. I have seen this happen while working on the performance improvements with the locks. It needs more analysis, but when RPC response times are large we may leak an image.

The ReadyToUse flag on snapshots can help here: the CO is supposed to keep retrying once a response is returned with the flag set to false, to ensure the snapshot becomes ready at some point. However, we can only return such a response after creating the RADOS maps for the snapshot, and possibly after taking the snapshot itself, so if those calls take too long we are back to leaking snapshots. The sidecar code also needs to be checked to confirm it retries as the specification expects.
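
As a hedged illustration of that flag (again, not the current ceph-csi flow; the function and parameter names below are made up), the plugin could report the reserved snapshot with ReadyToUse set to false once the RADOS mapping exists, and, as noted above, the CO would be expected to keep repeating the idempotent CreateSnapshot call until the flag turns true:

```go
package rbddriver

import "github.com/container-storage-interface/spec/lib/go/csi"

// notYetReadyResponse builds a CreateSnapshot response for a snapshot whose
// RADOS bookkeeping exists but which has not been fully cut yet. As the
// comment above notes, ReadyToUse=false is the signal for the CO to keep
// checking until the snapshot becomes ready.
func notYetReadyResponse(snapID, sourceVolID string, sizeBytes int64) *csi.CreateSnapshotResponse {
	return &csi.CreateSnapshotResponse{
		Snapshot: &csi.Snapshot{
			SnapshotId:     snapID,
			SourceVolumeId: sourceVolID,
			SizeBytes:      sizeBytes,
			ReadyToUse:     false, // reserved, but not yet usable
		},
	}
}
```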

Looking at the logs you provided, there are discrepancies in the call counts (i.e., Create/Delete/RPC successes, etc.) which explain the leaks, but it also looks to me like you invoked delete before some of the snapshots were created.

In my test, I created 25 snapshots using the attached script, which waits for the ReadyToUse flag to become true (which essentially means the VolumeSnapshotContent object was created, which happens after a successful CreateSnapshot response is received), and then I invoked the delete script.
snap-create-perf.sh.txt
snap-delete-perf.sh.txt

But even with the above, and without the lock fix patch, only 9 snapshots were marked ready and had VolumeSnapshotContent objects created. For the remaining 16 (of the 25 I created), the snapshots were created in the background, but because the calls timed out Kubernetes never sent the delete requests. In other words, we leaked 16 snapshots in the create phase itself; delete had no role to play after that.

I also tried an experiment adding a 65-second sleep to the CreateSnapshot call (the CreateSnapshot timeout in the snapshotter sidecar is 60 seconds, not the 10 seconds its documentation states). As expected, the call timed out on the sidecar and the snapshot was never recorded by Kubernetes, yet the image was still created in the background.
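
A rough reconstruction of that delay experiment, just to show its shape (a hypothetical test shim, not committed code):

```go
package rbddriver

import (
	"context"
	"time"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// snapshotCreator is any controller that can take snapshots.
type snapshotCreator interface {
	CreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) (*csi.CreateSnapshotResponse, error)
}

// delayedCreator sleeps past the sidecar's 60-second CreateSnapshot timeout
// before doing the real work, so the RPC times out on the sidecar while the
// backend snapshot is still created.
type delayedCreator struct {
	inner snapshotCreator
	delay time.Duration // e.g. 65 * time.Second, just past the sidecar timeout
}

func (d *delayedCreator) CreateSnapshot(ctx context.Context, req *csi.CreateSnapshotRequest) (*csi.CreateSnapshotResponse, error) {
	time.Sleep(d.delay) // the sidecar gives up at 60s; the work below still runs
	return d.inner.CreateSnapshot(ctx, req)
}
```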

We need to decide what to do next, and possibly raise this with the CSI/Kubernetes folks to understand the expectations here and how to avoid losing state.

ShyamsundarR (Contributor) commented:

We possibly need the snapshotter sidecar to implement a more robust timeout and retry mechanism, like the provisioner does: https://github.com/kubernetes-csi/external-provisioner#csi-error-and-timeout-handling
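
A hedged sketch of what that could look like (the per-call timeout and backoff values below are placeholders, not the sidecar's real settings): each attempt gets its own timeout, and a timed-out call is retried with exponential backoff instead of being abandoned, so the idempotent plugin call eventually reports the snapshot that was already created in the background.

```go
package retryutil

import (
	"context"
	"time"
)

// callWithRetry runs attempt with its own timeout and, on failure, retries
// with exponential backoff until the parent context is cancelled. Because
// CSI calls such as CreateSnapshot are idempotent, a retry after a timeout
// simply picks up the snapshot created by the earlier attempt instead of
// leaking it.
func callWithRetry(parent context.Context, callTimeout time.Duration, attempt func(ctx context.Context) error) error {
	backoff := time.Second // placeholder initial backoff
	for {
		ctx, cancel := context.WithTimeout(parent, callTimeout)
		err := attempt(ctx)
		cancel()
		if err == nil {
			return nil
		}

		// Wait out the backoff, unless the caller gives up first.
		select {
		case <-parent.Done():
			return parent.Err()
		case <-time.After(backoff):
		}
		if backoff < 5*time.Minute {
			backoff *= 2
		}
	}
}
```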

ShyamsundarR (Contributor) commented:

> We possibly need the snapshotter sidecar to implement a more robust timeout and retry mechanism, like the provisioner does: https://github.com/kubernetes-csi/external-provisioner#csi-error-and-timeout-handling

On further thought, snapshot creation may not be able to retry endlessly (owing to things like freezing/thawing the workloads using the volume before and after the snapshot; IO to the volume cannot be held back for long durations). It may hence require some other form of fix in the Kubernetes snapshotter. I will start a discussion there.

Madhu-1 (Collaborator, Author) commented Jul 1, 2019:

> We possibly need the snapshotter sidecar to implement a more robust timeout and retry mechanism, like the provisioner does: https://github.com/kubernetes-csi/external-provisioner#csi-error-and-timeout-handling
>
> On further thought, snapshot creation may not be able to retry endlessly (owing to things like freezing/thawing the workloads using the volume before and after the snapshot; IO to the volume cannot be held back for long durations). It may hence require some other form of fix in the Kubernetes snapshotter. I will start a discussion there.

@ShyamsundarR can you point me to the snapshotter discussion, if it has already started?
