rbd: include trashed parent images while calculating the clone depth #4029

nixpanic · 2023-08-02T10:03:25Z

The getCloneDepth() function did not account for images that are in
the trash. A trashed image can only be opened by the image-id, and not
by name anymore.

Closes: #4013
Depends-on: #4064 #4273

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

/retest ci/centos/<job-name>: retest the <job-name> after unrelated
failure (please report the failure too!)

nixpanic · 2023-08-04T07:35:49Z

/test ci/centos/mini-e2e/k8s-1.27

Rakshith-R

This pr looks good to me.

internal/rbd/rbd_util.go

mergify · 2023-08-08T09:00:30Z

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

nixpanic · 2023-08-22T15:10:53Z

/test ci/centos/mini-e2e/k8s-1.27

nixpanic · 2023-08-22T15:11:59Z

/test ci/centos/mini-e2e/k8s-1.27

nixpanic · 2023-08-22T15:30:31Z

Manual stress testing with the scripts from rook/rook#12312 passes.

Need to fix the golangci-lint issues and have the go-ceph rebase merged before this is completely ready.

Rakshith-R · 2023-08-23T10:12:32Z

internal/rbd/rbd_util.go

+//
+// This function re-uses the ioctx of the image to open all images in the
+// chain. There is no need to open new ioctx's for every image.
+func (ri *rbdImage) getCloneDepth() (uint, error) {


hey @nixpanic,
Can we add another argument, like maxDepthToTraverse, which can be used to exit early and not traverse further unnecessarily?

We can then pass rbdHardMaxCloneDepth+1 in the function call.

wdyt?

sure, I'll add that

Added a 2nd commit to address this.
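
The suggestion in this thread can be sketched as follows. The `parents` map, the `cloneDepth` signature and the `hardMax` value are assumptions made for illustration; they are not the code from the second commit.

```go
package main

import "fmt"

// parents maps an image ID to the ID of its parent; an image without an
// entry has no parent. This is a toy model, not the ceph-csi data structure.
type parents map[string]string

// cloneDepth walks the parent chain of id and stops as soon as maxDepth
// parents have been counted. A maxDepth of 0 means "no limit".
func cloneDepth(chain parents, id string, maxDepth uint) uint {
	var depth uint
	for {
		parent, ok := chain[id]
		if !ok {
			return depth
		}
		depth++
		if maxDepth != 0 && depth >= maxDepth {
			// Enough is known; do not open further parents.
			return depth
		}
		id = parent
	}
}

func main() {
	chain := parents{"e": "d", "d": "c", "c": "b", "b": "a"}
	const hardMax = 2 // hypothetical hard limit

	fmt.Println("full depth:", cloneDepth(chain, "e", 0))
	fmt.Println("bounded depth:", cloneDepth(chain, "e", hardMax+1))
}
```

Passing the hard limit plus one is enough for the caller: a result of `hardMax+1` already shows that the limit is exceeded, so the remaining parents do not need to be opened.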

internal/rbd/rbd_util.go

riya-singhal31

LGTM, thanks Niels

nixpanic · 2023-08-30T09:35:06Z

@Mergifyio rebase

mergify · 2023-08-30T09:35:28Z

rebase

✅ Branch has been successfully rebased

Madhu-1

LGTM, I will let @Rakshith-R approve it, as he had some suggestions.

Rakshith-R

LGTM, thanks !

Rakshith-R · 2023-09-04T05:15:23Z

@Mergifyio refresh

mergify · 2023-09-04T05:15:25Z

refresh

✅ Pull request refreshed

Rakshith-R · 2023-09-04T05:15:59Z

@Mergifyio queue

mergify · 2023-09-04T05:16:03Z

queue

🛑 The pull request has been removed from the queue `default`

The queue conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

ceph-csi-bot · 2023-09-04T05:16:28Z

/test ci/centos/upgrade-tests-cephfs

nixpanic · 2023-11-14T15:57:44Z

For some reason, these commits prevent setting the meta-data for a cloned image (at least, that is what the e2e results suggest). When testing manually, no RBD-images get any meta-data set at all. The OMAP for all images (including the created snapshot) does have the expected meta-data... Still trying to understand what the e2e tests expect, and why they do not fail on previous test-cases.

It seemed that my deployment did not include --setmetadata=true for the provisioner... When setting it, the volume gets the expected (I think) metadata set. Trying again in the CI, now with enhanced logging.

nixpanic · 2023-11-14T15:57:52Z

/test ci/centos/mini-e2e/k8s-1.28

nixpanic · 2023-11-14T20:28:07Z

/test ci/centos/mini-e2e/k8s-1.28

nixpanic · 2023-11-15T11:29:42Z

For anyone watching this, I have now noticed that the ImageID of an rbdSnapshot is not set to the right value; it is set to the ImageID of the original parent volume. Using the ImageID to open the rbdSnapshot actually opens the parent image.

nixpanic · 2023-11-16T14:33:26Z

Dropped the commit

rbd: use librbd.OpenImageById() if rbdVol.ImageID is set

as it seems that the ImageID is not always set to the correct value, but to the ImageID of a cloned image. Without this commit, e2e should pass.

nixpanic · 2023-11-16T14:33:36Z

/test ci/centos/mini-e2e/k8s-1.28

internal/csi-common/server.go

nixpanic · 2023-11-17T09:41:14Z

/test ci/centos/mini-e2e/k8s-1.28

nixpanic · 2023-11-27T14:09:29Z

/test ci/centos/mini-e2e/k8s-1.28

nixpanic · 2023-11-27T16:36:03Z

logs

  I1127 15:20:04.710772       1 utils.go:164] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 GRPC call: /csi.v1.Controller/DeleteVolume
  I1127 15:20:04.710860       1 utils.go:165] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 GRPC request: {"secrets":"***stripped***","volume_id":"0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6"}
  I1127 15:20:04.712116       1 omap.go:88] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 got omap values: (pool="replicapool", namespace="", name="csi.volume.ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6"): map[csi.imageid:1af286243815 csi.imagename:csi-vol-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 csi.volname:pvc-3d33943d-80f0-46d1-ad6e-d6165ee1b7e5 csi.volume.owner:rbd-6238]
  I1127 15:20:04.757621       1 rbd_util.go:637] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 rbd: delete csi-vol-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6-temp using mon rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789, pool replicapool
  I1127 15:20:04.762284       1 controllerserver.go:1021] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 deleting image csi-vol-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6
  I1127 15:20:04.762309       1 rbd_util.go:637] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 rbd: delete csi-vol-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 using mon rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789, pool replicapool
  E1127 15:20:04.763705       1 rbd_util.go:667] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 failed to delete rbd image: replicapool/csi-vol-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6, error: RBD image not found
  E1127 15:20:04.763736       1 controllerserver.go:1023] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 failed to delete rbd image: replicapool/csi-vol-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 with error: RBD image not found
  E1127 15:20:04.763801       1 utils.go:169] ID: 39 Req-ID: 0001-0024-65e89782-f2bd-4b30-85f4-fcad649cdded-0000000000000001-ccfe21c4-54cf-44cc-9a75-72ce1ad9adf6 GRPC error: rpc error: code = Internal desc = RBD image not found

Deleting the RBD image seems to have resulted in a (spurious?) error. The DeleteVolume CSI procedure is repeated over and over again, because the image was removed already.

nixpanic · 2023-11-27T16:36:31Z

/test ci/centos/mini-e2e/k8s-1.28

`librbd.OpenImageById()` works if the image is in the trash, so it makes it possible to get the parent of the image. Signed-off-by: Niels de Vos <[email protected]>

When a new volume is not created yet, the ImageID should not be set to the ID of the snapshot. Signed-off-by: Niels de Vos <[email protected]>

In some places the ImageID is used as the ID of the parent. That is very confusing and prone to errors. Instead, fetch the right ImageID where possible, and set ParentID for referencing to parent images. Signed-off-by: Niels de Vos <[email protected]>

Signed-off-by: Niels de Vos <[email protected]>

If the RBD-image is deleted already, the DeleteVolume CSI procedure is expected to report success (as it should be idempotent). In case the returned error indicates "RBD image not found", the error is ignored and the DeleteVolume procedure continues. Signed-off-by: Niels de Vos <[email protected]>

The `getCloneDepth()` function did not account for images that are in the trash. A trashed image can only be opened by the image-id, and not by name anymore. Closes: ceph#4013 Signed-off-by: Niels de Vos <[email protected]>

The `getCloneDepth()` function does not need to traverse the whole chain of parents when a certain max-limit is configured. The traversing can be aborted once the hard-limit is reached. This makes the procedure a little more efficient, as unnecessary traversing is prevented. Signed-off-by: Niels de Vos <[email protected]>

By returning `ABORTED` or `FAILED_PRECONDITION` in the right places, the CO will retry with the same arguments until the snapshot is `ReadyToUse`. This causes restoring a volume from a snapshot to be delayed, until the snapshot can be used. Signed-off-by: Niels de Vos <[email protected]>
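
The retry behaviour described in the last commit message can be pictured with a small simulation. Everything here is an assumption for illustration: `code` is a local stand-in for a gRPC status code (not the real `codes` package), and `snapshot`, `createVolumeFromSnapshot` and `retry` are hypothetical names.

```go
package main

import "fmt"

// code is a local stand-in for a gRPC status code; only the two values
// needed for this sketch are modelled.
type code int

const (
	ok code = iota
	aborted
)

// snapshot is a toy snapshot that becomes ready after a number of polls.
type snapshot struct {
	pollsUntilReady int
}

// createVolumeFromSnapshot returns aborted while the snapshot is not ready,
// signalling that the caller should try again with the same arguments.
func createVolumeFromSnapshot(s *snapshot) code {
	if s.pollsUntilReady > 0 {
		s.pollsUntilReady--
		return aborted
	}
	return ok
}

// retry plays the role of the CO: it repeats the same request until it
// succeeds or maxAttempts is reached, and reports the attempts used.
func retry(s *snapshot, maxAttempts int) (int, code) {
	c := aborted
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		c = createVolumeFromSnapshot(s)
		if c == ok {
			return attempt, c
		}
	}
	return maxAttempts, c
}

func main() {
	attempts, c := retry(&snapshot{pollsUntilReady: 2}, 5)
	fmt.Println("attempts:", attempts, "succeeded:", c == ok)
}
```

The restore is delayed by the failed attempts, but no state has to be kept between calls because each retry carries the same arguments.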

nixpanic · 2023-12-01T11:26:29Z

/test ci/centos/mini-e2e/k8s-1.27

nixpanic · 2023-12-01T16:20:46Z

From the logs it looks like the flattening of images is in progress. I have not found messages that suggest something is blocked, but it is quite possible that there is a deadlock between the PVC-restore and snapshot-delete somewhere (I have seen these in manual testing too, but thought I had addressed them).

Maybe #1883 (comment) contains more hints on things that I have missed in the current PR.

mergify · 2024-04-15T13:26:29Z

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

mergify bot added the component/rbd label Aug 2, 2023

Rakshith-R reviewed Aug 8, 2023

nixpanic force-pushed the issue/4013 branch from fc6bc63 to 655b870 Compare August 22, 2023 15:10

nixpanic force-pushed the issue/4013 branch from 655b870 to c4a5cf9 Compare August 22, 2023 15:11

nixpanic force-pushed the issue/4013 branch from c4a5cf9 to 0a55aeb Compare August 23, 2023 08:35

Rakshith-R reviewed Aug 23, 2023

nixpanic marked this pull request as ready for review August 24, 2023 09:52

nixpanic requested review from Rakshith-R and a team August 25, 2023 07:27

riya-singhal31 reviewed Aug 25, 2023

nixpanic requested a review from riya-singhal31 August 25, 2023 11:54

riya-singhal31 previously approved these changes Aug 25, 2023

nixpanic requested a review from a team August 29, 2023 11:57

nixpanic force-pushed the issue/4013 branch from aa3de11 to 52ffac0 Compare August 30, 2023 09:35

Madhu-1 reviewed Aug 30, 2023

Rakshith-R previously approved these changes Sep 4, 2023

nixpanic force-pushed the issue/4013 branch from 52ffac0 to 6c1700e Compare September 4, 2023 05:16

mergify bot added the ok-to-test label Sep 4, 2023

nixpanic force-pushed the issue/4013 branch from 0f0364d to b29db8a Compare November 14, 2023 20:27

nixpanic force-pushed the issue/4013 branch from b29db8a to 504fe09 Compare November 16, 2023 14:31

Madhu-1 reviewed Nov 16, 2023

nixpanic force-pushed the issue/4013 branch from 504fe09 to 4db6763 Compare November 17, 2023 09:40

nixpanic force-pushed the issue/4013 branch from 4db6763 to 373c670 Compare November 27, 2023 14:09

Rakshith-R mentioned this pull request Nov 28, 2023

doc: modify README and upgrade docs #4286

Merged

nixpanic added 6 commits December 1, 2023 12:10

rbd: use librbd.OpenImageById() if rbdVol.ImageID is set

f83da27

`librbd.OpenImageById()` works if the image is in the trash, so it makes it possible to get the parent of the image. Signed-off-by: Niels de Vos <[email protected]>

rbd: prevent presetting the ImageID of a new volume

b78dcd5

When a new volume is not created yet, the ImageID should not be set to the ID of the snapshot. Signed-off-by: Niels de Vos <[email protected]>

rbd: skip flattening if an image in trash

9ac559c

Signed-off-by: Niels de Vos <[email protected]>

nixpanic force-pushed the issue/4013 branch from d0231ca to 6ea1630 Compare December 1, 2023 11:10

nixpanic added 2 commits December 1, 2023 12:26

nixpanic force-pushed the issue/4013 branch from 6ea1630 to a29bea6 Compare December 1, 2023 11:26

nixpanic added the keepalive label Dec 21, 2023
