How can I disable standalone snapshot creation (#675)? The unorthodox implementation causes other problems and confuses me. #2430

Closed
jpsn123 opened this issue Aug 20, 2021 · 4 comments

Comments


jpsn123 commented Aug 20, 2021

Hello @Madhu-1, I am a cloud architecture engineer, and I want to promote a private cloud based on open source solutions in my company. I use Rook as a bridge between Ceph storage and K8S. After deploying all the production applications, I backed up the cluster with Velero, including taking a snapshot of every PVC (about 100). Then I went to the Ceph Dashboard and was shocked to find that all the images corresponding to my PVCs had disappeared, leaving only thousands of images prefixed with csi-snap-xxx. I am very curious why there are so many images. I only took snapshots, so where are my original images?

Finally, I found with the Ceph CLI that my images were still there, but two more op_features had been added to them and they were hidden by the Ceph Dashboard. Then, on further research, I discovered the mechanism for taking snapshots (#693 #675).

Assuming I take snapshots twice a day and keep them for a year, I will have 365 x 100 x 2 = 73,000 images! Ceph will have to maintain a large index to find images. I don't know how Ceph is designed internally, but I do know that if a directory holds hundreds of thousands of files, the ls command may crash and file lookups become slow. In addition, there are a number of other problems with this implementation. For example, configuring mirroring for some important images (not all) with rbd mirror image enable {pool-name}/{image-name} snapshot will not work.
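To make the estimate above easy to re-run with other numbers, here is a minimal Python sketch. It assumes, as the estimate does, that each retained snapshot is backed by exactly one image; the function and parameter names are my own and are not part of cephcsi.

```python
def retained_snapshot_images(pvc_count: int, snapshots_per_day: int, retention_days: int) -> int:
    """Snapshot-backed images kept once the retention window is full.

    Assumption: one image per retained snapshot, as in the estimate above.
    """
    return pvc_count * snapshots_per_day * retention_days


if __name__ == "__main__":
    total = retained_snapshot_images(pvc_count=100, snapshots_per_day=2, retention_days=365)
    print(f"{total:,} snapshot images")
```

With a shorter retention, for example 30 days, the same formula gives 100 x 2 x 30 = 6,000 images.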

I think this is a terrible design that breaks the official Ceph snapshot design. That may be because I don't know much about the details, but for now I would like to disable this snapshot implementation. The steps I would like are shown below:

Create a snapshot

  • Create a snapshot with requested Name from the parent volume

Create PVC from a snapshot

  • Clone a new image from the snapshot which the user provided, with options --rbd-default-clone-format 2 --image-feature layering,deep-flatten

Delete a snapshot

  • Delete the snapshot
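The three steps above map roughly onto the following rbd command lines. This is only a sketch of what I am proposing, not what cephcsi does today. The Python helpers just build the argument lists and do not run anything, and the pool, image and snapshot names are hypothetical placeholders.

```python
from typing import List


def snap_spec(pool: str, image: str, snap: str) -> str:
    """Build the pool/image@snapshot spec used by the rbd CLI."""
    return f"{pool}/{image}@{snap}"


def create_snapshot_cmd(pool: str, image: str, snap: str) -> List[str]:
    # Step 1: snapshot the parent volume under the requested name.
    return ["rbd", "snap", "create", snap_spec(pool, image, snap)]


def clone_from_snapshot_cmd(pool: str, image: str, snap: str, new_image: str) -> List[str]:
    # Step 2: clone a new image from the snapshot with the options listed above.
    return [
        "rbd", "clone",
        "--rbd-default-clone-format", "2",
        "--image-feature", "layering,deep-flatten",
        snap_spec(pool, image, snap),
        f"{pool}/{new_image}",
    ]


def delete_snapshot_cmd(pool: str, image: str, snap: str) -> List[str]:
    # Step 3: delete the snapshot.
    return ["rbd", "snap", "rm", snap_spec(pool, image, snap)]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    pool, image, snap = "replicapool", "csi-vol-example", "backup-1"
    for cmd in (
        create_snapshot_cmd(pool, image, snap),
        clone_from_snapshot_cmd(pool, image, snap, "csi-vol-restored"),
        delete_snapshot_cmd(pool, image, snap),
    ):
        print(" ".join(cmd))
```

Each list could be passed to subprocess.run on a host with the rbd client installed, but I have not tested that against a cluster managed by cephcsi.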

I am very glad to receive your reply, thanks.

Collaborator

Madhu-1 commented Aug 23, 2021

> Hello @Madhu-1, I am a cloud architecture engineer, and I want to promote a private cloud based on open source solutions in my company. I use Rook as a bridge between Ceph storage and K8S. After deploying all the production applications, I backed up the cluster with Velero, including taking a snapshot of every PVC (about 100). Then I went to the Ceph Dashboard and was shocked to find that all the images corresponding to my PVCs had disappeared, leaving only thousands of images prefixed with csi-snap-xxx. I am very curious why there are so many images. I only took snapshots, so where are my original images?

If you are not seeing your original images, something is wrong with your setup, and I suggest opening a separate issue for that.

> Finally, I found with the Ceph CLI that my images were still there, but two more op_features had been added to them and they were hidden by the Ceph Dashboard. Then, on further research, I discovered the mechanism for taking snapshots (#693 #675).

> Assuming I take snapshots twice a day and keep them for a year, I will have 365 x 100 x 2 = 73,000 images! Ceph will have to maintain a large index to find images. I don't know how Ceph is designed internally, but I do know that if a directory holds hundreds of thousands of files, the ls command may crash and file lookups become slow. In addition, there are a number of other problems with this implementation. For example, configuring mirroring for some important images (not all) with rbd mirror image enable {pool-name}/{image-name} snapshot will not work.

Mirroring is not a problem for now. I have opened a few issues related to mirroring, and we will handle that in cephcsi internally.

The images will get flattened once a certain limit is reached. We have not seen any issues when creating more images, and the design was suggested and approved by the RBD engineer. If you have seen any performance issues related to RBD images, we can discuss them further.

> I think this is a terrible design that breaks the official Ceph snapshot design. That may be because I don't know much about the details, but for now I would like to disable this snapshot implementation. The steps I would like are shown below:

No, the cephcsi implementation is completely different from core Ceph, for the 2 reasons below

> Create a snapshot
>
>   • Create a snapshot with requested Name from the parent volume
>
> Create PVC from a snapshot
>
>   • Clone a new image from the snapshot which the user provided, with options --rbd-default-clone-format 2 --image-feature layering,deep-flatten
>
> Delete a snapshot
>
>   • Delete the snapshot
>
> I am very glad to receive your reply, thanks.

Collaborator

humblec commented Aug 23, 2021

> After deploying all the production applications, I backed up the cluster with Velero, including taking a snapshot of every PVC (about 100). Then I went to the Ceph Dashboard and was shocked to find that all the images corresponding to my PVCs had disappeared, leaving only thousands of images prefixed with csi-snap-xxx

@jpsn123 just to confirm the above: are you saying that after taking just 100 PVC snapshots, you are seeing thousands of images in the cluster? If that's the case, it's not supposed to happen, and it looks like something is wrong in the cluster. We have to find out what's causing the extra images to pop up. That looks to be the core issue here.

Author

jpsn123 commented Aug 24, 2021

@Madhu-1 thank you for your help.

  • The original images were hidden by the Ceph Dashboard. I will post a new issue.

  • I'm very happy that you are already working on mirroring, thanks.

  • The performance worry is just a concern of mine. I just want to make sure that you have given it serious thought. If the RBD engineer says there is no problem, that's great.

  • PVC and snapshot are independent objects

    This sentence inspired me. I will go and learn more about K8S CSI, thanks.

Perhaps my thinking has not caught up yet: if I use the full functionality of CSI, I shouldn't manage images in the traditional way.

Author

jpsn123 commented Aug 24, 2021

@humblec there are no extra images. I actually took snapshots several times, so about 600 images popped up.

jpsn123 closed this as completed Aug 24, 2021

3 participants