[BUG]: snapshot restore failed with Message = failed to get acl entries: Too many links #1514
Comments
@ybrock: Thank you for submitting this issue! The issue is currently awaiting triage. Please make sure you have given us as much context as possible. If the maintainers determine this is a relevant issue, they will remove the needs-triage label and respond appropriately. We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at [email protected]. |
Hi @ybrock |
The Helm chart used to install Dell CSM is version 1.3.1. Sorry for the imprecision. |
/sync |
link: 29803 |
Hello @ybrock , |
Hello,
You create symlink like above, |
Hey @ybrock |
@ybrock Another test that would be nice to run is to reproduce the array operations outside of the CSI driver: create the share and mount it, create the link, create the snapshot, then try the restore, and see if that works. In reality the driver uses bind mounts, so this is not exactly the same sequence of operations, but you can follow the mount operations in the node driver logs to see how the volume is mounted and try to reproduce that series of mounts. The CSI driver does not deal with individual files on the volume, so any traversal of the filesystem is not done by the driver but could be done as part of the PowerScale snapshot process. We will do some research on our end as well. Thanks. |
Hello, we will try to reproduce the issue outside the CSI driver as suggested, and I'll let you know. In the meantime, I can tell you that we're using OneFS version 9.5.0.8. Regards |
@ybrock Is the target file that you are linking to a real file, or are there other intermediate links to the target? It seems like the error is coming from the OS, and that could be due to many levels of links or perhaps circular links (which I doubt is the case). What happens if you do some other file-level call on the volume, e.g. traverse the files via a find command? Do you see the "Too many links" error? The error itself is a system error. |
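The traversal check suggested above can be sketched as a small script. This is a minimal sketch, assuming the volume is mounted at a local path on a Linux client; the function name `find_symlink_problems` is hypothetical, and the script only inspects the client-side view of the files, not what the array does during a restore.

```python
import errno
import os


def find_symlink_problems(root):
    """Walk root like `find` and report every symlink with its status.

    Returns a sorted list of (path, status) tuples where status is
    "ok", "broken" (target missing), "loop" (resolution hit ELOOP),
    or "error" (any other OS error while resolving the link).
    """
    results = []
    # followlinks=False (the default) keeps os.walk from descending into
    # symlinked directories, so a circular link cannot trap the walk itself.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # follows the link chain to the final target
                status = "ok"
            except FileNotFoundError:
                status = "broken"
            except OSError as exc:
                status = "loop" if exc.errno == errno.ELOOP else "error"
            results.append((path, status))
    return sorted(results)
```

Calling `find_symlink_problems` on the mount point of the PVC (the path is whatever the pod or host uses) lists each link and whether it resolves. A result with only "ok" entries would suggest that circular or overly deep link chains are not the cause.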
@ybrock have you tried isolating the issue ? |
Hello, if you just symlink any subdirectory in the current directory, the restore stops working. For example:
There is no issue with a find command inside the volume. On the NFS share everything looks fine. Dell informed me that a bug related to this issue was found and fixed. Is that true? |
Hey @ybrock |
Hello, the problem was isolated yesterday with the help of Dell Support. We had a call with an engineer and ran some tests to reproduce the issue. The problem has been narrowed down and is clearly related to the ACL inheritance that we need on the parent folders. If we cut the permission inheritance completely, by removing all ACLs in the directory, there is no problem anymore and the snapshot can be restored. It seems that when the CSI driver copies the data back from the ".snapshot" directory, it tries to set ACLs on the symlink and fails (which may be normal), and it does not skip the error, which aborts the restore. So the issue is linked to permission inheritance, which we need in our infrastructure to make sure a pod has the right permissions (group permissions) to write into a PVC. The user who creates the PVC is the one used by the driver, while the user who mounts and uses the PVC is another random user generated by OpenShift, so we have to make sure the group permission allows write access to both of them. If we remove the permission inheritance, the applications don't have the correct permissions to read and write on the PVC. Kind regards |
@ybrock Thanks for the detailed information. Let us know if you need anything else from us on this; otherwise, let us know if we can close this issue. |
Hello, when ACLs with inheritance are configured on the parent directory, the restore of the symlink fails. You can probably reproduce that if you set some ACLs. We have this kind of ACLs (as reported by nfs4_getfacl):
|
Hi @ybrock, I was able to reproduce the issue on my OneFS 9.5.0.0 with a non-privileged account for PowerScale. I tested the same process on OneFS 9.10 and did not see the issue. The driver uses the API to copy files and directories from a snapshot to the target, so this may not be a driver bug. Would you consider trying the same process on a PowerScale upgraded to OneFS 9.10? |
Hello @kumarkgosa Thank you! |
@ybrock As the issue is not present in 9.10 and can be fixed by upgrading OneFS, I guess we should be good to close this issue. Please feel free to open a new issue if you face any problem after upgrading. |
Bug Description
Hello,
We have CSM modules 1.3.1 with CSI driver version 1.10.1 installed on OpenShift 4.14.35 (K8s 1.27.16).
We have Dell PowerScale (Isilon) configured and running ok except for this issue.
When we try to restore a snapshot from a PVC containing a symlink, the new PVC is never created (it stays pending) and these events are reported in the CSI driver:
The provisioner container is raising this message:
If there is no symlink in the file system, the snapshot restore works.
Logs
Screenshots
No response
Additional Environment Information
No response
Steps to Reproduce
create a PVC on a powerscale storageClass
mount the PVC in a pod
write a file into PVC
create a symlink in the PVC pointing to previous file
take a snapshot
create a new PVC from restoring from previous snapshot
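The snapshot and restore steps above can be sketched as Kubernetes objects generated from a short script. This is a minimal sketch, not the manifests used in the report: every object name, the storage class, the snapshot class, the access mode, and the size are placeholder assumptions. It relies only on the standard `VolumeSnapshot` (`snapshot.storage.k8s.io/v1`) and `PersistentVolumeClaim` `dataSource` fields. Writing the file and creating the symlink still happen inside the pod that mounts the source PVC.

```python
import json


def snapshot_manifest(name, source_pvc, snapshot_class):
    """Build a VolumeSnapshot object for an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": source_pvc},
        },
    }


def restored_pvc_manifest(name, snapshot_name, storage_class, size):
    """Build a PVC whose dataSource points at a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # Access mode is an assumption; use the mode of the source PVC.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }


if __name__ == "__main__":
    # All names below are placeholders, not values from the report.
    objects = [
        snapshot_manifest("symlink-snap", "symlink-pvc", "isilon-snapclass"),
        restored_pvc_manifest("symlink-restore", "symlink-snap", "isilon", "1Gi"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objects}, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f` or `kubectl apply -f`. In the failing case described in this report, the restored PVC would be expected to remain pending.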
Expected Behavior
new PVC is created from snapshot
CSM Driver(s)
CSI 1.10.1
CSM 1.3.1
Installation Type
helm
Container Storage Modules Enabled
isilon
karavi
Container Orchestrator
openshift 4.14 (crio)
Operating System
redhat coreos