glusterfs snapd crashes when snapshot is de-activated. #3103
amarts added a commit to amarts/glusterfs_fork that referenced this issue on Jan 7, 2022:
`table->root` inode is agnostic to ref/unref as per the current inode table implementation, but in the case of the `snapd` process, the root inode of the snapshot process is mapped to another directory inode in the global inode table. Hence we need some 'extra' protection for this 'root' inode from the snapshot process. Add an extra ref to the inode which goes through glfs object initialization. This would prevent possible mismanagement of the root inode during de-activate of a snapshot.

Updates: gluster#3103
Change-Id: I12b9c85f677c2868ef112f36547eb69dc80d3b7b
Signed-off-by: Amar Tumballi <[email protected]>
xhernandez pushed a commit that referenced this issue on Feb 16, 2022:
`table->root` inode is agnostic to ref/unref as per the current inode table implementation, but in the case of the `snapd` process, the root inode of the snapshot process is mapped to another directory inode in the global inode table. Hence we need some 'extra' protection for this 'root' inode from the snapshot process. Add an extra ref to the inode which goes through glfs object initialization. This would prevent possible mismanagement of the root inode during de-activate of a snapshot.

Updates: #3103
Change-Id: I12b9c85f677c2868ef112f36547eb69dc80d3b7b
Signed-off-by: Amar Tumballi <[email protected]>
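For readers skimming the fix: the idea is to pin the snapshot's root inode with one extra reference while the glfs object is initialized, so the global inode table cannot retire it out from under snapd. Below is a minimal sketch of that idea only, not the actual patch; the helper name `svs_pin_snap_root` is made up, and the include path and use of the internal gfapi helpers `glfs_active_subvol()`/`glfs_subvol_done()` are assumptions.

```c
/* Sketch only: illustrates the approach described in the commit message,
 * not the actual patch.  svs_pin_snap_root() is a hypothetical helper. */
#include "glfs-internal.h"   /* glfs_active_subvol(), glfs_subvol_done();
                                include path is an assumption */

static int
svs_pin_snap_root(struct glfs *fs)
{
    xlator_t *subvol = NULL;
    inode_t *root = NULL;

    subvol = glfs_active_subvol(fs);   /* takes a ref on the active subvolume */
    if (!subvol)
        return -1;

    root = subvol->itable->root;       /* snapd maps this root onto a directory
                                          inode of the global inode table */
    inode_ref(root);                   /* extra ref: the root inode survives the
                                          snapshot being de-activated */

    glfs_subvol_done(fs, subvol);      /* drop the subvolume ref taken above */
    return 0;
}
```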
Thank you for your contributions.
Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.
Description of problem:
1. Create some files via an NFS client on a glusterfs-ganesha exported volume, take a snapshot (mysnap1), enter the .snap/mysnap1 directory on the NFS mount, and run `ls` in a while loop:
   [root@node0 mysnap1]# while true; do ls; sleep 1; done
2. While this is happening, deactivate the snapshot. The `ls` errors out and snapd crashes.

The exact command to reproduce the issue:
The full output of the command that failed:
Expected results:
Mandatory info:
- Is there any crash? Provide the backtrace and coredump
Yes
Backtrace from gdb inspection of the core:
[root@node1 tmp]# gdb glusterfsd core.glusterfsd.0.9fb25f157f934233bb2e60c160b96a0e.424541.1640843975000000
Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id snapd/NS1 -p /var/run/gluster/vo'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fb86b40837f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fb85c0a7700 (LWP 5886))]
(gdb) bt
#0 0x00007fb86b40837f in raise () from /lib64/libc.so.6
#1 0x00007fb86b3f2db5 in abort () from /lib64/libc.so.6
#2 0x00007fb86b3f2c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007fb86b400a76 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fb86bc8f8ff in __pthread_tpp_change_priority () from /lib64/libpthread.so.0
#5 0x00007fb86bc865ec in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#6 0x00007fb86d11429a in inode_ctx_get0 (inode=inode@entry=0x7fb81c030a38, xlator=xlator@entry=0x7fb83805efe0, value1=value1@entry=0x7fb85c0a5830) at inode.c:2261
#7 0x00007fb86d114337 in inode_needs_lookup (inode=0x7fb81c030a38, this=0x7fb83805efe0) at inode.c:2061
#8 0x00007fb864798d18 in __glfs_resolve_inode (fs=fs@entry=0x7fb838041600, subvol=subvol@entry=0x7fb8140242f0, object=object@entry=0x7fb83807a600) at glfs-resolve.c:1107
#9 0x00007fb864798ec7 in glfs_resolve_inode (fs=fs@entry=0x7fb838041600, subvol=subvol@entry=0x7fb8140242f0, object=object@entry=0x7fb83807a600) at glfs-resolve.c:1133
#10 0x00007fb86479936d in pub_glfs_h_lookupat (fs=fs@entry=0x7fb838041600, parent=0x7fb83807a600, path=0x7fb828008e60 'hatest.0.18', stat=stat@entry=0x7fb85c0a5a70, follow=follow@entry=0) at glfs-handleops.c:98
#11 0x00007fb8649b00b3 in svs_lookup_entry (this=this@entry=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, buf=buf@entry=0x7fb85c0a5c10, postparent=postparent@entry=0x7fb85c0a5b70, parent=parent@entry=0x7fb7ec0027c8,
parent_ctx=0x7fb838003820, op_errno=0x7fb85c0a5d9c) at snapview-server.c:364
#12 0x00007fb8649b33cd in svs_get_handle (this=this@entry=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, inode_ctx=inode_ctx@entry=0x7fb83807a2a0, op_errno=op_errno@entry=0x7fb85c0a5d9c) at snapview-server.c:1944
#13 0x00007fb8649b62ad in svs_stat (frame=frame@entry=0x7fb838060048, this=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, xdata=xdata@entry=0x0) at snapview-server.c:2003
#14 0x00007fb86d1aa47a in default_stat_resume (frame=0x7fb8280016f8, this=0x7fb86000ed50, loc=0x7fb82803c7e0, xdata=0x0) at defaults.c:2205
#15 0x00007fb86d125dd5 in call_resume (stub=0x7fb82803c798) at call-stub.c:2392
#16 0x00007fb85f9c3878 in iot_worker (data=0x7fb8600265d0) at io-threads.c:232
#17 0x00007fb86bc8415a in start_thread () from /lib64/libpthread.so.0
#18 0x00007fb86b4cddd3 in clone () from /lib64/libc.so.6
(gdb) f 6
#6 0x00007fb86d11429a in inode_ctx_get0 (inode=inode@entry=0x7fb81c030a38, xlator=xlator@entry=0x7fb83805efe0, value1=value1@entry=0x7fb85c0a5830) at inode.c:2261
2261 LOCK(&inode->lock);
(gdb) p *inode
$1 = {table = 0x7fb81c02aa90, gfid = 'G\213\001', '\000' <repeats 12 times>, lock = {spinlock = 1640843969, mutex = {__data = {__lock = 1640843969, __count = 0, __owner = 475060, __nusers = 0, __kind = 1640843969, __spins = 0,
__elision = 0, __list = {__prev = 0x73fb4, __next = 0x7fb810004d40}}, __size = '\301J\315a\000\000\000\000\264?\a\000\000\000\000\000\301J\315a\000\000\000\000\264?\a\000\000\000\000\000@M\000\020\270\177\000',
__align = 1640843969}}, nlookup = {lk = 0x7fb81c030a78 '\320\354\001\034\270\177', value = 140428720598224}, fd_count = 469942784, active_fd_count = 32696, ref = 735, ia_type = IA_IFSOCK, fd_list = {next = 0x0,
prev = 0x7fb81c030a98}, dentry_list = {next = 0x7fb81c030a98, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0, in_invalidate_list = false, invalidate_sent = false, in_lru_list = false}
(gdb) p inode->gfid
$2 = 'G\213\001', '\000' <repeats 12 times>
(gdb) p /x inode->gfid
$3 = {0x47, 0x8b, 0x1, 0x0 <repeats 13 times>}
It crashed when attempting a lookup on one of the files (hatest.0.18). The inode looks invalid, since many of its fields appear bogus. For example, `inode->gfid` matches neither the parent (gfid=1) nor the gfid of hatest.0.18 obtained from getfattr output (trusted.gfid=0x8f5a24dce15b466185a12b24d11c1d88). The ia_type, fd_count, etc. also look way off.
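For context, frame 6 is where the stale pointer is first dereferenced under a lock. Paraphrased from memory of libglusterfs (not a verbatim copy of inode.c, and details may differ between releases), the function looks roughly like the sketch below. Because `inode->lock` is a pthread mutex embedded in the inode itself, locking reused or garbage memory (note the garbage `__kind` in the dump above) is what trips the assertion inside `__pthread_tpp_change_priority` and aborts snapd.

```c
/* Rough paraphrase of inode_ctx_get0() from libglusterfs/src/inode.c. */
int
inode_ctx_get0(inode_t *inode, xlator_t *xlator, uint64_t *value1)
{
    int ret = -1;

    if (!inode || !xlator)
        return ret;

    LOCK(&inode->lock);     /* frame #6, inode.c:2261 -- aborts here when the
                               mutex bytes in the inode are garbage */
    {
        ret = __inode_ctx_get0(inode, xlator, value1);
    }
    UNLOCK(&inode->lock);

    return ret;
}
```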
Additional info:
We suspect that when the snapshot is deactivated, all inodes on that snap are retired, but the client does not know this, and there can be an in-flight operation at the very moment the deactivation is in progress.
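To make the suspected race concrete, here is a small standalone illustration using plain pthreads (no gluster code; all names are made up): one thread stands in for an in-flight lookup that still holds a pointer to a snapshot inode, the other for the deactivate path retiring that inode. Locking memory that has been destroyed and reused is undefined behaviour, which in snapd can surface as the libpthread assertion shown in the backtrace.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for a gluster inode: only the embedded lock matters here. */
struct fake_inode {
    pthread_mutex_t lock;
    char gfid[16];
};

static struct fake_inode *snap_inode;

/* "In-flight lookup": resolved the inode earlier, locks it a bit later. */
static void *
lookup_worker(void *arg)
{
    struct fake_inode *inode = arg;

    usleep(2000);                      /* operation still in flight ...    */
    pthread_mutex_lock(&inode->lock);  /* ... but the inode may be retired */
    pthread_mutex_unlock(&inode->lock);
    return NULL;
}

/* "Snapshot deactivate": retires the inode table and frees its inodes. */
static void *
deactivate_worker(void *arg)
{
    (void)arg;

    pthread_mutex_destroy(&snap_inode->lock);
    memset(snap_inode, 0x47, sizeof(*snap_inode));  /* memory gets reused */
    return NULL;
}

int
main(void)
{
    pthread_t lookup, deactivate;

    snap_inode = calloc(1, sizeof(*snap_inode));
    pthread_mutex_init(&snap_inode->lock, NULL);

    pthread_create(&lookup, NULL, lookup_worker, snap_inode);
    pthread_create(&deactivate, NULL, deactivate_worker, NULL);

    pthread_join(deactivate, NULL);
    pthread_join(lookup, NULL);   /* may abort, hang, or "work": it is UB */

    free(snap_inode);
    return 0;
}
```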
- The operating system / glusterfs version:
package-string: glusterfs 9.4