
glusterfs snapd crashes when snapshot is de-activated. #3103

Closed
amarts opened this issue Jan 7, 2022 · 2 comments
Labels
Type:Bug wontfix Managed by stale[bot]

Comments

amarts (Member) commented Jan 7, 2022

Description of problem:
1. Create some files via an NFS client on a glusterfs-ganesha exported volume, then take a snapshot (mysnap1). Enter the .snap/mysnap1 directory on the NFS mount and run ls in a while loop:

[root@node0 mysnap1]# while true; do ls; sleep 1; done

2. While this is happening, deactivate the snapshot. The ls errors out and snapd crashes.

The exact command to reproduce the issue:

The full output of the command that failed:

Expected results:

Mandatory info:

- Is there any crash? Provide the backtrace and coredump
Yes

Backtrace from gdb inspection of the core:
[root@node1 tmp]# gdb glusterfsd core.glusterfsd.0.9fb25f157f934233bb2e60c160b96a0e.424541.1640843975000000

Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id snapd/NS1 -p /var/run/gluster/vo'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fb86b40837f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fb85c0a7700 (LWP 5886))]

(gdb) bt
#0 0x00007fb86b40837f in raise () from /lib64/libc.so.6
#1 0x00007fb86b3f2db5 in abort () from /lib64/libc.so.6
#2 0x00007fb86b3f2c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007fb86b400a76 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fb86bc8f8ff in __pthread_tpp_change_priority () from /lib64/libpthread.so.0
#5 0x00007fb86bc865ec in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#6 0x00007fb86d11429a in inode_ctx_get0 (inode=inode@entry=0x7fb81c030a38, xlator=xlator@entry=0x7fb83805efe0, value1=value1@entry=0x7fb85c0a5830) at inode.c:2261
#7 0x00007fb86d114337 in inode_needs_lookup (inode=0x7fb81c030a38, this=0x7fb83805efe0) at inode.c:2061
#8 0x00007fb864798d18 in __glfs_resolve_inode (fs=fs@entry=0x7fb838041600, subvol=subvol@entry=0x7fb8140242f0, object=object@entry=0x7fb83807a600) at glfs-resolve.c:1107
#9 0x00007fb864798ec7 in glfs_resolve_inode (fs=fs@entry=0x7fb838041600, subvol=subvol@entry=0x7fb8140242f0, object=object@entry=0x7fb83807a600) at glfs-resolve.c:1133
#10 0x00007fb86479936d in pub_glfs_h_lookupat (fs=fs@entry=0x7fb838041600, parent=0x7fb83807a600, path=0x7fb828008e60 'hatest.0.18', stat=stat@entry=0x7fb85c0a5a70, follow=follow@entry=0) at glfs-handleops.c:98
#11 0x00007fb8649b00b3 in svs_lookup_entry (this=this@entry=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, buf=buf@entry=0x7fb85c0a5c10, postparent=postparent@entry=0x7fb85c0a5b70, parent=parent@entry=0x7fb7ec0027c8,
parent_ctx=0x7fb838003820, op_errno=0x7fb85c0a5d9c) at snapview-server.c:364
#12 0x00007fb8649b33cd in svs_get_handle (this=this@entry=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, inode_ctx=inode_ctx@entry=0x7fb83807a2a0, op_errno=op_errno@entry=0x7fb85c0a5d9c) at snapview-server.c:1944
#13 0x00007fb8649b62ad in svs_stat (frame=frame@entry=0x7fb838060048, this=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, xdata=xdata@entry=0x0) at snapview-server.c:2003
#14 0x00007fb86d1aa47a in default_stat_resume (frame=0x7fb8280016f8, this=0x7fb86000ed50, loc=0x7fb82803c7e0, xdata=0x0) at defaults.c:2205
#15 0x00007fb86d125dd5 in call_resume (stub=0x7fb82803c798) at call-stub.c:2392
#16 0x00007fb85f9c3878 in iot_worker (data=0x7fb8600265d0) at io-threads.c:232
#17 0x00007fb86bc8415a in start_thread () from /lib64/libpthread.so.0
#18 0x00007fb86b4cddd3 in clone () from /lib64/libc.so.6
(gdb) f 6
#6 0x00007fb86d11429a in inode_ctx_get0 (inode=inode@entry=0x7fb81c030a38, xlator=xlator@entry=0x7fb83805efe0, value1=value1@entry=0x7fb85c0a5830) at inode.c:2261
2261 LOCK(&inode->lock);
(gdb) p *inode
$1 = {table = 0x7fb81c02aa90, gfid = 'G\213\001', '\000' <repeats 12 times>, lock = {spinlock = 1640843969, mutex = {__data = {__lock = 1640843969, __count = 0, __owner = 475060, __nusers = 0, __kind = 1640843969, __spins = 0,
__elision = 0, __list = {__prev = 0x73fb4, __next = 0x7fb810004d40}}, __size = '\301J\315a\000\000\000\000\264?\a\000\000\000\000\000\301J\315a\000\000\000\000\264?\a\000\000\000\000\000@M\000\020\270\177\000',
__align = 1640843969}}, nlookup = {lk = 0x7fb81c030a78 '\320\354\001\034\270\177', value = 140428720598224}, fd_count = 469942784, active_fd_count = 32696, ref = 735, ia_type = IA_IFSOCK, fd_list = {next = 0x0,
prev = 0x7fb81c030a98}, dentry_list = {next = 0x7fb81c030a98, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0, in_invalidate_list = false, invalidate_sent = false, in_lru_list = false}
(gdb) p inode->gfid
$2 = 'G\213\001', '\000' <repeats 12 times>
(gdb) p /x inode->gfid
$3 = {0x47, 0x8b, 0x1, 0x0 <repeats 13 times>}

It crashed while attempting a lookup on one of the files (hatest.0.18). The inode looks invalid, since many of its fields are bogus. For example, 'inode->gfid' matches neither the parent (gfid=1) nor the gfid of hatest.0.18 obtained from getfattr output (trusted.gfid=0x8f5a24dce15b466185a12b24d11c1d88). The ia_type, fd_count, etc. also look way off.

Additional info:
We suspect that when a snapshot is deactivated, all inodes belonging to that snapshot are retired, but the client does not know this, so an operation can still be in flight at the exact moment the deactivation is in progress.
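
A minimal standalone gfapi sketch of this suspected race follows. It is purely illustrative and is not snapd's code: the host name, port, volume id and file name are placeholders, and it only approximates the window by running handle-based lookups (similar in spirit to the svs_lookup_entry() -> glfs_h_lookupat() path in the backtrace) in one thread while another thread tears down the same glfs instance.

/* Illustrative race sketch only, not snapd code. Host, port, volume id and
 * file name below are placeholders. Build against gfapi (-lgfapi -lpthread). */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

static glfs_t *fs;

/* Keep issuing handle-based lookups relative to the root, roughly what
 * snapview-server does for entries inside a snapshot directory. */
static void *
lookup_loop(void *arg)
{
    (void)arg;
    for (;;) {
        struct stat st;
        struct glfs_object *obj = glfs_h_lookupat(fs, NULL, "hatest.0.18", &st, 0);
        if (obj)
            glfs_h_close(obj);
        usleep(1000);
    }
    return NULL;
}

int
main(void)
{
    fs = glfs_new("mysnap1");                        /* placeholder volume id */
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "node0", 24007);
    if (glfs_init(fs) != 0) {
        perror("glfs_init");
        return 1;
    }

    pthread_t t;
    pthread_create(&t, NULL, lookup_loop, NULL);

    sleep(2);
    /* Tearing the instance down while lookups are still in flight mimics the
     * deactivate window: an in-flight lookup can end up touching retired
     * inodes, much like the LOCK(&inode->lock) crash in inode_ctx_get0(). */
    glfs_fini(fs);

    pthread_join(t, NULL);   /* never returns cleanly if the race is hit */
    return 0;
}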

- The operating system / glusterfs version:
package-string: glusterfs 9.4

amarts added the Type:Bug label Jan 7, 2022
amarts added a commit to amarts/glusterfs_fork that referenced this issue Jan 7, 2022
The `table->root` inode is exempt from ref/unref accounting in the current
inode table implementation, but in the case of the `snapd` process, the root
inode of the snapshot is mapped to another directory inode in the global
inode table. Hence the snapshot process needs some 'extra' protection for
this 'root' inode. Add an extra ref to the inode as part of glfs object
initialization. This prevents possible mismanagement of the root inode
during de-activation of a snapshot.

Updates: gluster#3103
Change-Id: I12b9c85f677c2868ef112f36547eb69dc80d3b7b
Signed-off-by: Amar Tumballi <[email protected]>
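
As a rough sketch of the approach this commit message describes (not the actual patch; the helper names here are hypothetical and the include path assumes the glusterfs >= 6 header layout), the extra reference on the snapshot's root inode could look like this:

/* Hypothetical sketch of the extra-ref idea, not the merged change. */
#include <glusterfs/inode.h>

/* Called once when the per-snapshot glfs object is initialized: table->root
 * normally bypasses ref/unref accounting, so pin it with an explicit ref to
 * keep it alive across a snapshot deactivate while requests are in flight. */
static inode_t *
svs_pin_root_inode(inode_table_t *table)
{
    return inode_ref(table->root);
}

/* Called when the per-snapshot glfs object is finally torn down. */
static void
svs_unpin_root_inode(inode_t *root)
{
    inode_unref(root);
}

The point is simply that an inode holding an explicit reference cannot be retired by a deactivate while operations referencing it are still in flight.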
xhernandez pushed a commit that referenced this issue Feb 16, 2022
(same commit message and Change-Id as above, with Updates: #3103)
stale bot commented Aug 11, 2022

Thank you for your contributions.
We noticed that this issue has had no activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

stale bot added the wontfix (Managed by stale[bot]) label Aug 11, 2022
stale bot commented Oct 1, 2022

Closing this issue as there has been no update since my last comment. If this is still a valid issue, feel free to reopen it.

stale bot closed this as completed Oct 1, 2022