
glusterfs snapd crashes when snapshot is de-activated. #3103

Closed
amarts opened this issue Jan 7, 2022 · 2 comments
Labels
Type:Bug wontfix Managed by stale[bot]

Comments

amarts (Member) commented Jan 7, 2022

Description of problem:
1. Create some files via an NFS client on a glusterfs-ganesha exported volume, then take a snapshot (mysnap1). Enter the .snap/mysnap1 directory on the NFS mount and run ls in a while loop:

[root@node0 mysnap1]# while true; do ls; sleep 1; done

2. While this is happening, deactivate the snapshot. The ls errors out and snapd crashes.

The exact command to reproduce the issue:

The full output of the command that failed:

Expected results:

Mandatory info:

- Is there any crash? Provide the backtrace and coredump
Yes

Backtrace from gdb inspection of the core:
[root@node1 tmp]# gdb glusterfsd core.glusterfsd.0.9fb25f157f934233bb2e60c160b96a0e.424541.1640843975000000

Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id snapd/NS1 -p /var/run/gluster/vo'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fb86b40837f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fb85c0a7700 (LWP 5886))]

(gdb) bt
#0 0x00007fb86b40837f in raise () from /lib64/libc.so.6
#1 0x00007fb86b3f2db5 in abort () from /lib64/libc.so.6
#2 0x00007fb86b3f2c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007fb86b400a76 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fb86bc8f8ff in __pthread_tpp_change_priority () from /lib64/libpthread.so.0
#5 0x00007fb86bc865ec in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#6 0x00007fb86d11429a in inode_ctx_get0 (inode=inode@entry=0x7fb81c030a38, xlator=xlator@entry=0x7fb83805efe0, value1=value1@entry=0x7fb85c0a5830) at inode.c:2261
#7 0x00007fb86d114337 in inode_needs_lookup (inode=0x7fb81c030a38, this=0x7fb83805efe0) at inode.c:2061
#8 0x00007fb864798d18 in __glfs_resolve_inode (fs=fs@entry=0x7fb838041600, subvol=subvol@entry=0x7fb8140242f0, object=object@entry=0x7fb83807a600) at glfs-resolve.c:1107
#9 0x00007fb864798ec7 in glfs_resolve_inode (fs=fs@entry=0x7fb838041600, subvol=subvol@entry=0x7fb8140242f0, object=object@entry=0x7fb83807a600) at glfs-resolve.c:1133
#10 0x00007fb86479936d in pub_glfs_h_lookupat (fs=fs@entry=0x7fb838041600, parent=0x7fb83807a600, path=0x7fb828008e60 'hatest.0.18', stat=stat@entry=0x7fb85c0a5a70, follow=follow@entry=0) at glfs-handleops.c:98
#11 0x00007fb8649b00b3 in svs_lookup_entry (this=this@entry=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, buf=buf@entry=0x7fb85c0a5c10, postparent=postparent@entry=0x7fb85c0a5b70, parent=parent@entry=0x7fb7ec0027c8,
parent_ctx=0x7fb838003820, op_errno=0x7fb85c0a5d9c) at snapview-server.c:364
#12 0x00007fb8649b33cd in svs_get_handle (this=this@entry=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, inode_ctx=inode_ctx@entry=0x7fb83807a2a0, op_errno=op_errno@entry=0x7fb85c0a5d9c) at snapview-server.c:1944
#13 0x00007fb8649b62ad in svs_stat (frame=frame@entry=0x7fb838060048, this=0x7fb86000aba0, loc=loc@entry=0x7fb82803c7e0, xdata=xdata@entry=0x0) at snapview-server.c:2003
#14 0x00007fb86d1aa47a in default_stat_resume (frame=0x7fb8280016f8, this=0x7fb86000ed50, loc=0x7fb82803c7e0, xdata=0x0) at defaults.c:2205
#15 0x00007fb86d125dd5 in call_resume (stub=0x7fb82803c798) at call-stub.c:2392
#16 0x00007fb85f9c3878 in iot_worker (data=0x7fb8600265d0) at io-threads.c:232
#17 0x00007fb86bc8415a in start_thread () from /lib64/libpthread.so.0
#18 0x00007fb86b4cddd3 in clone () from /lib64/libc.so.6
(gdb) f 6
#6 0x00007fb86d11429a in inode_ctx_get0 (inode=inode@entry=0x7fb81c030a38, xlator=xlator@entry=0x7fb83805efe0, value1=value1@entry=0x7fb85c0a5830) at inode.c:2261
2261 LOCK(&inode->lock);
(gdb) p *inode
$1 = {table = 0x7fb81c02aa90, gfid = 'G\213\001', '\000' <repeats 12 times>, lock = {spinlock = 1640843969, mutex = {__data = {__lock = 1640843969, __count = 0, __owner = 475060, __nusers = 0, __kind = 1640843969, __spins = 0,
__elision = 0, __list = {__prev = 0x73fb4, __next = 0x7fb810004d40}}, __size = '\301J\315a\000\000\000\000\264?\a\000\000\000\000\000\301J\315a\000\000\000\000\264?\a\000\000\000\000\000@M\000\020\270\177\000',
__align = 1640843969}}, nlookup = {lk = 0x7fb81c030a78 '\320\354\001\034\270\177', value = 140428720598224}, fd_count = 469942784, active_fd_count = 32696, ref = 735, ia_type = IA_IFSOCK, fd_list = {next = 0x0,
prev = 0x7fb81c030a98}, dentry_list = {next = 0x7fb81c030a98, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0, in_invalidate_list = false, invalidate_sent = false, in_lru_list = false}
(gdb) p inode->gfid
$2 = 'G\213\001', '\000' <repeats 12 times>
(gdb) p /x inode->gfid
$3 = {0x47, 0x8b, 0x1, 0x0 <repeats 13 times>}

It crashed while attempting a lookup on one of the files (hatest.0.18). The inode looks invalid, since many of its fields are bogus. For example, 'inode->gfid' matches neither the parent (gfid=1) nor the gfid of hatest.0.18 obtained from getfattr output (trusted.gfid=0x8f5a24dce15b466185a12b24d11c1d88). The ia_type, fd_count, etc. also look way off.

Additional info:
We suspect that when a snapshot is deactivated, all inodes belonging to that snapshot are retired, but the client does not know this, so an operation can still be in flight at the exact moment the deactivation is in progress.
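
A minimal standalone gfapi sketch of this suspected race follows. It is purely illustrative and is not snapd's code: the host name, port, volume id and file name are placeholders, and it only approximates the window by running handle-based lookups (similar in spirit to the svs_lookup_entry() -> glfs_h_lookupat() path in the backtrace) in one thread while another thread tears down the same glfs instance.

/* Illustrative race sketch only, not snapd code. Host, port, volume id and
 * file name below are placeholders. Build against gfapi (-lgfapi -lpthread). */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

static glfs_t *fs;

/* Keep issuing handle-based lookups relative to the root, roughly what
 * snapview-server does for entries inside a snapshot directory. */
static void *
lookup_loop(void *arg)
{
    (void)arg;
    for (;;) {
        struct stat st;
        struct glfs_object *obj = glfs_h_lookupat(fs, NULL, "hatest.0.18", &st, 0);
        if (obj)
            glfs_h_close(obj);
        usleep(1000);
    }
    return NULL;
}

int
main(void)
{
    fs = glfs_new("mysnap1");                        /* placeholder volume id */
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "node0", 24007);
    if (glfs_init(fs) != 0) {
        perror("glfs_init");
        return 1;
    }

    pthread_t t;
    pthread_create(&t, NULL, lookup_loop, NULL);

    sleep(2);
    /* Tearing the instance down while lookups are still in flight mimics the
     * deactivate window: an in-flight lookup can end up touching retired
     * inodes, much like the LOCK(&inode->lock) crash in inode_ctx_get0(). */
    glfs_fini(fs);

    pthread_join(t, NULL);   /* never returns cleanly if the race is hit */
    return 0;
}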

- The operating system / glusterfs version:
package-string: glusterfs 9.4

amarts added the Type:Bug label Jan 7, 2022
amarts added a commit to amarts/glusterfs_fork that referenced this issue Jan 7, 2022
The `table->root` inode is exempt from ref/unref accounting in the current
inode table implementation, but in the case of the `snapd` process, the root
inode of the snapshot is mapped to another directory inode in the global
inode table. Hence the snapshot process needs some 'extra' protection for
this 'root' inode. Add an extra ref to the inode as part of glfs object
initialization. This prevents possible mismanagement of the root inode
during de-activation of a snapshot.

Updates: gluster#3103
Change-Id: I12b9c85f677c2868ef112f36547eb69dc80d3b7b
Signed-off-by: Amar Tumballi <[email protected]>
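
As a rough sketch of the approach this commit message describes (not the actual patch; the helper names here are hypothetical and the include path assumes the glusterfs >= 6 header layout), the extra reference on the snapshot's root inode could look like this:

/* Hypothetical sketch of the extra-ref idea, not the merged change. */
#include <glusterfs/inode.h>

/* Called once when the per-snapshot glfs object is initialized: table->root
 * normally bypasses ref/unref accounting, so pin it with an explicit ref to
 * keep it alive across a snapshot deactivate while requests are in flight. */
static inode_t *
svs_pin_root_inode(inode_table_t *table)
{
    return inode_ref(table->root);
}

/* Called when the per-snapshot glfs object is finally torn down. */
static void
svs_unpin_root_inode(inode_t *root)
{
    inode_unref(root);
}

The point is simply that an inode holding an explicit reference cannot be retired by a deactivate while operations referencing it are still in flight.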
xhernandez pushed a commit that referenced this issue Feb 16, 2022
(same commit message and Change-Id as above, with Updates: #3103)
stale bot commented Aug 11, 2022

Thank you for your contributions.
We noticed that this issue has had no activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

stale bot added the wontfix (Managed by stale[bot]) label Aug 11, 2022
stale bot commented Oct 1, 2022

Closing this issue as there has been no update since my last comment. If this is still a valid issue, feel free to reopen it.

stale bot closed this as completed Oct 1, 2022