EC (10+2) volume: NFS server (glusterfs) crashes on all nodes #1300

Closed
XiJinyu opened this issue Jun 12, 2020 · 2 comments
Labels
wontfix (Managed by stale[bot])

Comments

@XiJinyu
Contributor

XiJinyu commented Jun 12, 2020

glusterfs version: 3.12.2-47

Description of problem:
We created an EC (10+2) volume on 24 nodes and mounted it over NFS (served by glusterfs). After the volume has been running for a while, the NFS server process crashes on every node.

The core dump from one of the nodes:

[root@~ ]# gdb -c /core.178940
......
[New LWP 179432]
......
[New LWP 179462]
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='debug' install /usr/lib/debug/.build-id/51/0c7de77b3fe41640337d1ba1e4085013092ab0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/n'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f5f88a44570 in uuid_unpack () from /lib64/libuuid.so.1
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f5f88a44570 in uuid_unpack () from /lib64/libuuid.so.1
#1 0x00007f5f88a435c8 in uuid_compare () from /lib64/libuuid.so.1
#2 0x00007f5f892f3260 in gf_uuid_compare (u2=0x7f5f895b50f0 <root.13716> "", u1=u1@entry=0x8 <Address 0x8 out of bounds>) at compat-uuid.h:27
#3 __is_root_gfid (gfid=gfid@entry=0x8 <Address 0x8 out of bounds>) at inode.c:1008
#4 0x00007f5f7aed2556 in dht_revalidate_cbk (frame=0x7f5edc10a758, cookie=0x7f5f6c0d3420, this=0x7f5f6c0d69b0, op_ret=, op_errno=, inode=0x0, stbuf=0x0, xattr=0x0,
postparent=0x0) at dht-common.c:1898
#5 0x00007f5f89369610 in default_lookup_cbk (frame=0x7f5edc0dafe8, cookie=, this=, op_ret=-1, op_errno=2, inode=0x0, buf=0x0, xdata=0x0, postparent=0x0) at defaults.c:1265
#6 0x00007f5f7b14b806 in ec_manager_lookup (fop=0x7f5edc045a28, state=) at ec-generic.c:865
#7 0x00007f5f7b142a1b in __ec_manager (fop=0x7f5edc045a28, error=2) at ec-common.c:2750
#8 0x00007f5f7b142bf8 in ec_resume (fop=0x7f5edc045a28, error=0) at ec-common.c:488
#9 0x00007f5f7b142d2f in ec_complete (fop=0x7f5edc045a28) at ec-common.c:565
#10 0x00007f5f7b149ef2 in ec_lookup_cbk (frame=frame@entry=0x7f5edc1542e8, cookie=0x5, this=0x7f5f6c0d3420, op_ret=-1, op_errno=2, inode=inode@entry=0x7f5f6c759498, buf=buf@entry=0x7f5f796aa900, xdata=0x0,
postparent=postparent@entry=0x7f5f796aa970) at ec-generic.c:759
#11 0x00007f5f7b3d8f4d in client3_3_lookup_cbk (req=, iov=, count=, myframe=0x7f5edc0216d8) at client-rpc-fops.c:2872
#12 0x00007f5f89089bc0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f5f6c1a92b0, pollin=pollin@entry=0x7f5f6b8400f0) at rpc-clnt.c:778
#13 0x00007f5f89089f2b in rpc_clnt_notify (trans=, mydata=0x7f5f6c1a92e0, event=, data=0x7f5f6b8400f0) at rpc-clnt.c:971
#14 0x00007f5f89085d73 in rpc_transport_notify (this=this@entry=0x7f5f6c1a95e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f5f6b8400f0) at rpc-transport.c:557
#15 0x00007f5f7dea85e6 in socket_event_poll_in (this=this@entry=0x7f5f6c1a95e0, notify_handled=) at socket.c:2322
#16 0x00007f5f7deaac2a in socket_event_handler (fd=70, idx=62, gen=7, data=0x7f5f6c1a95e0, poll_in=, poll_out=, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#17 0x00007f5f89341cf0 in event_dispatch_epoll_handler (event=0x7f5f796aae70, event_pool=0x55b3313e6240) at event-epoll.c:643
#18 event_dispatch_epoll_worker (data=0x7f5f6c0f8200) at event-epoll.c:759
#19 0x00007f5f8811fdd5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f5f879e7ead in clone () from /lib64/libc.so.6

@stale

stale bot commented Jan 8, 2021

Thank you for your contributions.
We noticed that this issue has had no activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

stale bot added the wontfix label Jan 8, 2021
@stale

stale bot commented Jan 23, 2021

Closing this issue as there has been no activity since my last update. If the issue is still valid, feel free to reopen it.

stale bot closed this as completed Jan 23, 2021