Core dumps on Gluster 9 - 3 replicas #2443
Comments
#0 list_del_init (old=0x7f81000dff68) at ../../../../libglusterfs/src/glusterfs/list.h:82
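For context, list_del_init() is the kernel-style doubly linked list helper from libglusterfs/src/glusterfs/list.h; it looks roughly like the following (a simplified sketch, not a verbatim copy of the header). A lock entry whose pointer is NULL, already freed, or never linked makes the very first dereference fault, which matches the SIGSEGV at list.h:82.

struct list_head {
        struct list_head *next;
        struct list_head *prev;
};

/* Unlink `old` from whatever list it is on, then point it at itself
 * so a repeated delete is harmless. */
static inline void
list_del_init(struct list_head *old)
{
        old->prev->next = old->next;   /* faults here if old is NULL or its  */
        old->next->prev = old->prev;   /* neighbours were already freed      */

        old->next = old;
        old->prev = old;
}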
@AlexNinaber Let me try to recreate this issue and get back to you.
@AlexNinaber What is the application you are running on the mount when you hit the issue? The locking pattern is important for recreating it. It would be great to have this info too before I try to recreate the issue.
@pranithk it's not easy to give a 1-2-3 step plan; there are multiple services running from Gluster: mongo, dhcp, slurm. I've added a longer wait before failover (i.e. until the node can't ping anymore), and it's not core dumping. However, what remains ever since using version 9 is the time it takes for files to be repaired, if they are repaired at all. Sometimes I have to restart the services, as otherwise it just doesn't seem to heal. The volume is relatively small, 100M or so. Starting from a healthy 3-replica volume, after rebooting one replica I get this:

[root@master02 ~]# gluster volume heal local info
Brick 10.141.255.253:/gluster/local
Brick 10.141.11.1:/glusterssd/local

It doesn't go down. And I see a lot of this on all 3 bricks:

[2021-05-19 13:29:18.080718 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 2-local-replicate-0: performing entry selfheal on 872ebcb8-5b86-4dc5-aac6-7bdd016a186f

I've added stricter quorum settings and other options, but that doesn't seem to help:

Volume Name: local
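The exact quorum options used aren't listed in the report; on a 1x3 replica volume, "stricter quorum settings" would typically be something along these lines (an assumption about the configuration, not taken from the reporter's setup):

# client-side quorum: allow writes only while a majority of the 3 bricks is up
gluster volume set local cluster.quorum-type auto
# server-side quorum: stop bricks when the trusted pool loses quorum
gluster volume set local cluster.server-quorum-type server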
Dear @pranithk, again a core dump, but now without rebooting anything in the 3x replica; after 2 hours the fuse mount was entirely gone.

#0 list_del_init (old=0x7fb38418af58) at ../../../../libglusterfs/src/glusterfs/list.h:82
@AlexNinaber Could you do
@AlexNinaber I found one place where it could be NULL. Will it be possible for you to test the patch to see if this is the only place where the issue is present? |
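Without claiming this is the exact patch, the shape of such a fix is the usual defensive check before the list operation; a minimal sketch, reusing the list_head/list_del_init definitions from the earlier snippet (the lock type and cleanup step are placeholders, not the client xlator's real names):

struct client_lock {                   /* stand-in for the client xlator's lock record */
        struct list_head list;
        /* owner, fd, offset/length of the locked range, ... */
};

/* If the lookup that produced `lock` can return NULL (for example when the
 * fd was already cleaned up after a disconnect), bail out instead of
 * letting list_del_init() dereference a NULL pointer. */
static void
remove_lock_if_present(struct client_lock *lock)
{
        if (!lock)
                return;                /* nothing to unlink */

        list_del_init(&lock->list);
        /* release/free the lock record here */
}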
fixes: gluster#2443
Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
Signed-off-by: Pranith Kumar K <[email protected]>
@AlexNinaber Could you test with the RPMs generated for #2457 and let us know if this fixes the issue?
@pranithk Happy to try the rpm, where can I find it? |
@AlexNinaber Which frame did you try it in? Could you do this in frame 1?
@pranithk this is gdb on the core; lock is optimized out, so it's not immediately clear to me whether putting a break in would really help.
You don't need to put a breakpoint. In gdb, do:

If you are not on Slack, can you join using the Slack invite at https://www.gluster.org/community/?
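From the surrounding replies (frame 1, print lock), the gdb step being suggested amounts to something like the following; a sketch, not the original comment's exact commands:

gdb /usr/sbin/glusterfs -c core.3549
(gdb) frame 1        # move from list_del_init up to the caller (__insert_and_merge)
(gdb) print lock     # a value of 0x0 here means the lock pointer was NULL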
(gdb) p lock

I'll get Slack.
Cool, this confirms the theory.
Check under
> Upstream patch: gluster@00761df
> fixes: gluster#2443
> Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
> Signed-off-by: Pranith Kumar K <[email protected]>

BUG: 1689375
Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
Signed-off-by: karthik-us <[email protected]>
Reviewed-on: https://code.engineering.redhat.com/gerrit/c/rhs-glusterfs/+/245613
Tested-by: RHGS Build Bot <[email protected]>
Reviewed-by: Ravishankar Narayanankutty <[email protected]>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <[email protected]>
Description of problem:
A number of services use a 3-replica gluster mount; rebooting one of the replicas always results in a core dump on the machine taking over the services. Once it has core dumped, the mount directory shows "Socket not connected".
How to trigger: reboot one of the replicas. Recreating the volume from scratch shows the same problem. The services in HA might hold the mounted gluster volume for a reasonably long time, so a clean umount might not occur; however, with 3 replicas this shouldn't matter.
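A rough outline of the reproduction described here, with placeholder paths and a generic lock-holding workload standing in for the real services (mongo/dhcp/slurm):

# fuse-mount the 3-replica volume on the client that will take over the services
mount -t glusterfs 10.141.255.254:/local /mnt/local

# keep lock-holding I/O running on the mount; any workload that takes posix
# locks stands in for the HA services here
touch /mnt/local/.lockfile
flock /mnt/local/.lockfile -c 'sleep 3600' &

# reboot one of the replica bricks
ssh 10.141.255.253 reboot

# after the fuse client crashes, the mount point returns "not connected" errors
ls /mnt/local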
gdb /usr/sbin/glusterfs -c core.3549
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.
(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 3605]
[New LWP 3551]
[New LWP 3553]
[New LWP 3549]
[New LWP 3557]
[New LWP 3556]
[New LWP 3558]
[New LWP 3604]
[New LWP 3648]
[New LWP 3559]
[New LWP 3647]
[New LWP 3646]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=10.141.255.254 --volfi'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f27bb5a985b in __insert_and_merge () from /usr/lib64/glusterfs/9.2/xlator/protocol/client.so
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-9.2-1.el7.x86_64
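To get a symbolized backtrace from this core, the matching debug symbols have to be installed first; a sketch of the usual RHEL/CentOS 7 workflow, assuming a debuginfo repository for the gluster 9.2 packages is reachable:

# install matching debug symbols (debuginfo-install comes from yum-utils and
# needs a debuginfo repository for the gluster 9.2 packages to be enabled)
debuginfo-install -y glusterfs-fuse-9.2-1.el7.x86_64

# re-open the core and dump all thread backtraces non-interactively
gdb --batch -ex 'thread apply all bt full' /usr/sbin/glusterfs core.3549 > bt.txt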
Mandatory info:
- The output of the gluster volume info command:

gluster volume info local
Volume Name: local
Type: Replicate
Volume ID: 04e9d8b5-2225-46c2-bcd2-78356e0581f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.141.255.254:/gluster/local
Brick2: 10.141.255.253:/gluster/local
Brick3: 10.141.11.1:/glusterssd/local
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.write-behind: off
performance.flush-behind: off
cluster.granular-entry-heal: enable
(the core dump also happens without granular-entry-heal)
- The output of the gluster volume status command:

gluster volume status local
Status of volume: local
Gluster process TCP Port RDMA Port Online Pid
Brick 10.141.255.254:/gluster/local 49154 0 Y 2680
Brick 10.141.255.253:/gluster/local 49154 0 Y 2757
Brick 10.141.11.1:/glusterssd/local 49154 0 Y 6150
Self-heal Daemon on localhost N/A N/A Y 2693
Self-heal Daemon on 10.141.11.1 N/A N/A Y 6196
Self-heal Daemon on 10.141.255.253 N/A N/A Y 2770
Task Status of Volume local
There are no active volume tasks
- The output of the gluster volume heal command:

gluster volume heal local
Launching heal operation to perform index self heal on volume local has been successful
Use heal info commands to check status.
It does not solve it:
gluster volume heal local info
Brick 10.141.255.254:/gluster/local
Status: Connected
Number of entries: 0
Brick 10.141.255.253:/gluster/local
Status: Connected
Number of entries: 0
Brick 10.141.11.1:/glusterssd/local
Status: Connected
Number of entries: 0
Socket still disconnected
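Once the fuse client process has crashed, the mount point keeps returning ENOTCONN regardless of heals; getting it back requires remounting on the client. A generic recovery sketch (the mount point path is an assumption):

# lazily unmount the dead fuse mount, then mount it again
umount -l /mnt/local
mount -t glusterfs 10.141.255.254:/local /mnt/local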
- Provide logs present on the following locations of client and server nodes: /var/log/glusterfs/
The message "W [MSGID: 114061] [client-common.c:2895:client_pre_lk_v2] 0-local-client-1: remote_fd is -1. EBADFD [{gfid=c8963045-a4b5-4dd6-b794-7ea4acb6614d}, {errno=77}, {error=File descriptor in
bad state}]" repeated 37 times between [2021-05-17 16:57:50.016590 +0000] and [2021-05-17 16:57:50.106029 +0000]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LK)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-05-17 16:57:50 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib64/libglusterfs.so.0(+0x28d2f)[0x7f9b7b6dcd2f]
/lib64/libglusterfs.so.0(gf_print_trace+0x36a)[0x7f9b7b6e7dba]
/lib64/libc.so.6(+0x36400)[0x7f9b79912400]
/usr/lib64/glusterfs/9.2/xlator/protocol/client.so(+0x3b85b)[0x7f9b6c52d85b]
/usr/lib64/glusterfs/9.2/xlator/protocol/client.so(+0x3cc00)[0x7f9b6c52ec00]
/usr/lib64/glusterfs/9.2/xlator/protocol/client.so(+0x5903d)[0x7f9b6c54b03d]
/lib64/libgfrpc.so.0(+0xf7f1)[0x7f9b7b4867f1]
/lib64/libgfrpc.so.0(+0xfb65)[0x7f9b7b486b65]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f9b7b483133]
/usr/lib64/glusterfs/9.2/rpc-transport/socket.so(+0x4418)[0x7f9b6f23b418]
/usr/lib64/glusterfs/9.2/rpc-transport/socket.so(+0x9d21)[0x7f9b6f240d21]
/lib64/libglusterfs.so.0(+0x8e13c)[0x7f9b7b74213c]
/lib64/libpthread.so.0(+0x7ea5)[0x7f9b7a114ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f9b799da9fd]
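The frames above are raw offsets into the shared objects; with the matching debuginfo installed they can be mapped back to functions and source lines, for example (offsets taken from the trace above):

# resolve the two protocol/client frames from the crash signature
addr2line -f -e /usr/lib64/glusterfs/9.2/xlator/protocol/client.so 0x3b85b 0x3cc00
# prints "??" unless glusterfs-debuginfo for exactly this build is installed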
- Is there any crash? Provide the backtrace and coredump.
I don't see the debug rpm in the repo?
Additional info:
- The operating system / glusterfs version:
9.2 from the rpm repo