Core dumps on Gluster 9 - 3 replicas #2443

Closed · AlexNinaber opened this issue May 17, 2021 · 15 comments · Fixed by #2456
Assignee: pranithk
Labels: release 8, release 9

@AlexNinaber commented May 17, 2021

Description of problem:

A number of services use a 3-replica Gluster mount; rebooting one of the replicas always results in a core dump on the machine taking over the services. Once the client has core-dumped, the mount directory shows "Socket not connected".

How to trigger: reboot one of the replicas. I recreated the volume from scratch, but the problem is the same. The services in HA might hold the mounted volume for a reasonably long time, so a smooth unmount might not occur; however, with 3 replicas this shouldn't matter.
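
(A minimal sketch of the trigger as described, assuming a hypothetical mount point of /mnt/local; the server address and volume name are taken from the volume info below:)

# client: mount the 3-replica volume and keep an HA service that holds
# posix locks running from it
mount -t glusterfs 10.141.255.254:/local /mnt/local
# on one of the three replica nodes:
reboot
# back on the client: the fuse client core-dumps during failover and the
# mount starts returning "Socket not connected"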

gdb /usr/sbin/glusterfs -c core.3549

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 3605]
[New LWP 3551]
[New LWP 3553]
[New LWP 3549]
[New LWP 3557]
[New LWP 3556]
[New LWP 3558]
[New LWP 3604]
[New LWP 3648]
[New LWP 3559]
[New LWP 3647]
[New LWP 3646]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=10.141.255.254 --volfi'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f27bb5a985b in __insert_and_merge () from /usr/lib64/glusterfs/9.2/xlator/protocol/client.so
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-9.2-1.el7.x86_64
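
(To get a symbolized trace, the debuginfo package named in the warning has to be installed and gdb re-run; for example, matching the session above:)

debuginfo-install glusterfs-fuse-9.2-1.el7.x86_64
gdb /usr/sbin/glusterfs -c core.3549
(gdb) bt full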

Mandatory info:
- The output of the gluster volume info command:

gluster volume info local

Volume Name: local
Type: Replicate
Volume ID: 04e9d8b5-2225-46c2-bcd2-78356e0581f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.141.255.254:/gluster/local
Brick2: 10.141.255.253:/gluster/local
Brick3: 10.141.11.1:/glusterssd/local
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.write-behind: off
performance.flush-behind: off
cluster.granular-entry-heal: enable

(the core dump also occurs without granular-entry-heal)

- The output of the gluster volume status command:

gluster volume status local
Status of volume: local
Gluster process TCP Port RDMA Port Online Pid

Brick 10.141.255.254:/gluster/local 49154 0 Y 2680
Brick 10.141.255.253:/gluster/local 49154 0 Y 2757
Brick 10.141.11.1:/glusterssd/local 49154 0 Y 6150
Self-heal Daemon on localhost N/A N/A Y 2693
Self-heal Daemon on 10.141.11.1 N/A N/A Y 6196
Self-heal Daemon on 10.141.255.253 N/A N/A Y 2770

Task Status of Volume local

There are no active volume tasks

- The output of the gluster volume heal command:

gluster volume heal local
Launching heal operation to perform index self heal on volume local has been successful
Use heal info commands to check status.

Running the heal does not resolve it:

gluster volume heal local info
Brick 10.141.255.254:/gluster/local
Status: Connected
Number of entries: 0

Brick 10.141.255.253:/gluster/local
Status: Connected
Number of entries: 0

Brick 10.141.11.1:/glusterssd/local
Status: Connected
Number of entries: 0

The socket is still disconnected.

- Provide logs present in the following locations on client and server nodes:
/var/log/glusterfs/

The message "W [MSGID: 114061] [client-common.c:2895:client_pre_lk_v2] 0-local-client-1: remote_fd is -1. EBADFD [{gfid=c8963045-a4b5-4dd6-b794-7ea4acb6614d}, {errno=77}, {error=File descriptor in
bad state}]" repeated 37 times between [2021-05-17 16:57:50.016590 +0000] and [2021-05-17 16:57:50.106029 +0000]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LK)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-05-17 16:57:50 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib64/libglusterfs.so.0(+0x28d2f)[0x7f9b7b6dcd2f]
/lib64/libglusterfs.so.0(gf_print_trace+0x36a)[0x7f9b7b6e7dba]
/lib64/libc.so.6(+0x36400)[0x7f9b79912400]
/usr/lib64/glusterfs/9.2/xlator/protocol/client.so(+0x3b85b)[0x7f9b6c52d85b]
/usr/lib64/glusterfs/9.2/xlator/protocol/client.so(+0x3cc00)[0x7f9b6c52ec00]
/usr/lib64/glusterfs/9.2/xlator/protocol/client.so(+0x5903d)[0x7f9b6c54b03d]
/lib64/libgfrpc.so.0(+0xf7f1)[0x7f9b7b4867f1]
/lib64/libgfrpc.so.0(+0xfb65)[0x7f9b7b486b65]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f9b7b483133]
/usr/lib64/glusterfs/9.2/rpc-transport/socket.so(+0x4418)[0x7f9b6f23b418]
/usr/lib64/glusterfs/9.2/rpc-transport/socket.so(+0x9d21)[0x7f9b6f240d21]
/lib64/libglusterfs.so.0(+0x8e13c)[0x7f9b7b74213c]
/lib64/libpthread.so.0(+0x7ea5)[0x7f9b7a114ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f9b799da9fd]

- Is there any crash? Provide the backtrace and coredump:

I don't see the debug RPM in the repo?

Additional info:

- The operating system / glusterfs version:

GlusterFS 9.2 from the RPM repo.

@AlexNinaber (Author) commented May 17, 2021

#0  list_del_init (old=0x7f81000dff68) at ../../../../libglusterfs/src/glusterfs/list.h:82
No locals.
#1  __delete_client_lock (lock=0x7f81000df720) at client-lk.c:128
No locals.
#2  __insert_and_merge (fdctx=fdctx@entry=0x7f8100089ec0, lock=<optimized out>) at client-lk.c:272
        conf = <optimized out>
        t = <optimized out>
        sum = 0x7f81000ce3c0
        i = 0
        v = {locks = {0x7f81000c7c10, 0x0, 0x0}}
#3  0x00007f81042e2c00 in client_setlk (lock=<optimized out>, fdctx=0x7f8100089ec0) at client-lk.c:311
No locals.
#4  client_add_lock_for_recovery (fd=0x7f80e805d9b8, flock=flock@entry=0x7f8104f358f0, owner=owner@entry=0x7f80e8071768, cmd=6) at client-lk.c:494
        fdctx = 0x7f8100089ec0
        this = 0x7f8100007c30
        lock = <optimized out>
        conf = 0x7f8100042c70
        ret = 0
        __FUNCTION__ = "client_add_lock_for_recovery"
#5  0x00007f81042ff03d in client4_0_lk_cbk (req=0x7f80e8061cd8, iov=<optimized out>, count=<optimized out>, myframe=0x7f80e8012478) at client-rpc-fops_v2.c:2208
        frame = 0x7f80e8012478
        lock = {l_type = 0, l_whence = 0, l_start = 1073741824, l_len = 2, l_pid = 0, l_owner = {len = 8, data = "U\016n\325\327\033\311\313", '\000' <repeats 1015 times>}}
        rsp = {op_ret = 0, op_errno = 0, xdata = {xdr_size = 0, count = -1, pairs = {pairs_len = 0, pairs_val = 0x0}}, flock = {type = 0, whence = 0, start = 1073741824, len = 2, pid = 0,
          lk_owner = {lk_owner_len = 8, lk_owner_val = 0x7f81000d5060 "U\016n\325\327\033\311\313nt"}}}
        ret = <optimized out>
        this = 0x7f8100007c30
        xdata = 0x0
        local = 0x7f80e80716b8
        __FUNCTION__ = "client4_0_lk_cbk"
#6  0x00007f81138397f1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f8100042df0, pollin=pollin@entry=0x7f81000ca5b0) at rpc-clnt.c:759
        conn = 0x7f8100042e20
        saved_frame = <optimized out>
        ret = 0
        req = 0x7f80e8061cd8
        xid = 39707
        __FUNCTION__ = "rpc_clnt_handle_reply"
#7  0x00007f8113839b65 in rpc_clnt_notify (trans=0x7f81000430f0, mydata=0x7f8100042e20, event=<optimized out>, data=0x7f81000ca5b0) at rpc-clnt.c:926
        conn = 0x7f8100042e20
        clnt = 0x7f8100042df0
        ret = -1
        req_info = 0x0
        pollin = 0x7f81000ca5b0
        clnt_mydata = 0x0
        old_THIS = 0x7f8100007c30
        __FUNCTION__ = "rpc_clnt_notify"
#8  0x00007f8113836133 in rpc_transport_notify (this=this@entry=0x7f81000430f0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f81000ca5b0) at rpc-transport.c:520
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#9  0x00007f81069ee418 in socket_event_poll_in_async (xl=<optimized out>, async=async@entry=0x7f81000ca6c8) at socket.c:2502
        pollin = 0x7f81000ca5b0
        this = 0x7f81000430f0
        priv = 0x7f8100043740
#10 0x00007f81069f3d21 in gf_async (cbk=0x7f81069ee3f0 <socket_event_poll_in_async>, xl=<optimized out>, async=0x7f81000ca6c8) at ../../../../libglusterfs/src/glusterfs/async.h:189
No locals.
#11 socket_event_poll_in (notify_handled=true, this=0x7f81000430f0) at socket.c:2543
        ret = <optimized out>
        pollin = 0x7f81000ca5b0
        priv = 0x7f8100043740
        ctx = <optimized out>
#12 socket_event_handler (fd=<optimized out>, idx=3, gen=<optimized out>, data=0x7f81000430f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2934
        ret = <optimized out>
        ctx = <optimized out>
        notify_handled = <optimized out>
        priv = 0x7f8100043740
        socket_closed = <optimized out>
        poll_out = <optimized out>
        poll_in = <optimized out>
        data = 0x7f81000430f0
        idx = 3
        fd = <optimized out>
        event_thread_died = 0 '\000'
        poll_err = <optimized out>
        gen = <optimized out>
        this = <optimized out>
#13 0x00007f8113af513c in event_dispatch_epoll_handler (event=0x7f8104f360f0, event_pool=0x563f8db68520) at event-epoll.c:640
        handler = 0x7f81069f2be0 <socket_event_handler>
        gen = 1
        slot = 0x563f8dbadec0
        data = 0x7f81000430f0
        ret = 0
        fd = 16

pranithk self-assigned this May 18, 2021
@pranithk (Member) commented:

@AlexNinaber Let me try to recreate this issue and get back to you.

@pranithk (Member) commented:

@AlexNinaber What is the application you are running on the mount when you hit the issue? The locking pattern is important for recreating it, so it would be great to have this info before I try to recreate the issue.

@AlexNinaber (Author) commented:

@pranithk it's not easy to give a 1-2-3 step plan; there are multiple services running from Gluster: mongo, dhcp, slurm. I've added a longer wait before failover (i.e. until the node can't be pinged anymore), and then it's not core dumping. However, what remains ever since moving to version 9 is how long files take to be healed, if they heal at all. Sometimes I have to restart the services, as otherwise it just doesn't seem to heal. This volume is relatively small, 100M or so. Starting from a healthy 3-replica volume and rebooting one replica, I get this:

[root@master02 ~]# gluster volume heal local info
Brick 10.141.255.254:/gluster/local
/var/spool/slurm/priority_last_decay_ran
/var/spool/slurm
Status: Connected
Number of entries: 2

Brick 10.141.255.253:/gluster/local
/var/lib/mongodb/journal
/var/spool/slurm/priority_last_decay_ran
/var/lib/dhcpd/dhcpd.leases
/var/spool/slurm
/var/lib/dhcpd
Status: Connected
Number of entries: 5

Brick 10.141.11.1:/glusterssd/local
/var/lib/dhcpd
/var/lib/mongodb/journal
/var/lib/dhcpd/dhcpd.leases
/var/spool/slurm
Status: Connected
Number of entries: 4

The entry count doesn't go down.

And I see a lot of this on all 3 bricks:

[2021-05-19 13:29:18.080718 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 2-local-replicate-0: performing entry selfheal on 872ebcb8-5b86-4dc5-aac6-7bdd016a186f
[2021-05-19 13:29:18.086459 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:18.086542 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:18.086809 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-1: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:18.090666 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:18.090837 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-1: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:18.092508 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:19.094165 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:19.094256 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-05-19 13:29:19.094342 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 2-local-client-1: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]

I've added stricter quorum settings and other options, but that doesn't seem to help:

Volume Name: local
Type: Replicate
Volume ID: 04e9d8b5-2225-46c2-bcd2-78356e0581f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.141.255.254:/gluster/local
Brick2: 10.141.255.253:/gluster/local
Brick3: 10.141.11.1:/glusterssd/local
Options Reconfigured:
performance.strict-o-direct: on
performance.open-behind: off
performance.quick-read: off
performance.stat-prefetch: off
performance.flush-behind: off
performance.write-behind: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.granular-entry-heal: enable
cluster.server-quorum-type: server
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.entry-self-heal: on
cluster.data-self-heal-algorithm: diff
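
(For reference, options like the quorum settings above are applied per volume with gluster volume set; for example:)

gluster volume set local cluster.server-quorum-type server
gluster volume set local cluster.quorum-type fixed
gluster volume set local cluster.quorum-count 2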

@AlexNinaber (Author) commented:

Dear @pranithk, another core dump, but this time without rebooting anything in the 3x replica; after 2 hours the fuse mount was entirely gone.

#0  list_del_init (old=0x7fb38418af58) at ../../../../libglusterfs/src/glusterfs/list.h:82
82              old->prev->next = old->next;
Missing separate debuginfos, use: debuginfo-install userspace-rcu-0.10.0-3.el7.x86_64
(gdb) full bt
Undefined command: "full".  Try "help".
(gdb) bt full
#0  list_del_init (old=0x7fb38418af58) at ../../../../libglusterfs/src/glusterfs/list.h:82
No locals.
#1  __delete_client_lock (lock=0x7fb38418a710) at client-lk.c:128
No locals.
#2  __insert_and_merge (fdctx=fdctx@entry=0x7fb38c050200, lock=<optimized out>) at client-lk.c:272
        conf = <optimized out>
        t = <optimized out>
        sum = 0x7fb38418b870
        i = 0
        v = {locks = {0x7fb38418c9d0, 0x0, 0x0}}
#3  0x00007fb3911dec00 in client_setlk (lock=<optimized out>, fdctx=0x7fb38c050200) at client-lk.c:311
No locals.
#4  client_add_lock_for_recovery (fd=0x7fb3780416d8, flock=flock@entry=0x7fb391c318f0, owner=owner@entry=0x7fb378081f58, cmd=6) at client-lk.c:494
        fdctx = 0x7fb38c050200
        this = 0x7fb38c007bd0
        lock = <optimized out>
        conf = 0x7fb38c036c40
        ret = 0
        __FUNCTION__ = "client_add_lock_for_recovery"
#5  0x00007fb3911fb03d in client4_0_lk_cbk (req=0x7fb378039638, iov=<optimized out>, count=<optimized out>, myframe=0x7fb378056808) at client-rpc-fops_v2.c:2208
        frame = 0x7fb378056808
        lock = {l_type = 0, l_whence = 0, l_start = 1073741824, l_len = 2, l_pid = 0, l_owner = {len = 8, data = "\213*\246\234\"\245\211\356", '\000' <repeats 1015 times>}}
        rsp = {op_ret = 0, op_errno = 0, xdata = {xdr_size = 0, count = -1, pairs = {pairs_len = 0, pairs_val = 0x0}}, flock = {type = 0, whence = 0, start = 1073741824, len = 2,
          pid = 0, lk_owner = {lk_owner_len = 8, lk_owner_val = 0x7fb38400d270 "\213*\246\234\"\245\211\356nt"}}}
        ret = <optimized out>
        this = 0x7fb38c007bd0
        xdata = 0x0
        local = 0x7fb378081ea8
        __FUNCTION__ = "client4_0_lk_cbk"
#6  0x00007fb3a01367f1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fb38c036dc0, pollin=pollin@entry=0x7fb384173530) at rpc-clnt.c:759
        conn = 0x7fb38c036df0
        saved_frame = <optimized out>
        ret = 0
        req = 0x7fb378039638
        xid = 115975
        __FUNCTION__ = "rpc_clnt_handle_reply"
#7  0x00007fb3a0136b65 in rpc_clnt_notify (trans=0x7fb38c0370c0, mydata=0x7fb38c036df0, event=<optimized out>, data=0x7fb384173530) at rpc-clnt.c:926
        conn = 0x7fb38c036df0
        clnt = 0x7fb38c036dc0
        ret = -1
        req_info = 0x0
        pollin = 0x7fb384173530
        clnt_mydata = 0x0
        old_THIS = 0x7fb38c007bd0
        __FUNCTION__ = "rpc_clnt_notify"
#8  0x00007fb3a0133133 in rpc_transport_notify (this=this@entry=0x7fb38c0370c0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fb384173530) at rpc-transport.c:520
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#9  0x00007fb393eeb418 in socket_event_poll_in_async (xl=<optimized out>, async=async@entry=0x7fb384173648) at socket.c:2502
        pollin = 0x7fb384173530
        this = 0x7fb38c0370c0
        priv = 0x7fb38c037710
#10 0x00007fb393ef0d21 in gf_async (cbk=0x7fb393eeb3f0 <socket_event_poll_in_async>, xl=<optimized out>, async=0x7fb384173648) at ../../../../libglusterfs/src/glusterfs/async.h:189
No locals.
#11 socket_event_poll_in (notify_handled=true, this=0x7fb38c0370c0) at socket.c:2543
        ret = <optimized out>
        pollin = 0x7fb384173530
        priv = 0x7fb38c037710
        ctx = <optimized out>
#12 socket_event_handler (fd=<optimized out>, idx=3, gen=<optimized out>, data=0x7fb38c0370c0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2934
        ret = <optimized out>
        ctx = <optimized out>
        notify_handled = <optimized out>
        priv = 0x7fb38c037710
        socket_closed = <optimized out>
        poll_out = <optimized out>
        poll_in = <optimized out>
        data = 0x7fb38c0370c0
        idx = 3
        fd = <optimized out>
        event_thread_died = 0 '\000'
        poll_err = <optimized out>
        gen = <optimized out>
        this = <optimized out>
#13 0x00007fb3a03f213c in event_dispatch_epoll_handler (event=0x7fb391c320f0, event_pool=0x5640fb8cf520) at event-epoll.c:640
        handler = 0x7fb393eefbe0 <socket_event_handler>
        gen = 1
        slot = 0x5640fb914ec0
        data = 0x7fb38c0370c0
        ret = 0
        fd = 16
        ev_data = 0x7fb391c320f4
        idx = 3
        handled_error_previously = false
#14 event_dispatch_epoll_worker (data=0x5640fb9310b0) at event-epoll.c:751
        event = {events = 1, data = {ptr = 0x100000003, fd = 3, u32 = 3, u64 = 4294967299}}
        ret = <optimized out>
        ev_data = 0x5640fb9310b0
        event_pool = 0x5640fb8cf520
        myindex = 2
        timetodie = 0
        gen = <optimized out>
        poller_death_notify = {next = 0x0, prev = 0x0}
        slot = 0x0
        tmp = 0x0
        __FUNCTION__ = "event_dispatch_epoll_worker"
#15 0x00007fb39edc4ea5 in start_thread (arg=0x7fb391c33700) at pthread_create.c:307
        __res = <optimized out>
        pd = 0x7fb391c33700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140409221363456, 5019503098971376158, 0, 8392704, 0, 140409221363456, -4985492487940781538, -4985512542518162914},
          mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#16 0x00007fb39e68a9fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.

@pranithk (Member) commented:

@AlexNinaber Could you do p *lock in any of the frames where this variable is available? Maybe lock->next/lock->prev is NULL.
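
(For context, this is the list helper the crash lands in. Line 82, the first statement, is quoted verbatim in the trace above; the rest of the body is the standard kernel-style unlink that libglusterfs/src/glusterfs/list.h follows. If the node was never linked in, old->prev is NULL and the very first dereference is the SIGSEGV:)

static inline void
list_del_init(struct list_head *old)
{
    old->prev->next = old->next; /* list.h:82 -- faults when old->prev is NULL */
    old->next->prev = old->prev;
    old->next = old;
    old->prev = old;
}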

@pranithk (Member) commented:

@AlexNinaber I found one place where it could be NULL. Would it be possible for you to test the patch to see whether this is the only place where the issue is present?

pranithk added a commit to pranithk/glusterfs that referenced this issue May 20, 2021
fixes: gluster#2443
Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
Signed-off-by: Pranith Kumar K <[email protected]>
@pranithk (Member) commented:

@AlexNinaber Could you test with the RPMs generated for #2457 and let us know if this fixes the issue?

@AlexNinaber (Author) commented:

@pranithk
(gdb) p *lock
Cannot access memory at address 0x0

Happy to try the rpm, where can I find it?

@pranithk (Member) commented:

> @pranithk
> (gdb) p *lock
> Cannot access memory at address 0x0
>
> Happy to try the rpm, where can I find it?

@AlexNinaber Which frame did you try it in? Could you do this in frame 1?

@AlexNinaber (Author) commented:

@pranithk this is gdb on the core; lock is optimized out, so it's not immediately clear to me that putting a breakpoint in would really help.

@pranithk (Member) commented:

> @pranithk this is gdb on the core; lock is optimized out, so it's not immediately clear to me that putting a breakpoint in would really help.

You don't need to set a breakpoint. In gdb, do:
(gdb) fr 1
(gdb) p *lock

If you are not on Slack, can you join using the Slack invite at https://www.gluster.org/community/?

@AlexNinaber (Author) commented:

(gdb) p lock
$1 = {fd = 0x7fb3780416d8, user_flock = {l_type = 0, l_whence = 0, l_start = 1073741824, l_len = 2, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}},
  fl_start = 1073741824, fl_end = 1073741825, fl_type = 0, cmd = 0, owner = {len = 8, data = "\213*\246\234\"\245\211\356", '\000' <repeats 1015 times>}, list = {next = 0x0,
  prev = 0x0}}

I'll get slack

@pranithk (Member) commented:

> (gdb) p lock
> $1 = {... list = {next = 0x0, prev = 0x0}}
>
> I'll get slack

Cool, this confirms the theory.
https://build.gluster.org/job/gh_devrpm-el7/1563/ -- el7
https://build.gluster.org/job/gh_devrpm-fedora/1530/ -- fedora
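
(The NULL next/prev in the printed lock means this client_posix_lock_t was never linked into the fd's lock list, or was already unlinked, so list_del_init() chases a NULL pointer. A guard along these lines would avoid the crash; this is only an illustrative sketch, not necessarily what the actual patch in #2456 does:)

/* illustration only: skip the unlink when the node's links were never
 * initialized, instead of dereferencing a NULL prev pointer */
static void
__delete_client_lock_guarded(client_posix_lock_t *lock)
{
    if (lock->list.next == NULL || lock->list.prev == NULL)
        return; /* never inserted into the fd's lock list */
    list_del_init(&lock->list);
}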

@pranithk (Member) commented:

Check under Build Artifacts.

pranithk added commits to pranithk/glusterfs that referenced this issue on May 22, 23, and 25, 2021; xhernandez pushed a commit that referenced this issue on May 25, 2021; and amarts pushed a commit to kadalu/glusterfs that referenced this issue on May 25, 2021. All carry the same message:
fixes: #2443
Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
Signed-off-by: Pranith Kumar K <[email protected]>
pranithk added the release 8 and release 9 labels May 26, 2021
xhernandez pushed two more commits that referenced this issue on May 27, 2021, with the same message.
pranithk mentioned this issue Jun 23, 2021
csabahenk pushed a commit to csabahenk/glusterfs that referenced this issue Mar 7, 2023
> Upstream patch: gluster@00761df
> fixes: gluster#2443
> Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
> Signed-off-by: Pranith Kumar K <[email protected]>

BUG: 1689375
Change-Id: I86ef0270d41d6fb924db97fde3196d7c98c8b564
Signed-off-by: karthik-us <[email protected]>
Reviewed-on: https://code.engineering.redhat.com/gerrit/c/rhs-glusterfs/+/245613
Tested-by: RHGS Build Bot <[email protected]>
Reviewed-by: Ravishankar Narayanankutty <[email protected]>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <[email protected]>