Gluster distributed volume #193

Closed
AlscadIngenierie opened this issue Apr 29, 2017 · 23 comments
Labels: Question, wontfix (managed by stale[bot])

Comments

@AlscadIngenierie

AlscadIngenierie commented Apr 29, 2017

No description provided.

@superboum

superboum commented Sep 16, 2018

I am running 3 C1 servers from Scaleway (also ARM architecture) and am encountering exactly the same problem. My firewall is correctly configured, and I have no MAC layer (SELinux, AppArmor). The volume works if I configure it as Distributed or Replicated, but fails as described above with Dispersed.
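For comparison, the working and failing layouts differ only in the volume type at creation time; a minimal sketch with placeholder hosts and brick paths (not the exact commands used here):

```sh
# Replicated: 3 full copies of each file -- works on these machines
gluster volume create repl replica 3 host1:/data/brick host2:/data/brick host3:/data/brick

# Dispersed (erasure-coded): 2 data + 1 redundancy fragments -- triggers the failure
gluster volume create disp disperse 3 redundancy 1 host1:/data/brick host2:/data/brick host3:/data/brick
```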

@pranithk
Member

@superboum Could you attach the `gluster volume info` and `gluster volume status` output for this?

Including other developers who work on the Disperse volume: @xhernandez @aspandey @sunilheggodu
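For reference, gathering that output amounts to the following (volume name taken from the reproduction that follows):

```sh
gluster volume info erasure
gluster volume status erasure
```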

@superboum

superboum commented Sep 17, 2018

Create and start the volume:

```
root@lupine:/mnt# gluster volume create erasure disperse 3 redundancy 1 transport tcp 10.1.9.46:/var/lib/erasure 10.1.4.145:/var/lib/erasure 10.1.33.66:/var/lib/erasure
volume create: erasure: failed: The brick 10.1.33.66:/var/lib/erasure is being created in the root partition. It is recommended that you don't use the system's root partition for storage backend. Or use 'force' at the end of the command if you want to override this behavior.
root@lupine:/mnt# gluster volume create erasure disperse 3 redundancy 1 transport tcp 10.1.9.46:/var/lib/erasure 10.1.4.145:/var/lib/erasure 10.1.33.66:/var/lib/erasure force
volume create: erasure: success: please start the volume to access data
root@lupine:/mnt# gluster volume start erasure
volume start: erasure: success
```
Gluster Volume Info:

```
Volume Name: erasure
Type: Disperse
Volume ID: 3768bf54-07cf-43e0-9848-482863685c6d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.1.9.46:/var/lib/erasure
Brick2: 10.1.4.145:/var/lib/erasure
Brick3: 10.1.33.66:/var/lib/erasure
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
```
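
(The bricks line `1 x (2 + 1) = 3` means one disperse subvolume holding 2 data fragments plus 1 redundancy fragment per file: any single brick can go offline without losing data, and usable capacity is 2/3 of the raw total.)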
Gluster Volume Status:

```
root@lupine:/mnt# gluster volume status erasure
Status of volume: erasure
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.9.46:/var/lib/erasure            49153     0          Y       9193 
Brick 10.1.4.145:/var/lib/erasure           49153     0          Y       10531
Brick 10.1.33.66:/var/lib/erasure           49153     0          Y       26531
Self-heal Daemon on localhost               N/A       N/A        Y       26554
Self-heal Daemon on 10.1.9.46               N/A       N/A        Y       9216 
Self-heal Daemon on 10.1.4.145              N/A       N/A        Y       10554
 
Task Status of Volume erasure
------------------------------------------------------------------------------
There are no active volume tasks
```
Error:

```
root@lupine:/mnt# mount -t glusterfs 127.0.0.1:/erasure ./test/
root@lupine:/mnt# cd test/
root@lupine:/mnt/test# ls -lah
total 8.0K
drwxr-xr-x 3 root root 4.0K Sep 17 16:51 .
drwxr-xr-x 4 root root 4.0K Sep 17 16:41 ..
root@lupine:/mnt/test# touch hello
root@lupine:/mnt/test# ls -lah
total 8.0K
drwxr-xr-x 3 root root 4.0K Sep 17 16:51 .
drwxr-xr-x 4 root root 4.0K Sep 17 16:41 ..
-rw-r--r-- 1 root root    0 Sep 17 16:51 hello
root@lupine:/mnt/test# rm hello
root@lupine:/mnt/test# ls -lah
total 8.0K
drwxr-xr-x 3 root root 4.0K Sep 17 16:52 .
drwxr-xr-x 4 root root 4.0K Sep 17 16:41 ..
root@lupine:/mnt/test# echo world > hello
root@lupine:/mnt/test# ls -lah
ls: cannot open directory '.': Transport endpoint is not connected
root@lupine:/mnt/test# cat hello
cat: hello: Transport endpoint is not connected
root@lupine:/mnt/test# echo hey > hello
-bash: hello: Transport endpoint is not connected
```
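
"Transport endpoint is not connected" on a FUSE mount typically means the client-side glusterfs process died while the kernel kept the stale mount; a quick way to confirm and recover, with the paths used in this thread:

```sh
grep -c 'signal received' /var/log/glusterfs/mnt-test.log  # crash markers in the client log
umount -l /mnt/test                                        # lazily detach the dead mount
mount -t glusterfs 127.0.0.1:/erasure /mnt/test            # remount for another attempt
```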
Some info about my system:

```
root@lupine:/mnt/test# uname -a
Linux lupine 4.9.93-mainline-rev1 #1 SMP Tue Apr 10 09:42:40 UTC 2018 armv7l GNU/Linux
root@lupine:/mnt/test# lscpu
Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           Marvell
Model:               2
Model name:          PJ4B-MP
Stepping:            0x2
CPU max MHz:         1333.0000
CPU min MHz:         666.5000
BogoMIPS:            50.00
Flags:               half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae
root@lupine:/mnt/test# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux testing (buster)
Release:	testing
Codename:	buster
root@lupine:/mnt/test# glusterd --version
glusterfs 4.1.3
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
root@lupine:/mnt/test# 
```

No log in `journalctl -u glusterd`.

`/var/log/glusterfs/bricks/var-lib-erasure.log`:

```
[2018-09-17 16:46:36.706569] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 4.1.3 (args: /usr/sbin/glusterfsd -s 10.1.33.66 --volfile-id erasure.10.1.33.66.var-lib-erasure -p /var/run/gluster/vols/erasure/10.1.33.66-var-lib-erasure.pid -S /var/run/gluster/f69c35eb4a99a60c.socket --brick-name /var/lib/erasure -l /var/log/glusterfs/bricks/var-lib-erasure.log --xlator-option *-posix.glusterd-uuid=abe11c0a-ba2d-48e8-a989-67cbd31979dd --process-name brick --brick-port 49153 --xlator-option erasure-server.listen-port=49153)
[2018-09-17 16:46:36.737490] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-17 16:46:36.777276] I [rpcsvc.c:2494:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2018-09-17 16:46:36.779049] I [rpcsvc.c:2052:rpcsvc_spawn_threads] 0-rpc-service: spawned 1 threads for program 'GlusterFS 3.3'; total count:1
[2018-09-17 16:46:36.779478] I [rpcsvc.c:2052:rpcsvc_spawn_threads] 0-rpc-service: spawned 1 threads for program 'GlusterFS 4.x v1'; total count:1
[2018-09-17 16:46:36.791791] I [MSGID: 121050] [ctr-helper.c:258:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
[2018-09-17 16:46:36.865149] I [trash.c:2526:init] 0-erasure-trash: no option specified for 'eliminate', using NULL
Final graph:
+------------------------------------------------------------------------------+
  1: volume erasure-posix
  2:     type storage/posix
  3:     option glusterd-uuid abe11c0a-ba2d-48e8-a989-67cbd31979dd
  4:     option directory /var/lib/erasure
  5:     option volume-id 3768bf54-07cf-43e0-9848-482863685c6d
  6:     option shared-brick-count 1
  7: end-volume
  8:  
  9: volume erasure-trash
 10:     type features/trash
 11:     option trash-dir .trashcan
 12:     option brick-path /var/lib/erasure
 13:     option trash-internal-op off
 14:     subvolumes erasure-posix
 15: end-volume
 16:  
 17: volume erasure-changetimerecorder
 18:     type features/changetimerecorder
 19:     option db-type sqlite3
 20:     option hot-brick off
 21:     option db-name erasure.db
 22:     option db-path /var/lib/erasure/.glusterfs/
 23:     option record-exit off
 24:     option ctr_link_consistency off
 25:     option ctr_lookupheal_link_timeout 300
 26:     option ctr_lookupheal_inode_timeout 300
 27:     option record-entry on
 28:     option ctr-enabled off
 29:     option record-counters off
 30:     option ctr-record-metadata-heat off
 31:     option sql-db-cachesize 12500
 32:     option sql-db-wal-autocheckpoint 25000
 33:     subvolumes erasure-trash
 34: end-volume
 35:  
 36: volume erasure-changelog
 37:     type features/changelog
 38:     option changelog-brick /var/lib/erasure
 39:     option changelog-dir /var/lib/erasure/.glusterfs/changelogs
 40:     option changelog-barrier-timeout 120
 41:     subvolumes erasure-changetimerecorder
 42: end-volume
 43:  
 44: volume erasure-bitrot-stub
 45:     type features/bitrot-stub
 46:     option export /var/lib/erasure
 47:     option bitrot disable
 48:     subvolumes erasure-changelog
 49: end-volume
 50:  
 51: volume erasure-access-control
 52:     type features/access-control
 53:     subvolumes erasure-bitrot-stub
 54: end-volume
 55:  
 56: volume erasure-locks
 57:     type features/locks
 58:     subvolumes erasure-access-control
 59: end-volume
 60:  
 61: volume erasure-worm
 62:     type features/worm
 63:     option worm off
 64:     option worm-file-level off
 65:     option worm-files-deletable on
 66:     subvolumes erasure-locks
 67: end-volume
 68:  
 69: volume erasure-read-only
 70:     type features/read-only
 71:     option read-only off
 72:     subvolumes erasure-worm
 73: end-volume
 74:  
 75: volume erasure-leases
 76:     type features/leases
 77:     option leases off
 78:     subvolumes erasure-read-only
 79: end-volume
 80:  
 81: volume erasure-upcall
 82:     type features/upcall
 83:     option cache-invalidation off
 84:     subvolumes erasure-leases
 85: end-volume
 86:  
 87: volume erasure-io-threads
 88:     type performance/io-threads
 89:     subvolumes erasure-upcall
 90: end-volume
 91:  
 92: volume erasure-selinux
 93:     type features/selinux
 94:     option selinux on
 95:     subvolumes erasure-io-threads
 96: end-volume
 97:  
 98: volume erasure-marker
 99:     type features/marker
100:     option volume-uuid 3768bf54-07cf-43e0-9848-482863685c6d
101:     option timestamp-file /var/lib/glusterd/vols/erasure/marker.tstamp
102:     option quota-version 0
103:     option xtime off
104:     option gsync-force-xtime off
105:     option quota off
106:     option inode-quota off
107:     subvolumes erasure-selinux
108: end-volume
109:  
110: volume erasure-barrier
111:     type features/barrier
112:     option barrier disable
113:     option barrier-timeout 120
114:     subvolumes erasure-marker
115: end-volume
116:  
117: volume erasure-index
118:     type features/index
119:     option index-base /var/lib/erasure/.glusterfs/indices
120:     option xattrop64-watchlist trusted.ec.dirty
121:     subvolumes erasure-barrier
122: end-volume
123:  
124: volume erasure-quota
125:     type features/quota
126:     option volume-uuid erasure
127:     option server-quota off
128:     option deem-statfs off
129:     subvolumes erasure-index
130: end-volume
131:  
132: volume erasure-io-stats
133:     type debug/io-stats
134:     option unique-id /var/lib/erasure
135:     option log-level INFO
136:     option latency-measurement off
137:     option count-fop-hits off
138:     subvolumes erasure-quota
139: end-volume
140:  
141: volume /var/lib/erasure
142:     type performance/decompounder
143:     option auth.addr./var/lib/erasure.allow *
144:     option auth-path /var/lib/erasure
145:     option auth.login.59cf78fb-7702-4e10-b3d5-a825ae802021.password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
146:     option auth.login./var/lib/erasure.allow 59cf78fb-7702-4e10-b3d5-a825ae802021
147:     subvolumes erasure-io-stats
148: end-volume
149:  
150: volume erasure-server
151:     type protocol/server
152:     option transport.socket.listen-port 49153
153:     option rpc-auth.auth-glusterfs on
154:     option rpc-auth.auth-unix on
155:     option rpc-auth.auth-null on
156:     option rpc-auth-allow-insecure on
157:     option transport-type tcp
158:     option transport.address-family inet
159:     option auth.login./var/lib/erasure.allow 59cf78fb-7702-4e10-b3d5-a825ae802021
160:     option auth.login.59cf78fb-7702-4e10-b3d5-a825ae802021.password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
161:     option auth-path /var/lib/erasure
162:     option auth.addr./var/lib/erasure.allow *
163:     option transport.socket.keepalive 1
164:     option transport.tcp-user-timeout 0
165:     option transport.socket.keepalive-time 20
166:     option transport.socket.keepalive-interval 2
167:     option transport.socket.keepalive-count 9
168:     option transport.listen-backlog 1024
169:     subvolumes /var/lib/erasure
170: end-volume
171:  
+------------------------------------------------------------------------------+
[2018-09-17 16:46:38.116226] I [addr.c:55:compare_addr_and_update] 0-/var/lib/erasure: allowed = "*", received addr = "10.1.33.66"
[2018-09-17 16:46:38.116407] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 59cf78fb-7702-4e10-b3d5-a825ae802021
[2018-09-17 16:46:38.116525] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-erasure-server: accepted client from CTX_ID:b69b560e-e392-4b13-a770-6ebe3c1cef3a-GRAPH_ID:0-PID:26554-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0 (version: 4.1.3)
[2018-09-17 16:46:39.441881] I [addr.c:55:compare_addr_and_update] 0-/var/lib/erasure: allowed = "*", received addr = "10.1.4.145"
[2018-09-17 16:46:39.442015] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 59cf78fb-7702-4e10-b3d5-a825ae802021
[2018-09-17 16:46:39.442123] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-erasure-server: accepted client from CTX_ID:54555d90-758b-4b58-9e8c-38f0722e823c-GRAPH_ID:0-PID:10554-HOST:chicard-PC_NAME:erasure-client-2-RECON_NO:-0 (version: 4.1.3)
[2018-09-17 16:46:44.397007] I [addr.c:55:compare_addr_and_update] 0-/var/lib/erasure: allowed = "*", received addr = "10.1.9.46"
[2018-09-17 16:46:44.397160] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 59cf78fb-7702-4e10-b3d5-a825ae802021
[2018-09-17 16:46:44.397276] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-erasure-server: accepted client from CTX_ID:3ad49b62-0782-4507-a2de-a358d0480211-GRAPH_ID:0-PID:9216-HOST:villequin-PC_NAME:erasure-client-2-RECON_NO:-0 (version: 4.1.3)
[2018-09-17 16:49:31.077319] I [addr.c:55:compare_addr_and_update] 0-/var/lib/erasure: allowed = "*", received addr = "10.1.33.66"
[2018-09-17 16:49:31.077555] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 59cf78fb-7702-4e10-b3d5-a825ae802021
[2018-09-17 16:49:31.077764] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-erasure-server: accepted client from CTX_ID:079c6f7c-1a1a-4062-a5d3-65ec013dbbc6-GRAPH_ID:0-PID:26675-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0 (version: 4.1.3)
[2018-09-17 16:49:31.109613] W [MSGID: 113117] [posix-metadata.c:569:posix_update_utime_in_mdata] 0-erasure-posix: posix utime set mdata failed on file [Function not implemented]
[2018-09-17 16:49:37.592182] W [MSGID: 113117] [posix-metadata.c:671:posix_set_parent_ctime] 0-erasure-posix: posix parent set mdata failed on file [No such file or directory]
[2018-09-17 16:50:14.119444] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-erasure-server: disconnecting connection from CTX_ID:079c6f7c-1a1a-4062-a5d3-65ec013dbbc6-GRAPH_ID:0-PID:26675-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0
[2018-09-17 16:50:14.119707] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-erasure-server: releasing lock on 00000000-0000-0000-0000-000000000001 held by {client=0xb2bf6a48, pid=26467 lk-owner=3437e0b1}
[2018-09-17 16:50:14.119837] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-erasure-server: releasing lock on 567fd882-54e9-4d47-8d8e-03b6402081d1 held by {client=0xb2bf6a48, pid=26467 lk-owner=2c2fe0b1}
[2018-09-17 16:50:14.119931] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-erasure-server: releasing lock on 761ed9be-7805-44a8-aaa5-3fa1e8e1ae9c held by {client=0xb2bf6a48, pid=26467 lk-owner=3cc1e0b1}
[2018-09-17 16:50:14.120079] I [MSGID: 115013] [server-helpers.c:286:do_fd_cleanup] 0-erasure-server: fd cleanup on /foo/bar
[2018-09-17 16:50:14.121498] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-erasure-server: Shutting down connection CTX_ID:079c6f7c-1a1a-4062-a5d3-65ec013dbbc6-GRAPH_ID:0-PID:26675-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0
[2018-09-17 16:49:37.610616] W [MSGID: 113117] [posix-metadata.c:569:posix_update_utime_in_mdata] 0-erasure-posix: posix utime set mdata failed on file [Function not implemented]
[2018-09-17 16:50:14.028850] W [MSGID: 113117] [posix-metadata.c:671:posix_set_parent_ctime] 0-erasure-posix: posix parent set mdata failed on file [No such file or directory]
[2018-09-17 16:51:25.719362] I [addr.c:55:compare_addr_and_update] 0-/var/lib/erasure: allowed = "*", received addr = "10.1.33.66"
[2018-09-17 16:51:25.719566] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 59cf78fb-7702-4e10-b3d5-a825ae802021
[2018-09-17 16:51:25.719704] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-erasure-server: accepted client from CTX_ID:cd5b03ca-f4b8-4aba-b9d9-4f662ed967c0-GRAPH_ID:0-PID:26785-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0 (version: 4.1.3)
[2018-09-17 16:51:34.990971] W [MSGID: 113117] [posix-metadata.c:671:posix_set_parent_ctime] 0-erasure-posix: posix parent set mdata failed on file [Invalid argument]
[2018-09-17 16:51:49.765470] W [MSGID: 113117] [posix-metadata.c:671:posix_set_parent_ctime] 0-erasure-posix: posix parent set mdata failed on file [No such file or directory]
[2018-09-17 16:51:49.795253] W [MSGID: 113117] [posix-metadata.c:569:posix_update_utime_in_mdata] 0-erasure-posix: posix utime set mdata failed on file
[2018-09-17 16:52:12.378955] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-erasure-server: disconnecting connection from CTX_ID:cd5b03ca-f4b8-4aba-b9d9-4f662ed967c0-GRAPH_ID:0-PID:26785-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0
[2018-09-17 16:52:12.379242] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-erasure-server: releasing lock on 00000000-0000-0000-0000-000000000001 held by {client=0xb2bf7498, pid=26467 lk-owner=3437e0b1}
[2018-09-17 16:52:12.379361] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-erasure-server: releasing lock on 5c41745b-0251-4c84-a735-daf473da04f8 held by {client=0xb2bf7498, pid=26467 lk-owner=8458e0b1}
[2018-09-17 16:52:12.379499] I [MSGID: 115013] [server-helpers.c:286:do_fd_cleanup] 0-erasure-server: fd cleanup on /hello
[2018-09-17 16:52:12.381878] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-erasure-server: Shutting down connection CTX_ID:cd5b03ca-f4b8-4aba-b9d9-4f662ed967c0-GRAPH_ID:0-PID:26785-HOST:lupine-PC_NAME:erasure-client-2-RECON_NO:-0
The message "W [MSGID: 113117] [posix-metadata.c:671:posix_set_parent_ctime] 0-erasure-posix: posix parent set mdata failed on file [Invalid argument]" repeated 2 times between [2018-09-17 16:51:34.990971] and [2018-09-17 16:52:04.485503]
[2018-09-17 16:52:12.285144] W [MSGID: 113117] [posix-metadata.c:671:posix_set_parent_ctime] 0-erasure-posix: posix parent set mdata failed on file [No such file or directory]
```
`/var/log/glusterfs/glusterd.log`:

```
[2018-09-17 16:46:03.780523] W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace.
[2018-09-17 16:46:03.790084] E [MSGID: 106301] [glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation 'Volume Create' failed on localhost : The brick 10.1.33.66:/var/lib/erasure is being created in the root partition. It is recommended that you don't use the system's root partition for storage backend. Or use 'force' at the end of the command if you want to override this behavior.
[2018-09-17 16:46:20.549786] W [MSGID: 101095] [xlator.c:181:xlator_volopt_dynload] 0-xlator: /usr/lib/arm-linux-gnueabihf/glusterfs/4.1.3/xlator/nfs/server.so: cannot open shared object file: No such file or directory
[2018-09-17 16:46:36.685142] I [glusterd-utils.c:6089:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/erasure
[2018-09-17 16:46:36.881833] I [MSGID: 106142] [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick /var/lib/erasure on port 49153
[2018-09-17 16:46:36.884874] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-09-17 16:46:36.977723] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2018-09-17 16:46:36.979039] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2018-09-17 16:46:36.980462] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2018-09-17 16:46:36.980665] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped
[2018-09-17 16:46:36.980853] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
[2018-09-17 16:46:36.994393] I [MSGID: 106568] [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 20564
[2018-09-17 16:46:37.994886] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd service is stopped
[2018-09-17 16:46:37.995299] I [MSGID: 106567] [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting glustershd service
[2018-09-17 16:46:38.007328] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2018-09-17 16:46:38.007534] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped
[2018-09-17 16:46:38.008255] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2018-09-17 16:46:38.008432] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped
The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 18 times between [2018-09-17 16:46:03.780523] and [2018-09-17 16:46:44.259433]
[2018-09-17 16:47:10.628598] I [MSGID: 106488] [glusterd-handler.c:1549:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-09-17 16:48:05.073947] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume erasure
[2018-09-17 16:48:05.074420] W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace.
The message "I [MSGID: 106488] [glusterd-handler.c:1549:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 3 times between [2018-09-17 16:47:10.628598] and [2018-09-17 16:47:22.481012]
The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 7 times between [2018-09-17 16:48:05.074420] and [2018-09-17 16:48:05.094878]
[2018-09-17 16:48:51.440133] I [MSGID: 106488] [glusterd-handler.c:1549:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
```

Let me know if you want me to try something or need additional logs :)

@pranithk
Member

Could you attach the following log from when this error happened: `/var/log/glusterfs/mnt-test.log`?
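
(The FUSE client logs under `/var/log/glusterfs/` to a file named after the mount point with `/` turned into `-`, which is why the `/mnt/test` mount above corresponds to `mnt-test.log`.)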

@superboum

Logs for `/var/log/glusterfs/mnt-test.log`:

```
[2018-09-17 16:49:30.964082] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.3 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=127.0.0.1 --volfile-id=/erasure /mnt/test)
[2018-09-17 16:49:31.022690] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-17 16:49:31.049937] I [MSGID: 122069] [ec-code.c:1043:ec_code_detect] 0-erasure-disperse-0: Not using any cpu extensions
[2018-09-17 16:49:31.052800] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-17 16:49:31.057447] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-0: parent translators are ready, attempting connect on transport
[2018-09-17 16:49:31.059096] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-1: parent translators are ready, attempting connect on transport
[2018-09-17 16:49:31.060631] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-2: parent translators are ready, attempting connect on transport
[2018-09-17 16:49:31.061447] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.061940] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume erasure-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host 10.1.9.46
  5:     option remote-subvolume /var/lib/erasure
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
  9:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:  
 17: volume erasure-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host 10.1.4.145
 21:     option remote-subvolume /var/lib/erasure
 22:     option transport-type socket
 23:     option transport.address-family inet
 24:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
 25:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 26:     option transport.tcp-user-timeout 0
[2018-09-17 16:49:31.063910] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
 30:     option send-gids true
 31: end-volume
 32:  
 33: volume erasure-client-2
 34:     type protocol/client
 35:     option ping-timeout 42
 36:     option remote-host 10.1.33.66
 37:     option remote-subvolume /var/lib/erasure
 38:     option transport-type socket
 39:     option transport.address-family inet
 40:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
 41:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 42:     option transport.tcp-user-timeout 0
 43:     option transport.socket.keepalive-time 20
 44:     option transport.socket.keepalive-interval 2
 45:     option transport.socket.keepalive-count 9
 46:     option send-gids true
 47: end-volume
 48:  
 49: volume erasure-disperse-0
 50:     type cluster/disperse
 51:     option redundancy 1
[2018-09-17 16:49:31.065744] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
 52:     subvolumes erasure-client-0 erasure-client-1 erasure-client-2
 53: end-volume
 54:  
 55: volume erasure-dht
 56:     type cluster/distribute
 57:     option lock-migration off
 58:     option force-migration off
 59:     subvolumes erasure-disperse-0
 60: end-volume
 61:  
 62: volume erasure-write-behind
 63:     type performance/write-behind
 64:     subvolumes erasure-dht
 65: end-volume
 66:  
 67: volume erasure-read-ahead
 68:     type performance/read-ahead
 69:     subvolumes erasure-write-behind
[2018-09-17 16:49:31.066276] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.067173] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
 70: end-volume
 71:  
 72: volume erasure-readdir-ahead
 73:     type performance/readdir-ahead
 74:     option parallel-readdir off
 75:     option rda-request-size 131072
 76:     option rda-cache-limit 10MB
 77:     subvolumes erasure-read-ahead
 78: end-volume
 79:  
 80: volume erasure-io-cache
 81:     type performance/io-cache
 82:     subvolumes erasure-readdir-ahead
 83: end-volume
 84:  
 85: volume erasure-quick-read
 86:     type performance/quick-read
 87:     subvolumes erasure-io-cache
 88: end-volume
 89:  
[2018-09-17 16:49:31.067976] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-2: changing port to 49153 (from 0)
 90: volume erasure-open-behind
 91:     type performance/open-behind
 92:     subvolumes erasure-quick-read
 93: end-volume
 94:  
 95: volume erasure-md-cache
 96:     type performance/md-cache
 97:     subvolumes erasure-open-behind
 98: end-volume
 99:  
100: volume erasure-io-threads
101:     type performance/io-threads
102:     subvolumes erasure-md-cache
[2018-09-17 16:49:31.069246] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-1: changing port to 49153 (from 0)
103: end-volume
104:  
105: volume erasure
106:     type debug/io-stats
107:     option log-level INFO
108:     option latency-measurement off
109:     option count-fop-hits off
110:     subvolumes erasure-io-threads
111: end-volume
112:  
113: volume meta-autoload
114:     type meta
115:     subvolumes erasure
116: end-volume
117:  
+------------------------------------------------------------------------------+
[2018-09-17 16:49:31.071114] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-0: changing port to 49153 (from 0)
[2018-09-17 16:49:31.073520] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.074185] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.075697] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.076607] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.077510] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.078691] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:49:31.080292] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-2: Connected to erasure-client-2, attached to remote volume '/var/lib/erasure'.
[2018-09-17 16:49:31.082553] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-1: Connected to erasure-client-1, attached to remote volume '/var/lib/erasure'.
[2018-09-17 16:49:31.083175] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-0: Connected to erasure-client-0, attached to remote volume '/var/lib/erasure'.
[2018-09-17 16:49:31.083382] I [MSGID: 122062] [ec.c:347:ec_up] 0-erasure-disperse-0: Going UP
[2018-09-17 16:49:31.090671] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2018-09-17 16:49:31.090792] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-17 16:49:31.100498] I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-erasure-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(CREATE)
frame : type(1) op(STAT)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2018-09-17 16:50:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.3
---------
[2018-09-17 16:51:25.616694] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.3 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=127.0.0.1 --volfile-id=/erasure /mnt/test)
[2018-09-17 16:51:25.663077] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-17 16:51:25.688562] I [MSGID: 122069] [ec-code.c:1043:ec_code_detect] 0-erasure-disperse-0: Not using any cpu extensions
[2018-09-17 16:51:25.691916] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-17 16:51:25.696216] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-0: parent translators are ready, attempting connect on transport
[2018-09-17 16:51:25.697899] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-1: parent translators are ready, attempting connect on transport
[2018-09-17 16:51:25.699485] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-2: parent translators are ready, attempting connect on transport
[2018-09-17 16:51:25.699664] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.701135] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume erasure-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host 10.1.9.46
  5:     option remote-subvolume /var/lib/erasure
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
  9:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
[2018-09-17 16:51:25.702142] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:  
 17: volume erasure-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host 10.1.4.145
 21:     option remote-subvolume /var/lib/erasure
 22:     option transport-type socket
 23:     option transport.address-family inet
 24:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
 25:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 26:     option transport.tcp-user-timeout 0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
 30:     option send-gids true
 31: end-volume
 32:  
 33: volume erasure-client-2
 34:     type protocol/client
 35:     option ping-timeout 42
 36:     option remote-host 10.1.33.66
 37:     option remote-subvolume /var/lib/erasure
 38:     option transport-type socket
 39:     option transport.address-family inet
 40:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
 41:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 42:     option transport.tcp-user-timeout 0
 43:     option transport.socket.keepalive-time 20
 44:     option transport.socket.keepalive-interval 2
 45:     option transport.socket.keepalive-count 9
 46:     option send-gids true
 47: end-volume
 48:  
 49: volume erasure-disperse-0
 50:     type cluster/disperse
 51:     option redundancy 1
 52:     subvolumes erasure-client-0 erasure-client-1 erasure-client-2
 53: end-volume
 54:  
 55: volume erasure-dht
 56:     type cluster/distribute
 57:     option lock-migration off
 58:     option force-migration off
 59:     subvolumes erasure-disperse-0
 60: end-volume
 61:  
 62: volume erasure-write-behind
 63:     type performance/write-behind
 64:     subvolumes erasure-dht
 65: end-volume
 66:  
[2018-09-17 16:51:25.702805] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
 67: volume erasure-read-ahead
 68:     type performance/read-ahead
 69:     subvolumes erasure-write-behind
 70: end-volume
 71:  
 72: volume erasure-readdir-ahead
 73:     type performance/readdir-ahead
 74:     option parallel-readdir off
 75:     option rda-request-size 131072
 76:     option rda-cache-limit 10MB
 77:     subvolumes erasure-read-ahead
 78: end-volume
 79:  
 80: volume erasure-io-cache
 81:     type performance/io-cache
 82:     subvolumes erasure-readdir-ahead
 83: end-volume
 84:  
 85: volume erasure-quick-read
 86:     type performance/quick-read
[2018-09-17 16:51:25.705933] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.707342] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.707711] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-0: changing port to 49153 (from 0)
 87:     subvolumes erasure-io-cache
 88: end-volume
 89:  
 90: volume erasure-open-behind
 91:     type performance/open-behind
 92:     subvolumes erasure-quick-read
 93: end-volume
 94:  
 95: volume erasure-md-cache
 96:     type performance/md-cache
 97:     subvolumes erasure-open-behind
 98: end-volume
 99:  
100: volume erasure-io-threads
101:     type performance/io-threads
102:     subvolumes erasure-md-cache
103: end-volume
104:  
105: volume erasure
106:     type debug/io-stats
107:     option log-level INFO
108:     option latency-measurement off
109:     option count-fop-hits off
110:     subvolumes erasure-io-threads
111: end-volume
112:  
113: volume meta-autoload
114:     type meta
115:     subvolumes erasure
116: end-volume
[2018-09-17 16:51:25.708113] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-2: changing port to 49153 (from 0)
117:  
+------------------------------------------------------------------------------+
[2018-09-17 16:51:25.712564] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-1: changing port to 49153 (from 0)
[2018-09-17 16:51:25.715429] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.715831] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.716392] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.718055] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.718742] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.719929] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 16:51:25.722276] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-2: Connected to erasure-client-2, attached to remote volume '/var/lib/erasure'.
[2018-09-17 16:51:25.722792] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-1: Connected to erasure-client-1, attached to remote volume '/var/lib/erasure'.
[2018-09-17 16:51:25.724128] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-0: Connected to erasure-client-0, attached to remote volume '/var/lib/erasure'.
[2018-09-17 16:51:25.724403] I [MSGID: 122062] [ec.c:347:ec_up] 0-erasure-disperse-0: Going UP
[2018-09-17 16:51:25.733409] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2018-09-17 16:51:25.733549] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-17 16:51:25.742978] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-erasure-dht: Directory selfheal failed: Unable to form layout for directory /
pending frames:
frame : type(1) op(FLUSH)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(CREATE)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2018-09-17 16:52:12
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.3
---------
[2018-09-17 17:04:18.598182] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.3 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=127.0.0.1 --volfile-id=/erasure /mnt/test)
[2018-09-17 17:04:18.649228] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-17 17:04:18.674303] I [MSGID: 122069] [ec-code.c:1043:ec_code_detect] 0-erasure-disperse-0: Not using any cpu extensions
[2018-09-17 17:04:18.677508] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-17 17:04:18.681493] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-0: parent translators are ready, attempting connect on transport
[2018-09-17 17:04:18.683113] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-1: parent translators are ready, attempting connect on transport
[2018-09-17 17:04:18.684778] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.685321] I [MSGID: 114020] [client.c:2328:notify] 0-erasure-client-2: parent translators are ready, attempting connect on transport
[2018-09-17 17:04:18.687022] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume erasure-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host 10.1.9.46
[2018-09-17 17:04:18.687551] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.688032] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
  5:     option remote-subvolume /var/lib/erasure
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
  9:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:  
 17: volume erasure-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host 10.1.4.145
 21:     option remote-subvolume /var/lib/erasure
 22:     option transport-type socket
[2018-09-17 17:04:18.689429] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
 23:     option transport.address-family inet
 24:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
 25:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 26:     option transport.tcp-user-timeout 0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
 30:     option send-gids true
 31: end-volume
 32:  
 33: volume erasure-client-2
 34:     type protocol/client
 35:     option ping-timeout 42
 36:     option remote-host 10.1.33.66
 37:     option remote-subvolume /var/lib/erasure
 38:     option transport-type socket
 39:     option transport.address-family inet
 40:     option username 59cf78fb-7702-4e10-b3d5-a825ae802021
 41:     option password daae0da3-0d1c-4b5c-aabd-b0cda43b1efb
 42:     option transport.tcp-user-timeout 0
 43:     option transport.socket.keepalive-time 20
 44:     option transport.socket.keepalive-interval 2
 45:     option transport.socket.keepalive-count 9
 46:     option send-gids true
 47: end-volume
 48:  
 49: volume erasure-disperse-0
 50:     type cluster/disperse
 51:     option redundancy 1
 52:     subvolumes erasure-client-0 erasure-client-1 erasure-client-2
 53: end-volume
 54:  
 55: volume erasure-dht
 56:     type cluster/distribute
 57:     option lock-migration off
 58:     option force-migration off
 59:     subvolumes erasure-disperse-0
 60: end-volume
 61:  
 62: volume erasure-write-behind
 63:     type performance/write-behind
 64:     subvolumes erasure-dht
 65: end-volume
 66:  
 67: volume erasure-read-ahead
 68:     type performance/read-ahead
 69:     subvolumes erasure-write-behind
 70: end-volume
 71:  
 72: volume erasure-readdir-ahead
 73:     type performance/readdir-ahead
 74:     option parallel-readdir off
 75:     option rda-request-size 131072
 76:     option rda-cache-limit 10MB
 77:     subvolumes erasure-read-ahead
 78: end-volume
 79:  
 80: volume erasure-io-cache
 81:     type performance/io-cache
 82:     subvolumes erasure-readdir-ahead
[2018-09-17 17:04:18.690325] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.692823] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-0: changing port to 49153 (from 0)
 83: end-volume
 84:  
 85: volume erasure-quick-read
 86:     type performance/quick-read
 87:     subvolumes erasure-io-cache
 88: end-volume
 89:  
 90: volume erasure-open-behind
 91:     type performance/open-behind
 92:     subvolumes erasure-quick-read
 93: end-volume
 94:  
 95: volume erasure-md-cache
 96:     type performance/md-cache
 97:     subvolumes erasure-open-behind
 98: end-volume
 99:  
100: volume erasure-io-threads
101:     type performance/io-threads
102:     subvolumes erasure-md-cache
103: end-volume
104:  
105: volume erasure
106:     type debug/io-stats
107:     option log-level INFO
108:     option latency-measurement off
109:     option count-fop-hits off
110:     subvolumes erasure-io-threads
111: end-volume
112:  
113: volume meta-autoload
114:     type meta
115:     subvolumes erasure
116: end-volume
117:  
+------------------------------------------------------------------------------+
[2018-09-17 17:04:18.693167] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-2: changing port to 49153 (from 0)
[2018-09-17 17:04:18.695438] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-erasure-client-1: changing port to 49153 (from 0)
[2018-09-17 17:04:18.698365] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.699621] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.700987] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.701583] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-2: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.701815] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.703558] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-erasure-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-17 17:04:18.706187] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-2: Connected to erasure-client-2, attached to remote volume '/var/lib/erasure'.
[2018-09-17 17:04:18.708550] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-0: Connected to erasure-client-0, attached to remote volume '/var/lib/erasure'.
[2018-09-17 17:04:18.709023] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-erasure-client-1: Connected to erasure-client-1, attached to remote volume '/var/lib/erasure'.
[2018-09-17 17:04:18.709162] I [MSGID: 122062] [ec.c:347:ec_up] 0-erasure-disperse-0: Going UP
[2018-09-17 17:04:18.717759] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2018-09-17 17:04:18.717885] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-17 17:04:18.728232] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-erasure-dht: Directory selfheal failed: Unable to form layout for directory /
pending frames:
frame : type(1) op(FLUSH)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(STAT)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2018-09-17 17:04:25
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.3
---------
```

@pranithk
Member

@superboum It seems to be crashing :-(. Could you check the core file and attach the backtrace from it?
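
A minimal way to pull that backtrace non-interactively, assuming the core was dumped to `/core` as in the next comment:

```sh
gdb /usr/sbin/glusterfs -c /core -batch -ex 'thread apply all bt full'
```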

@superboum

superboum commented Sep 18, 2018

Sorry, I am not very familiar with debugging C programs, but I do have a /core file on my system.
I don't know if I extracted the backtrace correctly; let me know:

Backtrace:

```
$  ls -lah /core
-rw------- 1 root root 97M Sep 17 17:04 /core
$ gdb -c /core /usr/sbin/glusterfs
GNU gdb (Debian 8.1-4) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/glusterfs...Reading symbols from /usr/lib/debug/.build-id/8e/ae0dd2cfbca6c60f86764774317ddbb9cc3850.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 27052]
[New LWP 27050]
[New LWP 27042]
[New LWP 27055]
[New LWP 27049]
[New LWP 27048]
[New LWP 27047]
[New LWP 27046]
[New LWP 27045]
[New LWP 27054]
[New LWP 27051]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=127.0.0.1 --volfile-id'.
Program terminated with signal SIGABRT, Aborted.
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
47	../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
[Current thread is 1 (Thread 0xb1dff420 (LWP 27052))]
(gdb) bt full
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
No locals.
#1  0xb6b3ab3e in __libc_signal_restore_set (set=0xb1dfde6c) at ../sysdeps/unix/sysv/linux/nptl-signals.h:80
        _a2tmp = -1310728596
        _a2 = -1310728596
        _nametmp = 175
        _a3tmp = 0
        _a3 = 0
        _a1 = 0
        _a4tmp = 8
        _a1tmp = 2
        _a4 = 8
        _name = 175
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
        set = {__val = {1073478167, 4294967294, 0 <repeats 27 times>, 32, 3065228477, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#3  0xb6b3b82e in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x1, sa_sigaction = 0x1}, sa_mask = {__val = {2973780648, 2, 0, 2997200112, 0, 
              0, 2996922975, 9, 0, 0, 2984276748, 0, 0, 2997200372, 2997200112, 3069785048, 2973772452, 0, 2, 0, 1, 2973788980, 
              2973808204, 4294967294, 0, 0, 1, 2984239320, 2984239352, 2984239384, 2973780648, 3070167400}}, 
          sa_flags = -1310727856, sa_restorer = 0xb6bac107 <__GI___mmap+22>}
        sigs = {__val = {32, 0 <repeats 31 times>}}
#4  0xb6b63484 in __libc_message (action=do_abort, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:181
        ap = {__ap = 0xb1dfe190}
        fd = 2
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#5  0xb6bba7f4 in __GI___fortify_fail_abort (need_backtrace=need_backtrace@entry=false, msg=0xb6bf090c "stack smashing detected")
    at fortify_fail.c:33
No locals.
#6  0xb6bba7d0 in __stack_chk_fail () at stack_chk_fail.c:29
No locals.
#7  0xb2a350a2 in ec_manager_writev (fop=<optimized out>, state=<optimized out>) at ec-inode-write.c:2238
        cbk = <optimized out>
        ctx = <optimized out>
        ec = <optimized out>
        fl_start = <optimized out>
        fl_size = 512
        __FUNCTION__ = "ec_manager_writev"
#8  0xb2a182e8 in __ec_manager (fop=0xb1402aa4, error=<optimized out>) at ec-common.c:2867
        ec = 0xb2b53480
        __FUNCTION__ = "__ec_manager"
#9  0xb2a13c7a in ec_gf_writev (frame=<optimized out>, this=<optimized out>, fd=<optimized out>, vector=<optimized out>, count=1, 
    offset=0, flags=131073, iobref=0xb0a03890, xdata=0x0) at ec.c:1266
No locals.
#10 0xb29ce280 in dht_writev (frame=<optimized out>, this=<optimized out>, fd=0xb0a014e4, vector=<optimized out>, count=1, off=0, 
    flags=131073, iobref=0xb0a03890, xdata=0x0) at dht-inode-write.c:234
        _new = <optimized out>
        old_THIS = <optimized out>
        next_xl_fn = 0xb2a13c41 <ec_gf_writev>
        tmp_cbk = <optimized out>
        subvol = 0x0
        op_errno = -1
        local = 0xb1430104
        __FUNCTION__ = "dht_writev"
#11 0xb2c31c28 in wb_fulfill_head (wb_inode=wb_inode@entry=0xb1e0a2b0, head=head@entry=0xb1431610) at write-behind.c:1111
        _new = <optimized out>
        old_THIS = <optimized out>
        next_xl_fn = 0xb29cdf71 <dht_writev>
        tmp_cbk = <optimized out>
        vector = {{iov_base = 0xb6696000, iov_len = 6}, {iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {
            iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, 
            iov_len = 0}, {iov_base = 0x0, iov_len = 0}}
        count = 1
        req = <optimized out>
        frame = <optimized out>
        __FUNCTION__ = "wb_fulfill_head"
#12 0xb2c31df0 in wb_fulfill (wb_inode=wb_inode@entry=0xb1e0a2b0, liabilities=liabilities@entry=0xb1dfe404) at write-behind.c:1192
        req = 0xb1dfe3ec
        head = <optimized out>
        tmp = 0xb1dfe3ec
        conf = 0xb2b4d7b8
        expected_offset = <optimized out>
        curr_aggregate = <optimized out>
        vector_count = <optimized out>
        ret = 0
#13 0xb2c32e02 in wb_process_queue (wb_inode=wb_inode@entry=0xb1e0a2b0) at write-behind.c:1720
        tasks = {next = 0xb1dfe3f4, prev = 0xb1dfe3f4}
        lies = {next = 0xb1dfe3fc, prev = 0xb1dfe3fc}
        liabilities = {next = 0xb1dfe404, prev = 0xb1dfe404}
        wind_failure = <optimized out>
        __FUNCTION__ = "wb_process_queue"

#14 0xb2c3345c in wb_writev (frame=0xb140b64c, this=<optimized out>, fd=<optimized out>, vector=0x0, count=1, offset=0, 
    flags=131073, iobref=0xb0a03890, xdata=0x0) at write-behind.c:1827
        wb_inode = 0xb1e0a2b0
        conf = 0xb2b4d7b8
        stub = 0xb142d0f4
        ret = <optimized out>
        op_errno = 22
        o_direct = <optimized out>
        __FUNCTION__ = "wb_writev"
#15 0xb2c140a8 in ra_writev (frame=<optimized out>, this=<optimized out>, fd=0xb0a014e4, vector=0xb0a03d48, count=1, offset=0, 
    flags=131073, iobref=0xb0a03890, xdata=0x0) at read-ahead.c:684
        _new = <optimized out>
        old_THIS = 0xb2b1d988
        next_xl_fn = 0xb2c3322d <wb_writev>
        tmp_cbk = <optimized out>
        file = <optimized out>
        tmp_file = 18446744072388520096
        __FUNCTION__ = "ra_writev"
#16 0xb6f64dfe in default_writev (frame=0xb1406b34, this=<optimized out>, fd=0xb0a014e4, vector=0xb0a03d48, count=1, off=0, 
    flags=131073, iobref=0xb0a03890, xdata=0x0) at defaults.c:2685
        old_THIS = 0xb2b20518
        next_xl = <optimized out>
        next_xl_fn = 0xb2c13b85 <ra_writev>
        __FUNCTION__ = "default_writev"
#17 0xb293a304 in ioc_writev (frame=<optimized out>, this=0x0, fd=0xb0a014e4, vector=0xb0a03d48, count=1, offset=0, flags=131073, 
    iobref=0xb0a03890, xdata=0x0) at io-cache.c:1267
        _new = 0xb1406b34
        old_THIS = 0xb2b23550
        next_xl_fn = 0xb6f64d45 <default_writev>
        tmp_cbk = <optimized out>
        local = <optimized out>
        ioc_inode = 18446744072388525792
        __FUNCTION__ = "ioc_writev"
#18 0xb291ded2 in qr_writev (frame=<optimized out>, this=<optimized out>, fd=0xb0a014e4, iov=0xb0a03d48, count=1, offset=0, 
    flags=131073, iobref=0xb0a03890, xdata=0x0) at quick-read.c:666
        _new = <optimized out>
        old_THIS = 0xb2b261b0
        next_xl_fn = 0xb2939e59 <ioc_writev>
        tmp_cbk = <optimized out>
        __FUNCTION__ = "qr_writev"
#19 0xb6f86494 in default_writev_resume (frame=<optimized out>, this=<optimized out>, fd=0xb0a014e4, vector=0xb0a03d48, count=1, 
    off=0, flags=131073, iobref=0xb0a03890, xdata=0x0) at defaults.c:1949
        _new = <optimized out>
        old_THIS = 0xb2b28dd0
        next_xl_fn = 0xb291dc35 <qr_writev>
        tmp_cbk = <optimized out>
        __FUNCTION__ = "default_writev_resume"
#20 0xb6f1a652 in call_resume_wind (stub=0xb1e0806c) at call-stub.c:2206
        __FUNCTION__ = "call_resume_wind"
#21 0xb6f1a9f4 in call_resume (stub=0xb1e0806c) at call-stub.c:2689
        old_THIS = 0xb2b28dd0
        __FUNCTION__ = "call_resume"
#22 0xb2904d58 in ob_wake_cbk (frame=0xb1e00fd4, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, 
    fd_ret=0xb0a014e4, xdata=0x0) at open-behind.c:173
        fd = 0xb0a014e4
        list = {next = 0xb1dfe724, prev = 0xb1dfe724}
        ob_fd = 0xb1e04ab8
        stub = <optimized out>
        tmp = 0xb1dfe724
#23 0xb6f6a994 in default_open_cbk (frame=0xb1e0256c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, 
    fd=0xb0a014e4, xdata=0x0) at defaults.c:1198
        fn = 0xb2904cb5 <ob_wake_cbk>
        _parent = 0xb1e00fd4
        old_THIS = 0xb2b261b0
        __FUNCTION__ = "default_open_cbk"
#24 0xb2938d42 in ioc_open_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, 
    op_errno=<optimized out>, fd=0xb0a014e4, xdata=0x0) at io-cache.c:609
        fn = 0xb6f6a8a5 <default_open_cbk>
        _parent = 0xb1e0256c
        old_THIS = 0xb2b23550
        tmp_ioc_inode = 18446744072388525792
        local = <optimized out>
        table = <optimized out>
        ioc_inode = <optimized out>
        __FUNCTION__ = "ioc_open_cbk"
#25 0xb2c0f2ba in ra_open_cbk (frame=0xb1e02654, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, 
    op_errno=<optimized out>, fd=0xb0a014e4, xdata=0x0) at read-ahead.c:101
        fn = 0xb2938a25 <ioc_open_cbk>
        _parent = 0xb1e011a4
        old_THIS = 0xb2b1d988
        conf = <optimized out>
        file = <optimized out>
        ret = <optimized out>
        __FUNCTION__ = "ra_open_cbk"
#26 0xb29d23e8 in dht_open_cbk (frame=0xb1e0122c, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, 
    op_errno=0, fd=0xb0a014e4, xdata=0x0) at dht-inode-read.c:66
        fn = 0xb2c0f0ed <ra_open_cbk>
        _parent = 0xb1e02654
        old_THIS = 0xb2b17de8
        __local = 0xb1e012b4
        __xl = 0xb2b17de8
        local = 0xb1e012b4
        prev = <optimized out>
        ret = <optimized out>
        __FUNCTION__ = "dht_open_cbk"
#27 0xb6f6a994 in default_open_cbk (frame=0xb1e00e2c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, 
    fd=0xb0a014e4, xdata=0x0) at defaults.c:1198
        fn = 0xb29d22dd <dht_open_cbk>
        _parent = 0xb1e0122c
        old_THIS = 0xb2b14ec8
        __FUNCTION__ = "default_open_cbk"
#28 0xb2a2c7de in ec_manager_open (fop=<optimized out>, state=<optimized out>) at ec-inode-read.c:882
        cbk = 0xb142f0fc
        ctx = <optimized out>
        err = <optimized out>
        __FUNCTION__ = "ec_manager_open"
#29 0xb2a182e8 in __ec_manager (fop=0xb1e0c3dc, error=<optimized out>) at ec-common.c:2867
        ec = 0xb2b53480
        __FUNCTION__ = "__ec_manager"
#30 0xb2a18438 in ec_resume (fop=fop@entry=0xb1e0c3dc, error=error@entry=0) at ec-common.c:492
        resume = 0xb2a18271 <__ec_manager>
#31 0xb2a18558 in ec_complete (fop=fop@entry=0xb1e0c3dc) at ec-common.c:565
        cbk = 0xb142f0fc
        resume = 1
        update = 1
        healing_count = <optimized out>
#32 0xb2a2c4a6 in ec_open_cbk (frame=<optimized out>, cookie=0x1, this=0xb2b14ec8, op_ret=0, op_errno=0, fd=0xb0a014e4, xdata=0x0)
    at ec-inode-read.c:760
        fop = 0xb1e0c3dc
        cbk = <optimized out>
        idx = 1
        __FUNCTION__ = "ec_open_cbk"
#33 0xb2adb322 in client4_0_open_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=<optimized out>)
    at client-rpc-fops_v2.c:275
        fn = 0xb2a2c401 <ec_open_cbk>
        _parent = 0xb1e04324
        old_THIS = 0xb2b0d2f0
        __local = 0xb1e0e964
        local = <optimized out>
        frame = <optimized out>
        fd = 0xb0a014e4
        ret = <optimized out>
        rsp = {op_ret = 0, op_errno = 0, xdata = {xdr_size = 0, count = -1, pairs = {pairs_len = 0, pairs_val = 0x0}}, fd = 0}
        this = <optimized out>
        xdata = 0x0
        __FUNCTION__ = "client4_0_open_cbk"
#34 0xb6ec7b8e in rpc_clnt_handle_reply (clnt=clnt@entry=0xb2b75e48, pollin=pollin@entry=0x0) at rpc-clnt.c:776
        conn = 0xb2b75e68
        saved_frame = 0xb1e02ea4
        ret = 0
        req = 0xb1e0702c
        xid = 24
        __FUNCTION__ = "rpc_clnt_handle_reply"
#35 0xb6ec7dd2 in rpc_clnt_notify (trans=0xb2b760e8, mydata=0xb2b75e68, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xb1427758)
    at rpc-clnt.c:984
        conn = 0xb2b75e68
        clnt = 0xb2b75e48
        ret = -1
        req_info = 0x0
        pollin = 0x0
        clnt_mydata = 0x0
        old_THIS = 0xb2b0d2f0
        __FUNCTION__ = "rpc_clnt_notify"
#36 0xb6ec4ef4 in rpc_transport_notify (this=this@entry=0xb2b760e8, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=0xb1427758)
    at rpc-transport.c:537
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#37 0xb34bf284 in socket_event_poll_in (this=this@entry=0xb2b760e8, notify_handled=notify_handled@entry=true) at socket.c:2462
        ret = <optimized out>
        pollin = 0xb1427758
        priv = 0xb2b76660
        ctx = 0x449150
#38 0xb34c0e76 in socket_event_handler (fd=12, idx=3, gen=4, data=0xb2b760e8, poll_in=1, poll_out=0, poll_err=0) at socket.c:2618
        this = 0xb2b760e8
        priv = 0xb2b76660
        ret = 0
        ctx = 0x449150
        socket_closed = false
        notify_handled = false
        __FUNCTION__ = "socket_event_handler"
#39 0xb6f47f44 in event_dispatch_epoll_handler (event=0xb1dfedf0, event_pool=0x479df8) at event-epoll.c:587
        handler = 0xb34c0d39 <socket_event_handler>
        gen = 4
        slot = 0x4a67b0
        data = 0xb2b760e8
        ret = -1
        fd = <optimized out>
        ev_data = 0xb1dfedf8
        idx = 3
        handled_error_previously = <optimized out>
        ev_data = <optimized out>
        slot = <optimized out>
        handler = <optimized out>
        data = <optimized out>
        idx = <optimized out>
        gen = <optimized out>
        ret = <optimized out>
        fd = <optimized out>
        handled_error_previously = <optimized out>
        __FUNCTION__ = "event_dispatch_epoll_handler"
#40 event_dispatch_epoll_worker (data=0xb2b72ed8) at event-epoll.c:663
        event = {events = 1, data = {ptr = 0x3, fd = 3, u32 = 3, u64 = 17179869187}}
        ret = <optimized out>
        ev_data = 0xb2b72ed8
        event_pool = 0x479df8
        myindex = <optimized out>
        timetodie = 0
        __FUNCTION__ = "event_dispatch_epoll_worker"
#41 0xb6d98614 in start_thread (arg=0xa2ef09f6) at pthread_create.c:463
        pd = 0xa2ef09f6
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1511431629, -1561392650, 344, -1310723040, -1287338174, -1310724560, 
                -1310723040, -1310722928, 0 <repeats 56 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#42 0xb6bae90c in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
No locals.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

The full core file can be found here: https://mybox.inria.fr/f/006a11e610/ (glusterfs-core-2018-09-17.zip)

@pranithk
Member

Thanks for the core. I am not able to download it without an email/password; is there another location where I can download this file?

@superboum

superboum commented Sep 18, 2018

Sorry, you can find it here: http://miroir.deuxfleurs.fr:81/glusterfs-core-2018-09-17 (glusterfs-core-2018-09-17.zip)
(I didn't know this particular Seafile instance doesn't allow public shares...)

@xhernandez
Contributor

xhernandez commented Sep 21, 2018

I can't download the core dump (connection timed out), but I think the problem is caused by conflicting sizes of data types on ARM.

If I write a patch, can you compile and test it?
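
For readers following along, here is a minimal sketch (hypothetical code, not taken from GlusterFS) of how such a data-type width mismatch can clobber the stack and produce the "stack smashing detected" abort seen in frame #7 above, where fl_size is a local variable of ec_manager_writev:

/* Hypothetical sketch: the callee assumes an 8-byte field, but on 32-bit
 * ARM size_t is only 4 bytes, so the extra 4 bytes land in adjacent stack
 * memory (possibly the canary). A build with -fstack-protector may then
 * abort with "stack smashing detected". */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

static void fill_size(void *out)
{
    uint64_t size = 512;              /* callee's idea of the field: 8 bytes */
    memcpy(out, &size, sizeof(size)); /* always writes 8 bytes */
}

int main(void)
{
    size_t fl_size = 0;               /* 8 bytes on x86_64, 4 bytes on armhf */
    fill_size(&fl_size);              /* harmless on 64-bit, overflow on ARM */
    printf("fl_size = %zu\n", fl_size);
    return 0;                         /* canary check can fail on return */
}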

@superboum

superboum commented Sep 24, 2018

OK, I should have started by uploading it to GitHub... I didn't know this feature existed:
glusterfs-core-2018-09-17.zip
As for the previous link, I had redeployed my machine and forgot to open port 81 in iptables; sorry again.

I can indeed compile and test it.
Thanks for your time.

@DrEVILish

I am having the same issue:
Dispersed volume
4x Odroid HC2
Accessing Gluster locally on each peer.

Ubuntu
Linux g01 4.14.5-92 #1 SMP PREEMPT Mon Dec 11 15:48:15 UTC 2017 armv7l armv7l armv7l GNU/Linux

Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Everything else is left at the defaults from standard volume creation.

ls: cannot open directory '.': Transport endpoint is not connected.

@xhernandez
Contributor

I've uploaded a patch that should fix the problem. I've fixed all warnings that appear when compiling on 32-bit x86 (some of them were dangerous). If you get any warnings related to variable sizes when you compile on ARM, please let me know.

Note that the patch is completely untested.
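
To illustrate the class of fix being described (a sketch of the general pattern only; the actual patch is on review.gluster.org), the usual remedy is to replace ABI-dependent types with fixed-width ones, so that structures have the same layout on 32-bit ARM as on x86_64:

/* Hypothetical before/after, not the actual patch: "long" is 4 bytes on
 * armhf but 8 on x86_64, so code mixing both assumptions corrupts memory.
 * Fixed-width types from <stdint.h> make the layout identical everywhere. */
#include <stdint.h>
#include <stdio.h>

struct fop_bad  { long offset; unsigned long size; };  /* ABI-dependent */
struct fop_good { int64_t offset; uint64_t size; };    /* fixed width   */

int main(void)
{
    printf("bad : %zu bytes\n", sizeof(struct fop_bad));  /* 8 on armhf, 16 on x86_64 */
    printf("good: %zu bytes\n", sizeof(struct fop_good)); /* 16 everywhere */
    return 0;
}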

@superboum

superboum commented Sep 26, 2018

I tried to compile GlusterFS with your patch and ./configure --enable-debug --with-ipv6-default (the second option was added because of this bug). You can find the make output here:

Make output
make -j5
make --no-print-directory all-recursive
Making all in rpc/xdr/gen
make --no-print-directory all-am
make[3]: Nothing to be done for 'all-am'.
Making all in libglusterfs
Making all in src
make --no-print-directory all-am
make[4]: Nothing to be done for 'all-am'.
Making all in src/gfdb
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.
Making all in rpc
Making all in xdr
Making all in src
make[4]: Nothing to be done for 'all'.
make[4]: Nothing to be done for 'all-am'.
Making all in rpc-lib
Making all in src
  CC       rpcsvc.lo
  CC       rpc-clnt.lo
  CC       rpc-drc.lo
  CC       xdr_sizeof.lo
  CC       rpc-clnt-ping.lo
  CC       autoscale-threads.lo
  CC       mgmt-pmap.lo
  CCLD     libgfrpc.la
make[4]: Nothing to be done for 'all-am'.
Making all in rpc-transport
Making all in socket
Making all in src
  CC       socket.lo
  CC       name.lo
  CCLD     socket.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
make[3]: Nothing to be done for 'all-am'.
Making all in api
Making all in src
  CC       libgfapi_la-glfs.lo
  CC       libgfapi_la-glfs-mgmt.lo
  CC       libgfapi_la-glfs-fops.lo
  CC       libgfapi_la-glfs-handleops.lo
  CC       libgfapi_la-glfs-resolve.lo
  CC       api_la-glfs-master.lo
  CCLD     libgfapi.la
  CCLD     api.la
Making all in examples
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.
Making all in xlators
Making all in cluster
Making all in stripe
Making all in src
  CC       stripe.lo
  CC       stripe-helpers.lo
  CC       libxlator.lo
  CCLD     stripe.la
make[5]: Nothing to be done for 'all-am'.
Making all in afr
Making all in src
  CC       afr-dir-read.lo
  CC       afr-dir-write.lo
  CC       afr-inode-read.lo
  CC       afr-inode-write.lo
  CC       afr-open.lo
  CC       afr-transaction.lo
  CC       afr-lk-common.lo
  CC       afr-read-txn.lo
  CC       libxlator.lo
  CC       afr-self-heal-common.lo
  CC       afr-self-heal-data.lo
  CC       afr-self-heal-entry.lo
  CC       afr-self-heal-metadata.lo
  CC       afr-self-heald.lo
  CC       afr-self-heal-name.lo
  CC       afr.lo
  CCLD     afr.la
make[5]: Nothing to be done for 'all-am'.
Making all in dht
Making all in src
  CC       dht-layout.lo
  CC       dht-helper.lo
  CC       dht-linkfile.lo
  CC       dht-rebalance.lo
  CC       dht-selfheal.lo
  CC       dht-rename.lo
  CC       dht-hashfn.lo
  CC       dht-diskusage.lo
  CC       dht-common.lo
  CC       dht-inode-write.lo
  CC       dht-inode-read.lo
  CC       dht-shared.lo
  CC       dht-lock.lo
  CC       libxlator.lo
  CC       dht.lo
  CC       nufa.lo
  CC       switch.lo
  CC       tier.lo
  CC       tier-common.lo
  CCLD     dht.la
  CCLD     switch.la
  CCLD     nufa.la
  CCLD     tier.la
make[5]: Nothing to be done for 'all-am'.
Making all in ec
Making all in src
  CC       ec.lo
  CC       ec-data.lo
  CC       ec-helpers.lo
  CC       ec-common.lo
  CC       ec-generic.lo
  CC       ec-locks.lo
  CC       ec-dir-read.lo
  CC       ec-dir-write.lo
  CC       ec-inode-read.lo
  CC       ec-inode-write.lo
  CC       ec-combine.lo
  CC       ec-method.lo
  CC       ec-galois.lo
  CC       ec-code.lo
  CC       ec-code-c.lo
  CC       ec-gf8.lo
  CC       ec-heal.lo
  CC       ec-heald.lo
  CC       libxlator.lo
  CCLD     ec.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in storage
Making all in posix
Making all in src
  CC       posix.lo
  CC       posix-helpers.lo
  CC       posix-handle.lo
  CC       posix-aio.lo
  CC       posix-gfid-path.lo
  CC       posix-entry-ops.lo
  CC       posix-inode-fd-ops.lo
  CC       posix-common.lo
  CC       posix-metadata.lo
  CCLD     posix.la
/usr/bin/ld: .libs/posix-inode-fd-ops.o: in function `posix_do_chmod':
/root/glusterfs/xlators/storage/posix/src/posix-inode-fd-ops.c:198: warning: lchmod is not implemented and will always fail
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in protocol
Making all in auth
Making all in addr
Making all in src
  CC       addr.lo
  CCLD     addr.la
make[6]: Nothing to be done for 'all-am'.
Making all in login
Making all in src
  CC       login.lo
  CCLD     login.la
make[6]: Nothing to be done for 'all-am'.
make[5]: Nothing to be done for 'all-am'.
Making all in client
Making all in src
  CC       client.lo
  CC       client-helpers.lo
  CC       client-rpc-fops.lo
  CC       client-handshake.lo
  CC       client-callback.lo
  CC       client-lk.lo
  CC       client-common.lo
  CC       client-rpc-fops_v2.lo
  CCLD     client.la
make[5]: Nothing to be done for 'all-am'.
Making all in server
Making all in src
  CC       server.lo
  CC       server-resolve.lo
  CC       server-helpers.lo
  CC       server-rpc-fops.lo
  CC       server-handshake.lo
  CC       authenticate.lo
  CC       server-common.lo
  CC       server-rpc-fops_v2.lo
  CCLD     server.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in performance
Making all in write-behind
Making all in src
  CC       write-behind.lo
  CCLD     write-behind.la
make[5]: Nothing to be done for 'all-am'.
Making all in read-ahead
Making all in src
  CC       read-ahead.lo
  CC       page.lo
  CCLD     read-ahead.la
make[5]: Nothing to be done for 'all-am'.
Making all in readdir-ahead
Making all in src
  CC       readdir-ahead.lo
  CCLD     readdir-ahead.la
make[5]: Nothing to be done for 'all-am'.
Making all in io-threads
Making all in src
  CC       io-threads.lo
  CCLD     io-threads.la
make[5]: Nothing to be done for 'all-am'.
Making all in io-cache
Making all in src
  CC       io-cache.lo
  CC       page.lo
  CC       ioc-inode.lo
  CCLD     io-cache.la
make[5]: Nothing to be done for 'all-am'.
Making all in symlink-cache
Making all in src
  CC       symlink-cache.lo
  CCLD     symlink-cache.la
make[5]: Nothing to be done for 'all-am'.
Making all in quick-read
Making all in src
  CC       quick-read.lo
  CCLD     quick-read.la
make[5]: Nothing to be done for 'all-am'.
Making all in md-cache
Making all in src
  CC       md-cache.lo
  CCLD     md-cache.la
make[5]: Nothing to be done for 'all-am'.
Making all in open-behind
Making all in src
  CC       open-behind.lo
  CCLD     open-behind.la
make[5]: Nothing to be done for 'all-am'.
Making all in decompounder
Making all in src
  CC       decompounder.lo
  CCLD     decompounder.la
make[5]: Nothing to be done for 'all-am'.
Making all in nl-cache
Making all in src
  CC       nl-cache.lo
  CC       nl-cache-helper.lo
  CCLD     nl-cache.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in debug
Making all in error-gen
Making all in src
  CC       error-gen.lo
  CCLD     error-gen.la
make[5]: Nothing to be done for 'all-am'.
Making all in io-stats
Making all in src
  CC       io-stats.lo
  CCLD     io-stats.la
make[5]: Nothing to be done for 'all-am'.
Making all in sink
Making all in src
  CC       sink.lo
  CCLD     sink.la
make[5]: Nothing to be done for 'all-am'.
Making all in trace
Making all in src
  CC       trace.lo
  CCLD     trace.la
make[5]: Nothing to be done for 'all-am'.
Making all in delay-gen
Making all in src
  CC       delay-gen.lo
  CCLD     delay-gen.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in features
Making all in locks
Making all in src
  CC       common.lo
  CC       posix.lo
  CC       entrylk.lo
  CC       inodelk.lo
  CC       reservelk.lo
  CC       clear.lo
  CCLD     locks.la
make[5]: Nothing to be done for 'all-am'.
Making all in quota
Making all in src
  CC       quota.lo
  CC       quota-enforcer-client.lo
  CC       quotad.lo
  CC       quotad-helpers.lo
  CC       quotad-aggregator.lo
  CCLD     quotad.la
  CCLD     quota.la
make[5]: Nothing to be done for 'all-am'.
Making all in read-only
Making all in src
  CC       read-only.lo
  CC       read-only-common.lo
  CC       worm-helper.lo
  CC       worm.lo
  CCLD     read-only.la
  CCLD     worm.la
make[5]: Nothing to be done for 'all-am'.
Making all in quiesce
Making all in src
  CC       quiesce.lo
  CCLD     quiesce.la
make[5]: Nothing to be done for 'all-am'.
Making all in marker
Making all in src
  CC       marker.lo
  CC       marker-quota.lo
  CC       marker-common.lo
  CC       marker-quota-helper.lo
  CCLD     marker.la
make[5]: Nothing to be done for 'all-am'.
Making all in index
Making all in src
  CC       index.lo
  CCLD     index.la
make[5]: Nothing to be done for 'all-am'.
Making all in barrier
Making all in src
  CC       barrier.lo
  CCLD     barrier.la
make[5]: Nothing to be done for 'all-am'.
Making all in arbiter
Making all in src
  CC       arbiter.lo
  CCLD     arbiter.la
make[5]: Nothing to be done for 'all-am'.
Making all in compress
Making all in src
  CC       cdc.lo
  CC       cdc-helper.lo
  CCLD     cdc.la
make[5]: Nothing to be done for 'all-am'.
Making all in changelog
Making all in src
  CC       changelog.lo
  CC       changelog-rt.lo
  CC       changelog-helpers.lo
  CC       changelog-encoders.lo
  CC       changelog-rpc.lo
  CC       changelog-barrier.lo
  CC       changelog-rpc-common.lo
  CC       changelog-ev-handle.lo
  CCLD     changelog.la
Making all in lib
Making all in src
  CC       libgfchangelog_la-gf-changelog.lo
  CC       libgfchangelog_la-gf-changelog-helpers.lo
  CC       libgfchangelog_la-gf-changelog-journal-handler.lo
  CC       libgfchangelog_la-gf-changelog-api.lo
  CC       libgfchangelog_la-gf-history-changelog.lo
  CC       libgfchangelog_la-gf-changelog-rpc.lo
  CC       libgfchangelog_la-gf-changelog-reborp.lo
  CC       libgfchangelog_la-changelog-rpc-common.lo
  CCLD     libgfchangelog.la
make[6]: Nothing to be done for 'all-am'.
make[5]: Nothing to be done for 'all-am'.
Making all in changetimerecorder
Making all in src
  CC       changetimerecorder.lo
  CC       ctr-helper.lo
  CC       ctr-xlator-ctx.lo
  CCLD     changetimerecorder.la
make[5]: Nothing to be done for 'all-am'.
Making all in gfid-access
Making all in src
  CC       gfid-access.lo
  CCLD     gfid-access.la
make[5]: Nothing to be done for 'all-am'.
Making all in glupy
Making all in src
Making all in glupy
make[6]: Nothing to be done for 'all'.
  CC       glupy.lo
  CCLD     glupy.la
Making all in examples
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in upcall
Making all in src
  CC       upcall.lo
  CC       upcall-internal.lo
  CCLD     upcall.la
make[5]: Nothing to be done for 'all-am'.
Making all in snapview-client
Making all in src
  CC       snapview-client.lo
  CCLD     snapview-client.la
make[5]: Nothing to be done for 'all-am'.
Making all in snapview-server
Making all in src
  CC       snapview-server.lo
  CC       snapview-server-mgmt.lo
  CC       snapview-server-helpers.lo
  CCLD     snapview-server.la
make[5]: Nothing to be done for 'all-am'.
Making all in trash
Making all in src
  CC       trash.lo
  CCLD     trash.la
make[5]: Nothing to be done for 'all-am'.
Making all in shard
Making all in src
  CC       shard.lo
  CCLD     shard.la
make[5]: Nothing to be done for 'all-am'.
Making all in bit-rot
Making all in src
Making all in stub
  CC       bit-rot-stub-helpers.lo
  CC       bit-rot-stub.lo
  CCLD     bitrot-stub.la
Making all in bitd
  CC       bit-rot.lo
  CC       bit-rot-scrub.lo
  CC       bit-rot-ssm.lo
  CC       bit-rot-scrub-status.lo
  CCLD     bit-rot.la
make[6]: Nothing to be done for 'all-am'.
make[5]: Nothing to be done for 'all-am'.
Making all in leases
Making all in src
  CC       leases.lo
  CC       leases-internal.lo
  CCLD     leases.la
make[5]: Nothing to be done for 'all-am'.
Making all in selinux
Making all in src
  CC       selinux.lo
  CCLD     selinux.la
make[5]: Nothing to be done for 'all-am'.
Making all in sdfs
Making all in src
  CC       sdfs.lo
  CCLD     sdfs.la
make[5]: Nothing to be done for 'all-am'.
Making all in namespace
Making all in src
  CC       namespace.lo
  CCLD     namespace.la
make[5]: Nothing to be done for 'all-am'.
Making all in thin-arbiter
Making all in src
  CC       thin-arbiter.lo
  CC       libxlator.lo
  CCLD     thin-arbiter.la
make[5]: Nothing to be done for 'all-am'.
Making all in utime
Making all in src
/usr/bin/python3 ../../../../xlators/features/utime/src/utime-gen-fops-h.py ../../../../xlators/features/utime/src/utime-autogen-fops-tmpl.h > utime-autogen-fops.h
make --no-print-directory all-am
  CC       utime-helpers.lo
/usr/bin/python3 ../../../../xlators/features/utime/src/utime-gen-fops-c.py ../../../../xlators/features/utime/src/utime-autogen-fops-tmpl.c > utime-autogen-fops.c
  CC       utime.lo
  CC       utime-autogen-fops.lo
  CCLD     utime.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in encryption
Making all in rot-13
Making all in src
  CC       rot-13.lo
  CCLD     rot-13.la
make[5]: Nothing to be done for 'all-am'.
Making all in crypt
Making all in src
  CC       keys.lo
  CC       data.lo
  CC       metadata.lo
  CC       atom.lo
  CC       crypt.lo
  CCLD     crypt.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in mount
Making all in fuse
Making all in src
  CC       fuse-helpers.lo
  CC       fuse-resolve.lo
  CC       fuse-bridge.lo
  CC       misc.lo
  CC       mount.lo
  CC       mount-common.lo
  CCLD     fuse.la
Making all in utils
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in mgmt
Making all in glusterd
Making all in src
  CC       glusterd_la-glusterd.lo
  CC       glusterd_la-glusterd-handler.lo
  CC       glusterd_la-glusterd-sm.lo
  CC       glusterd_la-glusterd-op-sm.lo
  CC       glusterd_la-glusterd-utils.lo
  CC       glusterd_la-glusterd-rpc-ops.lo
  CC       glusterd_la-glusterd-store.lo
  CC       glusterd_la-glusterd-handshake.lo
  CC       glusterd_la-glusterd-pmap.lo
  CC       glusterd_la-glusterd-volgen.lo
  CC       glusterd_la-glusterd-rebalance.lo
  CC       glusterd_la-glusterd-quota.lo
  CC       glusterd_la-glusterd-bitrot.lo
  CC       glusterd_la-glusterd-geo-rep.lo
  CC       glusterd_la-glusterd-replace-brick.lo
  CC       glusterd_la-glusterd-log-ops.lo
  CC       glusterd_la-glusterd-tier.lo
  CC       glusterd_la-glusterd-volume-ops.lo
  CC       glusterd_la-glusterd-brick-ops.lo
  CC       glusterd_la-glusterd-mountbroker.lo
  CC       glusterd_la-glusterd-syncop.lo
  CC       glusterd_la-glusterd-hooks.lo
  CC       glusterd_la-glusterd-volume-set.lo
  CC       glusterd_la-glusterd-locks.lo
  CC       glusterd_la-glusterd-snapshot.lo
  CC       glusterd_la-glusterd-mgmt-handler.lo
  CC       glusterd_la-glusterd-mgmt.lo
  CC       glusterd_la-glusterd-peer-utils.lo
  CC       glusterd_la-glusterd-statedump.lo
  CC       glusterd_la-glusterd-snapshot-utils.lo
  CC       glusterd_la-glusterd-conn-mgmt.lo
  CC       glusterd_la-glusterd-proc-mgmt.lo
  CC       glusterd_la-glusterd-svc-mgmt.lo
  CC       glusterd_la-glusterd-shd-svc.lo
  CC       glusterd_la-glusterd-nfs-svc.lo
  CC       glusterd_la-glusterd-quotad-svc.lo
  CC       glusterd_la-glusterd-svc-helper.lo
  CC       glusterd_la-glusterd-conn-helper.lo
  CC       glusterd_la-glusterd-snapd-svc.lo
  CC       glusterd_la-glusterd-snapd-svc-helper.lo
  CC       glusterd_la-glusterd-bitd-svc.lo
  CC       glusterd_la-glusterd-scrub-svc.lo
  CC       glusterd_la-glusterd-server-quorum.lo
  CC       glusterd_la-glusterd-reset-brick.lo
  CC       glusterd_la-glusterd-tierd-svc.lo
  CC       glusterd_la-glusterd-tierd-svc-helper.lo
  CC       glusterd_la-glusterd-gfproxyd-svc.lo
  CC       glusterd_la-glusterd-gfproxyd-svc-helper.lo
  CCLD     glusterd.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in system
Making all in posix-acl
Making all in src
  CC       posix-acl.lo
  CC       posix-acl-xattr.lo
  CCLD     posix-acl.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in playground
Making all in template
Making all in src
  CC       template.lo
  CCLD     template.la
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in meta
Making all in src
  CC       meta.lo
  CC       meta-helpers.lo
  CC       meta-defaults.lo
  CC       graphs-dir.lo
  CC       root-dir.lo
  CC       frames-file.lo
  CC       graph-dir.lo
  CC       active-link.lo
  CC       xlator-dir.lo
  CC       top-link.lo
  CC       logging-dir.lo
  CC       logfile-link.lo
  CC       loglevel-file.lo
  CC       process_uuid-file.lo
  CC       volfile-file.lo
  CC       view-dir.lo
  CC       subvolumes-dir.lo
  CC       subvolume-link.lo
  CC       type-file.lo
  CC       version-file.lo
  CC       options-dir.lo
  CC       option-file.lo
  CC       cmdline-file.lo
  CC       name-file.lo
  CC       private-file.lo
  CC       history-file.lo
  CC       mallinfo-file.lo
  CC       meminfo-file.lo
  CC       measure-file.lo
  CC       profile-file.lo
  CCLD     meta.la
make[4]: Nothing to be done for 'all-am'.
Making all in experimental
Making all in jbr-client
Making all in src
/usr/bin/python3 ../../../../xlators/experimental/jbr-client/src/gen-fops.py ../../../../xlators/experimental/jbr-client/src/fop-template.c ../../../../xlators/experimental/jbr-client/src/jbrc.c > jbrc-cg.c
  CC       jbrc-cg.lo
  CCLD     jbrc.la
make[5]: Nothing to be done for 'all-am'.
Making all in jbr-server
Making all in src
/usr/bin/python3 ../../../../xlators/experimental/jbr-server/src/gen-fops.py ../../../../xlators/experimental/jbr-server/src/all-templates.c ../../../../xlators/experimental/jbr-server/src/jbr.c > jbr-cg.c
  CC       jbr-cg.lo
  CCLD     jbr.la
make[5]: Nothing to be done for 'all-am'.
Making all in fdl
Making all in src
/usr/bin/python3 ./gen_dumper.py ./dump-tmpl.c > libfdl.c
  CC       logdump.o
  CC       recon.o
/usr/bin/python3 ./gen_recon.py ./recon-tmpl.c > librecon.c
/usr/bin/python3 ./gen_fdl.py ./fdl-tmpl.c > fdl.c
  CC       libfdl.o
  CC       fdl.lo
  CC       librecon.o
  CCLD     gf_logdump
  CCLD     gf_recon
  CCLD     fdl.la
make[5]: Nothing to be done for 'all-am'.
Making all in dht2
Making all in dht2-client
Making all in src
  CC       dht2-client-main.lo
  CC       dht2-common-map.lo
  CCLD     dht2c.la
make[6]: Nothing to be done for 'all-am'.
Making all in dht2-server
Making all in src
  CC       dht2-server-main.lo
  CC       dht2-common-map.lo
  CCLD     dht2s.la
make[6]: Nothing to be done for 'all-am'.
make[5]: Nothing to be done for 'all-am'.
Making all in posix2
Making all in common
Making all in src
  CC       libposix2common_la-posix2-common.lo
  CCLD     libposix2common.la
make[6]: Nothing to be done for 'all-am'.
Making all in mds
Making all in src
  CC       posix2-mds-main.lo
  CCLD     posix2-mds.la
make[6]: Nothing to be done for 'all-am'.
Making all in ds
Making all in src
  CC       posix2-ds-main.lo
  CCLD     posix2-ds.la
make[6]: Nothing to be done for 'all-am'.
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
make[3]: Nothing to be done for 'all-am'.
Making all in glusterfsd
Making all in src
  CC       glusterfsd.o
  CC       glusterfsd-mgmt.o
  CC       gf_attach.o
  CCLD     gf_attach
  CCLD     glusterfsd
make[3]: Nothing to be done for 'all-am'.
Making all in contrib/fuse-util
  CC       fusermount.o
  CC       mount_util.o
  CC       mount-common.o
  CCLD     fusermount-glusterfs
Making all in doc
make[2]: Nothing to be done for 'all'.
Making all in extras
Making all in init.d
make[3]: Nothing to be done for 'all'.
Making all in systemd
make[3]: Nothing to be done for 'all'.
Making all in benchmarking
make[3]: Nothing to be done for 'all'.
Making all in hook-scripts
Making all in add-brick
Making all in post
make[5]: Nothing to be done for 'all'.
Making all in pre
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in create
Making all in post
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in delete
Making all in pre
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in set
Making all in post
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in start
Making all in post
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in stop
Making all in pre
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
Making all in reset
Making all in post
make[5]: Nothing to be done for 'all'.
Making all in pre
make[5]: Nothing to be done for 'all'.
make[5]: Nothing to be done for 'all-am'.
make[4]: Nothing to be done for 'all-am'.
Making all in ocf
make[3]: Nothing to be done for 'all'.
Making all in LinuxRPM
To build RPMS run 'make glusterrpms'
Making all in geo-rep
  CC       gsync_sync_gfid-gsync-sync-gfid.o
  CCLD     gsync-sync-gfid
Making all in snap_scheduler
make[3]: Nothing to be done for 'all'.
Making all in firewalld
make[3]: Nothing to be done for 'all'.
Making all in cliutils
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.
Making all in cli
Making all in src
  CC       cli.o
  CC       registry.o
  CC       input.o
  CC       cli-cmd.o
  CC       cli-rl.o
  CC       cli-cmd-global.o
  CC       cli-cmd-volume.o
  CC       cli-cmd-peer.o
  CC       cli-rpc-ops.o
  CC       cli-cmd-parser.o
  CC       cli-cmd-system.o
  CC       cli-cmd-misc.o
  CC       cli-xml-output.o
  CC       cli-quotad-client.o
  CC       cli-cmd-snapshot.o
  CCLD     gluster
/usr/bin/ld: warning: libtinfo.so.5, needed by /usr/lib/gcc/arm-linux-gnueabihf/8/../../../arm-linux-gnueabihf/libreadline.so, may conflict with libtinfo.so.6
make[3]: Nothing to be done for 'all-am'.
Making all in heal
Making all in src
  CC       glfs-heal.o
  CCLD     glfsheal
/usr/bin/ld: warning: libtinfo.so.5, needed by /usr/lib/gcc/arm-linux-gnueabihf/8/../../../arm-linux-gnueabihf/libreadline.so, may conflict with libtinfo.so.6
make[3]: Nothing to be done for 'all-am'.
Making all in geo-replication
Making all in syncdaemon
make[3]: Nothing to be done for 'all'.
Making all in src
  CC       gsyncd.o
  CC       procdiggy.o
  CCLD     gsyncd
make[3]: Nothing to be done for 'all-am'.
Making all in tools
Making all in gfind_missing_files
  CC       gcrawler.o
  CCLD     gcrawler
Making all in glusterfind
Making all in src
make[4]: Nothing to be done for 'all'.
make[4]: Nothing to be done for 'all-am'.
Making all in setgfid2path
Making all in src
  CC       main.o
  CCLD     gluster-setgfid2path
make[4]: Nothing to be done for 'all-am'.
make[3]: Nothing to be done for 'all-am'.
Making all in events
Making all in src
/usr/bin/python3 ../../events/eventskeygen.py PY_HEADER
make --no-print-directory all-am
make[4]: Nothing to be done for 'all-am'.
Making all in tools
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.

But I encounter a segfault when I try to run glusterd:

# /usr/local/sbin/glusterd -N
Segmentation fault (core dumped)

In the log, I have the following lines:

[2018-09-26 20:07:40.146049] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 6dev (args: /usr/local/sbin/glusterd -N)
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-09-26 20:07:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.3
---------

When I try to display a backtrace from GDB, I get that:

# gdb -ex=r --args /usr/local/sbin/glusterd -N
GNU gdb (Debian 8.1-4) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/glusterd...done.
Starting program: /usr/local/sbin/glusterd -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0xb6c7a766 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
(gdb) bt
#0  0xb6c7a766 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
#1  0xb6c76100 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

You can find the core file here: core.zip

@xhernandez
Contributor

Can you try to install the debug symbols for libcrypto.so.1.1 and retry the backtrace?

@superboum

superboum commented Sep 27, 2018

Sure:

Starting program: /usr/local/sbin/glusterd -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
_armv7_tick () at crypto/armv4cpuid.o.s:112
112	crypto/armv4cpuid.o.s: No such file or directory.
(gdb) bt
#0  _armv7_tick () at crypto/armv4cpuid.o.s:112
#1  0xb6c76100 in OPENSSL_cpuid_setup () at ../crypto/armcap.c:186
#2  0xb6fe1620 in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0xbefffd14, env=env@entry=0xbefffd20) at dl-init.c:72
#3  0xb6fe16d8 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:30
#4  _dl_init (main_map=0xb6fff970, argc=2, argv=0xbefffd14, env=0xbefffd20) at dl-init.c:119
#5  0xb6fd6bc4 in _dl_start_user () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

For your information:

# apt show libssl1.1
Package: libssl1.1
Version: 1.1.0h-4
APT-Sources: http://ftp.fr.debian.org/debian testing/main armhf Packages

It might be an error independent of your patch. If so, I plan to try compiling and installing GlusterFS from the master branch and to look at Debian-specific patches, but I don't know when I will have time to test that.

@xhernandez
Contributor

This looks to me like an issue with the crypto library. It seems to hit an illegal instruction when it tries to detect processor capabilities.

Does it work with the same configuration but without the patch?

@superboum

superboum commented Oct 2, 2018

OK, so it appears this is normal behavior: When debugging I observe SIGILL during OpenSSL initialization: why?
According to this SO post, I should use handle SIGILL nostop. I will keep you updated.
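
(For context: OPENSSL_cpuid_setup probes CPU features by executing candidate instructions and catching the SIGILL the kernel raises when one is unsupported; GDB stops on that signal before the library's handler ever sees it, which is why handle SIGILL pass nostop noprint is needed. Below is a minimal sketch of that probing pattern; it is simplified, and the real armcap.c executes NEON/crypto probe instructions instead of raise().)

/* Simplified sketch of SIGILL-based feature probing: attempt something
 * that may be illegal on this CPU; if the handler fires, jump back and
 * record the feature as unsupported. raise(SIGILL) stands in here for
 * actually executing a probe instruction. */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>

static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* unwind: the probed instruction trapped */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigaction(SIGILL, &sa, NULL);

    if (sigsetjmp(probe_env, 1) == 0) {
        raise(SIGILL);          /* stand-in for a capability probe */
        puts("feature supported");
    } else {
        puts("feature unsupported (SIGILL caught, execution continues)");
    }
    return 0;
}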


Edit: the real error:

# gdb --args /usr/local/sbin/glusterd --debug
GNU gdb (Debian 8.1-4) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/glusterd...done.
(gdb) handle SIGILL pass nostop noprint
Signal        Stop	Print	Pass to program	Description
SIGILL        No	No	Yes		Illegal instruction
(gdb) r
Starting program: /usr/local/sbin/glusterd --debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[2018-10-02 09:18:06.589131] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 6dev (args: /usr/local/sbin/glusterd --debug)
[2018-10-02 09:18:06.589405] D [logging.c:1821:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[New Thread 0xb5b73410 (LWP 15668)]
[New Thread 0xb5372410 (LWP 15669)]
[New Thread 0xb4b71410 (LWP 15670)]
[New Thread 0xb4370410 (LWP 15671)]
[New Thread 0xb3b6f410 (LWP 15672)]
[2018-10-02 09:18:06.600465] D [MSGID: 0] [glusterfsd.c:747:get_volfp] 0-glusterfsd: loading volume file /usr/local/etc/glusterfs/glusterd.vol
[2018-10-02 09:18:08.101343] D [MSGID: 101097] [xlator.c:334:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/lib/arm-linux-gnueabihf/glusterfs/4.1.5/xlator/mgmt/glusterd.so: undefined symbol: xlator_api. Fall back to old symbols

Thread 1 "glusterd" received signal SIGSEGV, Segmentation fault.
0xb6f0c2ce in gf_add_cmdline_options (graph=<optimized out>, cmd_args=0x427150) at graph.c:299
299	graph.c: No such file or directory.
(gdb) bt
#0  0xb6f0c2ce in gf_add_cmdline_options (graph=<optimized out>, cmd_args=0x427150) at graph.c:299
#1  glusterfs_graph_prepare (graph=<optimized out>, ctx=0x427150, volume_name=<optimized out>) at graph.c:588
#2  0x004095f2 in glusterfs_process_volfp (ctx=0x427150, fp=0x45c7e0) at glusterfsd.c:2523
#3  0x0040971a in glusterfs_volumes_init (ctx=0x427150) at glusterfsd.c:2581
#4  0x00409cce in main (argc=2, argv=0xbefffd04) at glusterfsd.c:2734

It seems I get this error (SIGSEGV in gf_add_cmdline_options) both with and without your patch, so it is another thing to investigate on my side. I will update this post when I know why.


Edit 2: I changed strategy and built glusterfs in a Docker container to prevent any interference from past installations, and I no longer get the error. (The dlsym log above shows glusterd falling back to the distribution's 4.1.5 xlator under /usr/lib, so stale modules from a previous installation are a plausible culprit.) I still need to test erasure coding; I will post all the logs once I have run all the tests.

Dockerfile
FROM arm32v7/debian:buster

RUN apt-get update && \
    apt-get install -y \
      autotools-dev \
      libfuse-dev \
      libibverbs-dev \
      libdb-dev \
      librdmacm-dev \
      libaio-dev \
      libacl1-dev \
      libsqlite3-dev \
      liburcu-dev \
      uuid-dev \
      liblvm2-dev \
      attr \
      flex \
      bison \
      libreadline-dev \
      libncurses5-dev \
      libglib2.0-dev \
      libssl-dev \
      libxml2-dev \
      pkg-config \
      dh-python \
      python-all-dev \
      build-essential \
      git \
      wget \
      autoconf \
      libtool \
      gdb

WORKDIR /opt

RUN wget https://review.gluster.org/changes/21276/revisions/6baeb147c19c1f9f29552eebf98b33e4442e8a31/archive?format=tgz -O glusterfs.tgz

RUN tar xzvf glusterfs.tgz

RUN ./autogen.sh

RUN ./configure --enable-debug

RUN make -j5

RUN make install

RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/local.conf && ldconfig

@superboum

superboum commented Oct 3, 2018

Your patch seems to fix the erasure coding bug I encountered.

Here is my test protocol:

I created a Dockerfile that I built with docker build -t superboum/glusterbuild . on a Scaleway C1 server:

Dockerfile
(identical to the Dockerfile in my previous comment above)

After that, I started a gluster daemon:

docker run  --privileged=true -ti superboum/glusterbuild
/usr/local/sbin/glusterd --debug

In a second terminal, I opened a shell in the container and ran the following tests, which worked:

docker exec -t -i b4697a3b04de bash
mkdir /srv/g{1,2,3}
gluster volume create test-erasure disperse 3 redundancy 1 transport tcp 172.17.0.2:/srv/g1 172.17.0.2:/srv/g2 172.17.0.2:/srv/g3 force
gluster volume start test-erasure
mkdir /mnt/glerasure
mount -t glusterfs 127.0.0.1:/test-erasure /mnt/glerasure/
gluster volume info test-erasure
# Output:
# Volume Name: test-erasure
# Type: Disperse
# Volume ID: b7bab530-b537-48e0-bf81-ab7d361fed00
# Status: Started
# Snapshot Count: 0
# Number of Bricks: 1 x (2 + 1) = 3
# Transport-type: tcp
# Bricks:
# Brick1: 172.17.0.2:/srv/g1
# Brick2: 172.17.0.2:/srv/g2
# Brick3: 172.17.0.2:/srv/g3
# Options Reconfigured:
# transport.address-family: inet
# nfs.disable: on
# 
cd /mnt/glerasure
echo world > hello # it worked
cat hello # it worked

You can find the whole compilation log here (including some warnings that may interest you): screen.log

@xhernandez
Contributor

That's great :)

It seems that there are still some warnings in the compilation, though they don't seem dangerous. I'll update the patch to also remove them.

@stale

stale bot commented Apr 30, 2020

Thank you for your contributions.
Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added wontfix Managed by stale[bot] and removed wontfix Managed by stale[bot] labels Apr 30, 2020
@AlscadIngenierie AlscadIngenierie changed the title Error when i edit a file in a Gluster distributed volume Gluster distributed volume May 1, 2020
@stale

stale bot commented Nov 27, 2020

Thank you for your contributions.
Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Nov 27, 2020
@stale

stale bot commented Dec 12, 2020

Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.

@stale stale bot closed this as completed Dec 12, 2020