GlusterFS brick processes are consuming very high CPU #2671
Any known issue with high CPU for release 7.8+?
I don't think any issue specific to high CPU consumption has been reported against release-7.8+. Can you please confirm what the workload is in your environment? Is it possible to capture continuous pstack output while a brick process is consuming high CPU, and also share the brick logs so we can debug it further?
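For reference, a minimal capture sketch along those lines; the PID and output path below are placeholders (the brick PID can be read from `gluster volume status`), and the per-thread `top` sample is only an assumed aid for matching the hot thread to a stack:

```sh
# Placeholders: substitute the brick PID and output file for your environment.
BRICK_PID=14281
OUT=/tmp/brick_pstack.log

while true; do
    date >> "$OUT"
    # Show per-thread CPU so the hot thread can be matched to a stack below.
    top -b -H -n 1 -p "$BRICK_PID" | head -20 >> "$OUT"
    # tee -a appends, so earlier samples are kept.
    pstack "$BRICK_PID" | tee -a "$OUT"
    echo "########################### NEXT PSTACK DUMP ###########################" >> "$OUT"
    sleep 5
done
```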
Thanks Mohit -- this is a customer system, so I'll ask for the data to be collected in the lab, which has the same issue.
@mohit84 Thanks for looking into the issue. We got the data from the customer lab:
1). gluster volume heal vol_bc456156109bd5bffe9d060d21d7265f info
Brick 172.27.2.32:/var/lib/heketi/mounts/vg_5f52c63bee2bb0909dc1c81ff40683e1/brick_9ddff301dec7365e0a3e820540ee3b2b/brick
Brick 172.27.2.42:/var/lib/heketi/mounts/vg_7bc0c66ed7b35088cedfa56e802005e1/brick_dfefa1d7202794b7877a070d958b6b64/brick
2). For Brick-1, here's the pstack output:
Thread 4 (Thread 0x7fb6a04b0700 (LWP 26766)):
Thread 7 (Thread 0x7fb6a0573700 (LWP 26434)):
###########################NEXT PSTACK DUMP####################################### |
###########################NEXT PSTACK DUMP####################################### |
3). For the 2nd brick -- here's the pstack dump:
while true; do pstack 15748 | tee pstack_2_dump.txt; echo "###########################NEXT PSTACK DUMP#######################################" | tee pstack_2_dump.txt; sleep 5; done
Thread 36 (Thread 0x7f67f3d85700 (LWP 15749)):
###########################NEXT PSTACK DUMP####################################### |
@mohit84 For the logs -- there are too many -- I have them in zip files. How can I send them (no attachment option here)? Can I email them to you directly?
@mohit84 Any idea from the pstack output?
I am not able to figure it out from the pstack output; I believe the high CPU usage should only be momentary, while a fop request is being processed.
@mohit84 Thanks for looking at this. Yes, the application didn't see such high CPU on 6.5. Can you share how to configure a single event-poll thread?
gluster v set vol-name client.event-threads 1
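For context, a hedged sketch of how this might be applied and verified on the volume from this report (the volume name is taken from the report; lowering server.event-threads as well is an assumption here, not something prescribed in the thread):

```sh
# Reduce client-side event threads to 1 on the affected volume.
gluster volume set vol_bc456156109bd5bffe9d060d21d7265f client.event-threads 1

# server.event-threads is the server-side counterpart; lowering it too is an assumption.
gluster volume set vol_bc456156109bd5bffe9d060d21d7265f server.event-threads 1

# Verify the values now in effect.
gluster volume get vol_bc456156109bd5bffe9d060d21d7265f all | grep event-threads
```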
Thanks @mohit84, will try that and let you know. Is there a way to set it for all volumes? Something like a general configuration?
No
@mohit84 We tried setting those parameters to 1, but saw no drop in CPU -- any other thoughts?
@xhernandez Can you suggest how we can proceed?
Do you have a 6.5 environment? If you do, can you please share a pstack of a brick process while your application is using the volume?
@mohit84
Thread 29 (Thread 0x7f2d416ff700 (LWP 31285)):
@mohit84 This is the pstack for the 2nd brick (release 6.5)
@mohit84 We identified the issue. It's the way the application is updating these files. I'm closing the ticket. Thanks for your support, as usual.
Description of problem:
We have a dynamic volume with 2 brick processes (replica 2) running at very high CPU, exceeding 100%. The dynamic volume in K8s is used by two different pods. That was not the case with GlusterFS v6.5 -- after upgrading to release 7.8, we have seen this issue. This is a field issue, so it has been escalated.
The issue doesn't show up with other pods/volumes. We'd appreciate your urgent attention.
We collected the following profile data on the volume while the load was running:
gluster volume profile vol_7147326886af2706ee79f403c2998a3f start incremental;sleep 60
gluster volume profile vol_7147326886af2706ee79f403c2998a3f info
gluster volume profile vol_7147326886af2706ee79f403c2998a3f stop
gluster volume profile vol_7147326886af2706ee79f403c2998a3f start peek;sleep 60
gluster volume profile vol_7147326886af2706ee79f403c2998a3f info
gluster volume profile vol_7147326886af2706ee79f403c2998a3f stop
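A small wrapper along these lines (an assumed helper, not part of the original report) can make the 60-second capture repeatable; the volume name is the one used above:

```sh
#!/bin/bash
# Assumed helper script; VOL is the volume from this report, window matches the sleep above.
VOL=vol_7147326886af2706ee79f403c2998a3f
OUT="profile_${VOL}_$(date +%Y%m%d_%H%M%S).txt"

gluster volume profile "$VOL" start
sleep 60
# 'info incremental' prints only the stats accumulated since the previous info call.
gluster volume profile "$VOL" info incremental | tee "$OUT"
gluster volume profile "$VOL" stop
```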
[root@vudmchcl01-control-01 MP1_lite]# gluster volume profile vol_7147326886af2706ee79f403c2998a3f start incremental
Starting volume profile on vol_7147326886af2706ee79f403c2998a3f has been successful
[root@vudmchcl01-control-01 MP1_lite]# sleep 65
[root@vudmchcl01-control-01 MP1_lite]# gluster volume profile vol_7147326886af2706ee79f403c2998a3f info
Brick: 172.27.2.35:/var/lib/heketi/mounts/vg_cedb863c562c710481e1bcd0ec691f42/brick_067a0b32180c9dd70d688c82ffaa6f6d/brick
Cumulative Stats:
Block Size: 2b+ 4b+ 8b+
No. of Reads: 1 2 2
No. of Writes: 208 528 828290
Block Size: 16b+ 32b+ 64b+
No. of Reads: 60 916 3309
No. of Writes: 472257 693024 2489610
Block Size: 128b+ 256b+ 512b+
No. of Reads: 18937 7293 32175
No. of Writes: 2305389 538291 537133
Block Size: 1024b+ 2048b+ 4096b+
No. of Reads: 13423 18690 9187
No. of Writes: 558496 518944 410859
Block Size: 8192b+ 16384b+ 32768b+
No. of Reads: 20841 14409 16682
No. of Writes: 241198 102657 24286
Block Size: 65536b+ 131072b+
No. of Reads: 16709 83562
No. of Writes: 1725 7
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
Data Read: 13984404450 bytes
Data Written: 11967854926 bytes
Interval 4 Stats:
Block Size: 8b+ 16b+ 32b+
No. of Reads: 0 0 1
No. of Writes: 235 117 189
Block Size: 64b+ 128b+ 256b+
No. of Reads: 2 8 9
No. of Writes: 772 637 121
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 25 12 9
No. of Writes: 129 167 118
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 3 10 6
No. of Writes: 98 58 23
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 14 8 189
No. of Writes: 5 0 0
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
Data Read: 26437792 bytes
Data Written: 2771470 bytes
Brick: 172.27.2.22:/var/lib/heketi/mounts/vg_e2e8b43d72ff0c7b572a8c1cb6494913/brick_50cc3f6362af27c0003e2f994916f9a8/brick
Cumulative Stats:
Block Size: 2b+ 4b+ 8b+
No. of Reads: 2 3 3
No. of Writes: 208 528 828290
Block Size: 16b+ 32b+ 64b+
No. of Reads: 61 891 3386
No. of Writes: 472257 693024 2489610
Block Size: 128b+ 256b+ 512b+
No. of Reads: 18958 7148 31957
No. of Writes: 2305389 538291 537133
Block Size: 1024b+ 2048b+ 4096b+
No. of Reads: 13246 18522 9402
No. of Writes: 558496 518944 410858
Block Size: 8192b+ 16384b+ 32768b+
No. of Reads: 21209 14361 16881
No. of Writes: 241197 102657 24286
Block Size: 65536b+ 131072b+
No. of Reads: 16837 83932
No. of Writes: 1725 7
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
Data Read: 14055514436 bytes
Data Written: 11967837616 bytes
Interval 4 Stats:
Block Size: 8b+ 16b+ 32b+
No. of Reads: 0 0 0
No. of Writes: 235 117 189
Block Size: 64b+ 128b+ 256b+
No. of Reads: 5 15 4
No. of Writes: 772 637 121
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 24 14 4
No. of Writes: 129 167 118
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 7 20 13
No. of Writes: 98 58 23
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 1 8 64
No. of Writes: 5 0 0
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
Data Read: 9743802 bytes
Data Written: 2771470 bytes
[root@vudmchcl01-control-01 MP1_lite]#
[root@vudmchcl01-control-01 MP1_lite]# gluster volume profile vol_7147326886af2706ee79f403c2998a3f stop
Stopping volume profile on vol_7147326886af2706ee79f403c2998a3f has been successful
[root@vudmchcl01-control-01 MP1_lite]#
[root@vudmchcl01-control-01 MP1_lite]# gluster volume info vol_7147326886af2706ee79f403c2998a3f
Volume Name: vol_7147326886af2706ee79f403c2998a3f
Type: Replicate
Volume ID: 88b830ec-7ba8-468d-a3ad-62aac69ba7cf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.27.2.35:/var/lib/heketi/mounts/vg_cedb863c562c710481e1bcd0ec691f42/brick_067a0b32180c9dd70d688c82ffaa6f6d/brick
Brick2: 172.27.2.22:/var/lib/heketi/mounts/vg_e2e8b43d72ff0c7b572a8c1cb6494913/brick_50cc3f6362af27c0003e2f994916f9a8/brick
Options Reconfigured:
cluster.eager-lock: on
user.heketi.id: 7147326886af2706ee79f403c2998a3f
storage.health-check-timeout: 20
storage.health-check-interval: 60
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
[root@vudmchcl01-control-01 MP1_lite]#
[root@vudmchcl01-control-01 MP1_lite]# gluster volume status vol_7147326886af2706ee79f403c2998a3f
Status of volume: vol_7147326886af2706ee79f403c2998a3f
Gluster process                                                                                                            TCP Port  RDMA Port  Online  Pid
Brick 172.27.2.35:/var/lib/heketi/mounts/vg_cedb863c562c710481e1bcd0ec691f42/brick_067a0b32180c9dd70d688c82ffaa6f6d/brick   49207     0          Y       14281
Brick 172.27.2.22:/var/lib/heketi/mounts/vg_e2e8b43d72ff0c7b572a8c1cb6494913/brick_50cc3f6362af27c0003e2f994916f9a8/brick   49206     0          Y       28758
Self-heal Daemon on localhost N/A N/A Y 20275
Self-heal Daemon on 172.27.2.18 N/A N/A Y 18237
Self-heal Daemon on 172.27.2.22 N/A N/A Y 17829
Task Status of Volume vol_7147326886af2706ee79f403c2998a3f
There are no active volume tasks
[root@vudmchcl01-control-01 MP1_lite]#
[root@vudmchcl01-control-01 MP1_lite]# gluster volume heal vol_7147326886af2706ee79f403c2998a3f info
Brick 172.27.2.35:/var/lib/heketi/mounts/vg_cedb863c562c710481e1bcd0ec691f42/brick_067a0b32180c9dd70d688c82ffaa6f6d/brick
Status: Connected
Number of entries: 0
Brick 172.27.2.22:/var/lib/heketi/mounts/vg_e2e8b43d72ff0c7b572a8c1cb6494913/brick_50cc3f6362af27c0003e2f994916f9a8/brick
/logs/vudmcl01ch01/2021_07_24T04_00_00
Status: Connected
Number of entries: 1
[root@vudmchcl01-control-01 MP1_lite]#
Expected results:
Running at 10% CPU
- The operating system / glusterfs version: CentOS 7, GlusterFS 7.8