High memory consumption depending on volume bricks count #23
The Python bindings are a very thin wrapper around the libgfapi C library. The client connects to each brick, and 600 of them is a lot of bricks. There will be one instance of the client xlator (a loaded shared object) per brick. You'll find some more information in this related issue: gluster/glusterfs#325
@prashanthpai, it's not about the Python binding, it's about the use of gfapi itself, whether via native C or via Python. A FUSE mount is a singleton within the system, but gfapi is instantiated per process, possibly even several times per process. So if FUSE takes 0.5 GB of memory just once, then N instances of gfapi multiply memory usage by N: 2000 × 0.5 GB ≈ 1 TB of RAM (not disk).
Is this because the gfapi implementation is derived from the native FUSE client and inherited unnecessary baggage from it? I'd kindly ask you to comment on my concerns, so we can make the right decision on whether and how to use gfapi, or GlusterFS in general.
Although that may be a valid use case, gfapi instances are isolated and cannot be shared across processes, unlike a FUSE mount. Usually gfapi consumers (NFS-Ganesha, QEMU, SMB) have only one instance per volume per process throughout the lifetime of the process. Improvements have been made to make the client lightweight (less "fat", as you put it). See gluster/glusterfs#242. With that model, the client stack is split in two: a thin client and a proxy daemon. You'd have a very thin client residing in each process and one local daemon per machine which talks to the bricks. This is a tradeoff which adds one extra hop. You can try that out.
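On the "one instance per volume per process" point: the cost scales with the number of gfapi instances, so an application can at least avoid paying it more than once per volume by caching the mounted instance. A minimal sketch of that pattern — the `Volume` class below is a stand-in for `gluster.gfapi.Volume`, and the `get_volume` helper is hypothetical, not part of the bindings:

```python
import threading

class Volume:
    """Stand-in for gluster.gfapi.Volume. Each real instance loads
    one client xlator per brick, which is where the memory goes."""
    def __init__(self, host, volname):
        self.host = host
        self.volname = volname
        self.mounted = False

    def mount(self):
        self.mounted = True
        return self

# Process-wide cache: at most one mounted instance per (host, volume).
_volumes = {}
_lock = threading.Lock()

def get_volume(host, volname):
    key = (host, volname)
    with _lock:
        if key not in _volumes:
            vol = Volume(host, volname)
            vol.mount()       # pay the per-brick xlator cost once
            _volumes[key] = vol
        return _volumes[key]
```

Every call site then does `get_volume("server1", "vol0")` and shares the same mounted instance, instead of multiplying the per-mount footprint by the number of callers within the process.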
Kindly note that libgfapi instances do not talk to FUSE. They maintain inode table in memory and cache metadata for faster access.
The gfapi implementation isn't derived from FUSE, although they share some code. I'm cc'ing the libgfapi maintainers to see if they can better answer your queries and help you with other recommendations.
We've observed very high memory usage from gfapi.Volume when mounting a large volume (one with a high brick count). Here are a few experiment results showing the memory used by a Python process mounted against different environments:
- Before mount (VSZ / RSS): 212376 / 8932 kB
- 12 bricks, 2 nodes: 631644 / 21440 kB
- 384 bricks, 6 nodes: 861648 / 276516 kB
- 600 bricks, 10 nodes: 987116 / 432028 kB
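For reference, figures like these can be read from `/proc/<pid>/status` on Linux, where `VmSize` (VSZ) and `VmRSS` are reported in kB. A small helper along those lines (a sketch, not the exact script used for the numbers above):

```python
import re

def parse_vsz_rss(status_text):
    """Extract VmSize (VSZ) and VmRSS, in kB, from the text of a
    Linux /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        m = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields.get("VmSize"), fields.get("VmRSS")

def current_vsz_rss():
    # Sample this before and after vol.mount() to see the delta.
    with open("/proc/self/status") as f:
        return parse_vsz_rss(f.read())
```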
Almost half a GB per process just at startup! And even more under active use. As we plan to run around 100 client nodes with 50 processes per node, the total amount of memory required becomes enormous.
Is there any reason for gfapi to use so much memory just to mount the volume?
Does that mean that scaling up the server side requires a corresponding scale-up of the client side?