
thin client feature #242

Closed
amarts opened this issue Jun 14, 2017 · 17 comments

Comments

@amarts
Member

amarts commented Jun 14, 2017

codename: gfproxy

Move the clustering logic to the server side, and make the client thin.

Some work is needed in the glusterd volgen path to create this new volume file.

@pranithk @kshlm

@poornimag

The current patch by Facebook is [1]. There remain some changes to be discussed and decided in the glusterd part of the changes; here are a few points:

  1. Currently the gfproxy daemon has to be started manually from the command line (by specifying the gfproxy volfile), and so does the thin client. I think we need to start the gfproxy daemon automatically if gfproxy is set for the volume, and --thin-client should be a mount option instead of providing the volfile.

  2. Should enabling gfproxy for a volume be a volume option, or a cluster-wide option? Or should it not be an option at all and simply be enabled by default? Since this is the way forward, I think enabling it by default makes sense, but just to have a safety valve, keep a cluster-wide option. Thoughts?

  3. Currently one gfproxy daemon can load only one volume. Should there be one gfproxy daemon per node, per volume, or should the count scale with the number of clients?

  4. I am not sure of the behaviour on a client graph change; the graph is definitely not changed dynamically today. Supporting dynamic graph changes requires code changes in protocol/server; instead, an automatic restart of the gfproxy daemon sounds like a better option, since AHA is there and failover will probably come in the future. Thoughts?

  5. Also make changes in gfapi and FUSE to reduce resource consumption (memory/threads), so that the client is thin from a resource perspective as well.

@atinmu @pranithk @amarts @obnoxxx @kshlm your thoughts on the above questions?

Each of them can be worked on as a patch on top of [1], unless the decision we arrive at requires a major reversal of the approach taken in [1].

[1] https://review.gluster.org/#/c/16843

@mchangir
Contributor

  1. Having gfproxy as a per-volume option seems reasonable. Some volumes may be accessed by a large number of clients, for example a common shared directory. Other volumes may be virtual machine backing stores that could be accessed directly by the guest OS, which would help avoid the latency otherwise introduced by the gfproxy mediator.

  2. Volume multiplexing, anyone? Along the lines of brick multiplexing: bundle the volume-specific globals into a struct gfproxyvol and we should be able to service multiple volumes with a single proxy process, I think. See the sketch below.
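A very rough sketch of what that bundling might look like, purely illustrative: the struct name gfproxyvol comes from the comment above, but the fields and the containing process struct are hypothetical, not actual glusterfs code.

```c
/* Hypothetical sketch of volume multiplexing in gfproxyd: all per-volume
 * state is bundled into one object, so a single proxy process can serve
 * several volumes by keeping a list of these. Not the real glusterfs
 * data structures. */
struct gfproxyvol {
        char              *volname;    /* volume this graph serves          */
        void              *graph;      /* client-side graph for the volume  */
        void              *itable;     /* per-volume inode table            */
        void              *fdtable;    /* per-volume fd table               */
        int                nclients;   /* clients currently attached        */
        struct gfproxyvol *next;       /* next multiplexed volume           */
};

struct gfproxyd_state {
        struct gfproxyvol *volumes;     /* all volumes served by this process */
        int                listen_port; /* one listening port for all of them */
};
```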

@amarts
Member Author

amarts commented Jul 13, 2017

To answer @poornimag's questions:

  1. Ack! Starting by default makes sense.

  2. Enabling it by default would be good upstream, but there should be a volume-level option to disable it, in order to productize it.

  3. @mchangir's comments are valid; we need to evaluate multiplexing here.

  4. In the first version, let's not bother about graph changes IMO; let's restart (or re-attach) the gfproxy process, as we do for bricks.

  5. A call to be taken later; not a blocker IMO. In general, changes along those lines will certainly improve libgfapi usage.

I gave a top-level comment on the patch. The glusterd changes in the patch looked good.

@pranithk
Member

@amarts @mchangir Doing 3 with multiplexing poses challenges for graph switch. If we go ahead with a restart as per 4, then restarting a process that serves all volumes means moving all clients from one machine to another, which will be costly IMHO. So 3 and 4 go together for now.

I don't have any strong opinion on 1 and 2, as long as there are options to opt in to or out of the thin client.

@poornimag I am interested in 5 irrespective of when we do it. This has an impact on gluster-block as well. Do you have anything specific in mind? Maybe a separate GitHub issue?

@poornimag

1, 2 are concluded. 3 we can evaluate, but there is no need to block on it.
4. Even if graph changes are not implemented, at least the restart of the gfproxy daemon on client volfile changes needs to be automated.
5. @pranithk Yes, I had done some initial work on this for gfapi; I think it is big and independent enough to be a separate issue, so I will raise a new one.

@poornimag

One more important thing to consider is whether we want to restrict the gfproxyd daemon to run only within the trusted storage pool, or allow it to run outside as well.

If we restrict gfproxyd to the trusted storage pool, the implementation is quite direct and simple: glusterd can manage the daemon (start it on enabling, restart it on a graph change, stop it on disabling). We can also have multiple gfproxyd processes running for different volumes on the same node, and use glusterd's portmapping framework.

If we need to allow gfproxyd outside the trusted storage pool, we need to put more thought into it. Glusterd cannot be used to manage the daemon with respect to starting it and portmapping (helping the end client identify the host and port of the gfproxyd to connect to). Any thoughts on this, @atinmu @amarts @obnoxxx @nixpanic and others? One suggestion by @obnoxxx and @nixpanic was to have another meta daemon that runs outside the trusted storage pool and does the portmapping, starting, stopping and restarting of the gfproxyd daemons. But this means a lot of code sharing/duplication between glusterd and the meta daemon, and also many more RPC connections to glusterd. Any thoughts?

@atinmu

atinmu commented Aug 8, 2017

Can you explain the benefit of having this daemon run outside of the TSP?

@poornimag

In environments like QEMU (also block devices/containers?), the clients run outside the trusted storage pool. If we run gfproxyd in the trusted storage pool, each per-VM thin client will have to talk to gfproxyd and then make another network hop to reach the bricks, i.e. two network hops, as opposed to the existing single hop (the thick client talks to the bricks directly). Hence, to have a thin client without compromising much on performance, it is good to allow gfproxyd to run outside the trusted storage pool as well.

@nixpanic
Member

nixpanic commented Aug 13, 2017 via email

@raghavendrahg
Member

raghavendrahg commented Sep 7, 2017

Some more points after discussion with @poornimag:

  • How does the client get the port on which gfproxy is running? Would there be a portmapper for gf-proxy? What input does this portmapper take?
  • Authentication - should the proxy authenticate clients, or just act as a pass-through and let the bricks authenticate?
  • It looks like we need a master gfproxy/server xlator which listens on a single port (if we go the volume-multiplexing way) on behalf of all the volumes gf-proxy is exporting.
    • Which programs should this xlator register? Note that currently the client uses the handshake, ping and GlusterFS programs from a brick. What functionality does this xlator need beyond what protocol/server provides?
    • I think inode and fd tables should be associated with each volume rather than being global data structures; that way there is more concurrency (see the sketch after this list).
    • There will be some work to hold back fops until the volume/graph is ready to serve them.
  • Providing failover (maybe through AHA) might open up its own challenges around how we implement non-idempotent ops like mkdir (solutions like DRC might help).
  • Cleanup of fds and locks on bricks when a thin client dies won't work seamlessly.
  • I think we at least need to load md-cache on the thin client for better performance of ops like ls -l. This also means upcall notifications need to reach thin clients; upcall is also needed to invalidate the VFS cache of the thin client's kernel.
  • Write-behind can't be loaded on gf-proxy. Imagine the following scenario when write-behind is loaded on gf-proxy:
    • write w1 is cached in write-behind
    • gf-proxy reboots and w1 is lost
    • the application does an fsync; now gfproxy doesn't have enough information to decide whether to fail or succeed the fsync.
  • Nameless lookups (both fresh and revalidate) in DHT are costly. With gf-proxy we can expect a larger number of nameless lookups than named lookups.
  • We need to test how seamlessly add-brick works with gf-proxy. A few years ago, gNFS (which has a design similar to gf-proxy) had bugs where directory operations would fail after an add-brick. I think most of these bugs are fixed now, but it is better to verify once.
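To make the per-volume inode/fd table point above concrete, here is a minimal C sketch. The names and types are hypothetical stand-ins, not the actual glusterfs implementation; the idea is simply that each exported volume owns its own tables, so activity on one volume does not serialize against another and fops can be parked until that volume's graph is ready.

```c
#include <string.h>
#include <pthread.h>

/* Hypothetical sketch only, not glusterfs code. */
struct proxy_inode_table { pthread_mutex_t lock; /* inode hash, LRU, ... */ };
struct proxy_fd_table    { pthread_mutex_t lock; /* fd slot array, ...   */ };

struct proxy_volume {
        const char               *volname;
        struct proxy_inode_table  itable;      /* private to this volume  */
        struct proxy_fd_table     fdtable;     /* private to this volume  */
        int                       graph_ready; /* hold back fops until 1  */
};

/* Pick the per-volume tables for an incoming request, keyed on the volume
 * name supplied during the client handshake. */
static struct proxy_volume *
proxy_volume_lookup(struct proxy_volume *vols, int nvols, const char *volname)
{
        for (int i = 0; i < nvols; i++)
                if (strcmp(vols[i].volname, volname) == 0)
                        return &vols[i];
        return NULL; /* unknown volume: reject the handshake */
}
```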

@mchangir
Contributor

mchangir commented Sep 8, 2017

About running gfproxy outside the TSP:

  1. Could we have a 'gluster proxypeer probe' command?
    This could be used to form a trusted proxy pool which clients can fail over to.

  2. Could we have a 'gluster proxypool register' command?
    This could be used to securely register the proxy pool/cluster with the TSP.

Volfile distribution could be carried out as follows (a rough sketch follows the list):

  • When a system outside either of the pools tries to mount the volume via the proxy nodes or the TSP with the 'mount_via_proxy' flag, it could be handed the 'proxy_client'-flavored volfile; otherwise it could be handed the 'direct_client' volfile, but only if the system is connecting directly to the TSP.
  • When a system within the proxy pool tries to mount the volume, it could be handed two volfiles: the 'proxy'-flavor volfile and the 'proxy_client'-flavor volfile.
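One way to read that decision logic, as a small C sketch: the 'mount_via_proxy' flag and the volfile flavors come from the proposal above, while the function and enum names are made up for illustration and are not an existing glusterd interface.

```c
/* Illustrative only -- not an existing glusterd API. Decide which volfile
 * flavor(s) to hand out, per the proposal above. */
enum volfile_flavor {
        VOLFILE_NONE,                    /* refuse: not entitled to a volfile */
        VOLFILE_DIRECT_CLIENT,           /* 'direct_client' flavor            */
        VOLFILE_PROXY_CLIENT,            /* 'proxy_client' flavor             */
        VOLFILE_PROXY_PLUS_PROXY_CLIENT, /* both 'proxy' and 'proxy_client'   */
};

enum volfile_flavor
pick_volfile(int requester_in_proxy_pool, int mount_via_proxy, int direct_to_tsp)
{
        if (requester_in_proxy_pool)  /* proxy-pool member gets both flavors    */
                return VOLFILE_PROXY_PLUS_PROXY_CLIENT;
        if (mount_via_proxy)          /* outside both pools, mounting via proxy */
                return VOLFILE_PROXY_CLIENT;
        if (direct_to_tsp)            /* connecting directly to the TSP         */
                return VOLFILE_DIRECT_CLIENT;
        return VOLFILE_NONE;
}
```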

Does this seem sensible?

@amarts @poornimag @pranithk @atinmu @nixpanic @obnoxxx

@poornimag

poornimag commented Sep 20, 2017

Sorry for breaking the continuity of the previous discussions. Summing up certain things from the previous comments, here is the status of the decided approaches and the work done so far:

Done so far:

  • 1 - Glusterd manages gfproxyd, enabling or disabling it based on a volume-set option.
    One gfproxyd per volume per node is automatically started when the gfproxy option is enabled on that volume.
  • 2 - AHA translator (by Facebook) to handle I/O continuity across restarts of the gfproxy daemon.
  • 3 - Failover to other gfproxy daemons on different nodes when required, without breaking I/O.
  • 4 - Restart of gfproxyd by glusterd on a client graph switch.

Design done and WIP:

  • 5 - Dynamic graph switch without restarting the gfproxyd (obsoletes 4).
  • 6 - Volume multiplexing in gfproxyd: allow gfproxyd to run multiple volume graphs in the same
    daemon. With this, we may be able to stick to one gfproxyd per node (based on performance analysis),
    which enables us to run gfproxyd on a standard port (say 24009), which in turn enables us to run
    gfproxyd outside the trusted storage pool (gfproxyd can still be managed by gluster within the
    trusted storage pool; on other nodes it is the admin's job to start the daemon, etc.).
  • 7 - Decrease the resource consumption of gfapi and FUSE when mounted as a thin client.

To Be Done:

  • 8 - Reimplement volgen in glusterd-2. Also implement interfaces for volume multiplexing in gfproxyd,
    gfproxyd status from glusterd, etc.
  • 9 - gfproxyd in an active-active/active-passive setup?

Of these, we hope to finish 8, 7 and 5 before the feature freeze for Glusterd 4.0 (Oct EOM?).

mscherer pushed a commit that referenced this issue Oct 10, 2017
Summary:
Adds a new server-side daemon called gfproxyd & a new FUSE client
called gfproxy-client

Updates: #242
BUG: 1428063
Change-Id: I83210098d3a381922bc64fed1110ae1b76e6519f
Tested-by: Shreyas Siravara <[email protected]>
Reviewed-by: Kevin Vigor <[email protected]>
Signed-off-by: Shreyas Siravara <[email protected]>
Signed-off-by: Poornima G <[email protected]>
@gluster-ant
Collaborator

A patch https://review.gluster.org/18048 has been posted that references this issue.
Commit message: gfproxyd: Let glusterd manage gfproxy daemon


mscherer pushed a commit that referenced this issue Oct 18, 2017
Updates: #242
BUG: 1428063
Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826
Signed-off-by: Poornima G <[email protected]>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Oct 31, 2017
Summary:
Adds a new server-side daemon called gfproxyd & a new FUSE client
called gfproxy-client

Updates: gluster#242
BUG: 1428063
Change-Id: I83210098d3a381922bc64fed1110ae1b76e6519f
Tested-by: Shreyas Siravara <[email protected]>
Reviewed-by: Kevin Vigor <[email protected]>
Signed-off-by: Shreyas Siravara <[email protected]>
Signed-off-by: Poornima G <[email protected]>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Oct 31, 2017
Updates: gluster#242
BUG: 1428063
Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826
Signed-off-by: Poornima G <[email protected]>
@gluster-ant
Collaborator

A patch https://review.gluster.org/19022 has been posted that references this issue.
Commit message: quiesce: add fallocate and seek fops

mscherer pushed a commit that referenced this issue Dec 29, 2017
quiesce is useful in a gfproxy setup: if the gfproxy machine goes down, the
fop can be replayed. Hence, to start with, only the fops supported by the
FUSE layer have been added. With this patch, no behavior change is
introduced (i.e., no volfile change etc.); it just keeps the translator up
to date so that we can consume it if required.

Updates #242

Change-Id: Id3bf204f2ccd42c3ac8f88d85836ecb855703e02
Signed-off-by: Amar Tumballi <[email protected]>
Signed-off-by: Poornima G <[email protected]>
@ShyamsundarR ShyamsundarR removed this from the Release 4.0 (STM) milestone Jan 16, 2018
mscherer pushed a commit that referenced this issue Jan 30, 2018
Updates: #242
Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69
Signed-off-by: Poornima G <[email protected]>
@gluster-ant
Collaborator

A patch https://review.gluster.org/19525 has been posted that references this issue.
Commit message: libgfapi: Add option for gfproxy mount

@ShyamsundarR ShyamsundarR added this to the Release 4.1 (LTM) milestone Mar 20, 2018
amarts added a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
quiesce is useful in a gfproxy setup: if the gfproxy machine goes down, the
fop can be replayed. Hence, to start with, only the fops supported by the
FUSE layer have been added. With this patch, no behavior change is
introduced (i.e., no volfile change etc.); it just keeps the translator up
to date so that we can consume it if required.

Updates gluster#242

Change-Id: Id3bf204f2ccd42c3ac8f88d85836ecb855703e02
Signed-off-by: Amar Tumballi <[email protected]>
Signed-off-by: Poornima G <[email protected]>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Updates: gluster#242
Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69
Signed-off-by: Poornima G <[email protected]>