
thin client feature #242

Closed
amarts opened this issue Jun 14, 2017 · 17 comments

Comments

@amarts
Member

amarts commented Jun 14, 2017

codename: gfproxy

Move the clustering logic to the server side, and make the client thin.

Some work is needed in the glusterd volgen path to create this new volume file.

@pranithk @kshlm

@poornimag

The current patch by Facebook is [1]. There remain some changes to be discussed and decided in the glusterd part of the changes; here are a few points:

  1. Currently the gfproxy daemon has to be started manually from the command line (by specifying the gfproxy volfile), and so does the thin client. I think we need to start the gfproxy daemon automatically if gfproxy is set for the volume, and --thin-client should be a mount option instead of providing the volfile.

  2. Should enabling gfproxy for a volume be a volume option, or a cluster-wide option? Or should it not be an option at all and simply be enabled by default? Since this is the way forward, I think enabling it by default makes sense, but just to have a safety valve, keep a cluster-wide option. Thoughts?

  3. Currently one gfproxy daemon can load only one volume. Should there be one gfproxy daemon per node, per volume, or should the count scale with the number of clients?

  4. I am not sure of the behaviour on a client graph change; the graph is definitely not changed dynamically today. Supporting dynamic graph changes requires code changes in protocol/server; instead, an automatic restart of the gfproxy daemon sounds like a better option, since AHA is there and failover will probably come in the future. Thoughts?

  5. Also make changes in gfapi and FUSE to reduce resource consumption (memory/threads), so that the client is thin from a resource perspective as well.

@atinmu @pranithk @amarts @obnoxxx @kshlm your thoughts on the above questions?

Each of them can be worked on as a patch on top of [1], unless the decision we arrive at requires a major reversal of the approach taken in [1].

[1] https://review.gluster.org/#/c/16843

@mchangir
Contributor

  1. Having gfproxy as a per-volume option seems reasonable. Some volumes may be accessed by a large number of clients, for example a common shared directory. Other volumes may be virtual machine backing stores that could be accessed directly by the guest OS, which would help avoid the latency otherwise introduced by the gfproxy mediator.

  2. Volume multiplexing, anyone? Along the lines of brick multiplexing: bundle the volume-specific globals into a struct gfproxyvol and we should be able to service multiple volumes with a single proxy process, I think. See the sketch below.
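A very rough sketch of what that bundling might look like, purely illustrative: the struct name gfproxyvol comes from the comment above, but the fields and the containing process struct are hypothetical, not actual glusterfs code.

```c
/* Hypothetical sketch of volume multiplexing in gfproxyd: all per-volume
 * state is bundled into one object, so a single proxy process can serve
 * several volumes by keeping a list of these. Not the real glusterfs
 * data structures. */
struct gfproxyvol {
        char              *volname;    /* volume this graph serves          */
        void              *graph;      /* client-side graph for the volume  */
        void              *itable;     /* per-volume inode table            */
        void              *fdtable;    /* per-volume fd table               */
        int                nclients;   /* clients currently attached        */
        struct gfproxyvol *next;       /* next multiplexed volume           */
};

struct gfproxyd_state {
        struct gfproxyvol *volumes;     /* all volumes served by this process */
        int                listen_port; /* one listening port for all of them */
};
```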

@amarts
Member Author

amarts commented Jul 13, 2017

To answer @poornimag's questions:

  1. Ack! Starting by default makes sense.

  2. Enabling it by default would be good upstream, but there should be a volume-level option to disable it, in order to productize it.

  3. @mchangir's comments are valid; we need to evaluate multiplexing here.

  4. In the first version, let's not bother about graph changes IMO; let's restart (or re-attach) the gfproxy process, as we do for bricks.

  5. A call to be taken later; not a blocker IMO. In general, changes along those lines will certainly improve libgfapi usage.

I gave a top-level comment on the patch. The glusterd changes in the patch looked good.

@pranithk
Member

@amarts @mchangir Doing 3 with multiplexing poses challenges for graph switch. If we go ahead with a restart as per 4, then restarting a process that serves all volumes means moving all clients from one machine to another, which will be costly IMHO. So 3 and 4 go together for now.

I don't have any strong opinion on 1 and 2, as long as there are options to opt in to or out of the thin client.

@poornimag I am interested in 5 irrespective of when we do it. This has an impact on gluster-block as well. Do you have anything specific in mind? Maybe a separate GitHub issue?

@poornimag

1, 2 are concluded. 3 we can evaluate, but there is no need to block on it.
4. Even if graph changes are not implemented, at least the restart of the gfproxy daemon on client volfile changes needs to be automated.
5. @pranithk Yes, I had done some initial work on this for gfapi; I think it is big and independent enough to be a separate issue, so I will raise a new one.

@poornimag

One more important thing to consider is whether we want to restrict the gfproxyd daemon to run only within the trusted storage pool, or allow it to run outside as well.

If we restrict gfproxyd to the trusted storage pool, the implementation is quite direct and simple: glusterd can manage the daemon (start it on enabling, restart it on a graph change, stop it on disabling). We can also have multiple gfproxyd processes running for different volumes on the same node, and use glusterd's portmapping framework.

If we need to allow gfproxyd outside the trusted storage pool, we need to put more thought into it. Glusterd cannot be used to manage the daemon with respect to starting it and portmapping (helping the end client identify the host and port of the gfproxyd to connect to). Any thoughts on this, @atinmu @amarts @obnoxxx @nixpanic and others? One suggestion by @obnoxxx and @nixpanic was to have another meta daemon that runs outside the trusted storage pool and does the portmapping, starting, stopping and restarting of the gfproxyd daemons. But this means a lot of code sharing/duplication between glusterd and the meta daemon, and also many more RPC connections to glusterd. Any thoughts?

@atinmu

atinmu commented Aug 8, 2017

Can you explain the benefit of having this daemon run outside of the TSP?

@poornimag

In environments like QEMU (also block devices/containers?), the clients run outside the trusted storage pool. If we run gfproxyd in the trusted storage pool, each per-VM thin client will have to talk to gfproxyd and then make another network hop to reach the bricks, i.e. two network hops, as opposed to the existing single hop (the thick client talks to the bricks directly). Hence, to have a thin client without compromising much on performance, it is good to allow gfproxyd to run outside the trusted storage pool as well.

@nixpanic
Member

nixpanic commented Aug 13, 2017 via email

@raghavendrahg
Member

raghavendrahg commented Sep 7, 2017

Some more points after discussion with @poornimag:

  • How does the client get the port on which gfproxy is running? Would there be a portmapper for gf-proxy? What input does this portmapper take?
  • Authentication - should the proxy authenticate clients, or just act as a pass-through and let the bricks authenticate?
  • It looks like we need a master gfproxy/server xlator which listens on a single port (if we go the volume-multiplexing way) on behalf of all the volumes gf-proxy is exporting.
    • Which programs should this xlator register? Note that currently the client uses the handshake, ping and GlusterFS programs from a brick. What functionality does this xlator need beyond what protocol/server provides?
    • I think inode and fd tables should be associated with each volume rather than being global data structures; that way there is more concurrency (see the sketch after this list).
    • There will be some work to hold back fops until the volume/graph is ready to serve them.
  • Providing failover (maybe through AHA) might open up its own challenges around how we implement non-idempotent ops like mkdir (solutions like DRC might help).
  • Cleanup of fds and locks on bricks when a thin client dies won't work seamlessly.
  • I think we at least need to load md-cache on the thin client for better performance of ops like ls -l. This also means upcall notifications need to reach thin clients; upcall is also needed to invalidate the VFS cache of the thin client's kernel.
  • Write-behind can't be loaded on gf-proxy. Imagine the following scenario when write-behind is loaded on gf-proxy:
    • write w1 is cached in write-behind
    • gf-proxy reboots and w1 is lost
    • the application does an fsync; now gfproxy doesn't have enough information to decide whether to fail or succeed the fsync.
  • Nameless lookups (both fresh and revalidate) in DHT are costly. With gf-proxy we can expect a larger number of nameless lookups than named lookups.
  • We need to test how seamlessly add-brick works with gf-proxy. A few years ago, gNFS (which has a design similar to gf-proxy) had bugs where directory operations would fail after an add-brick. I think most of these bugs are fixed now, but it is better to verify once.
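To make the per-volume inode/fd table point above concrete, here is a minimal C sketch. The names and types are hypothetical stand-ins, not the actual glusterfs implementation; the idea is simply that each exported volume owns its own tables, so activity on one volume does not serialize against another and fops can be parked until that volume's graph is ready.

```c
#include <string.h>
#include <pthread.h>

/* Hypothetical sketch only, not glusterfs code. */
struct proxy_inode_table { pthread_mutex_t lock; /* inode hash, LRU, ... */ };
struct proxy_fd_table    { pthread_mutex_t lock; /* fd slot array, ...   */ };

struct proxy_volume {
        const char               *volname;
        struct proxy_inode_table  itable;      /* private to this volume  */
        struct proxy_fd_table     fdtable;     /* private to this volume  */
        int                       graph_ready; /* hold back fops until 1  */
};

/* Pick the per-volume tables for an incoming request, keyed on the volume
 * name supplied during the client handshake. */
static struct proxy_volume *
proxy_volume_lookup(struct proxy_volume *vols, int nvols, const char *volname)
{
        for (int i = 0; i < nvols; i++)
                if (strcmp(vols[i].volname, volname) == 0)
                        return &vols[i];
        return NULL; /* unknown volume: reject the handshake */
}
```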

@mchangir
Contributor

mchangir commented Sep 8, 2017

About running gfproxy outside the TSP:

  1. Could we have a 'gluster proxypeer probe' command?
    This could be used to form a trusted proxy pool which clients can fail over to.

  2. Could we have a 'gluster proxypool register' command?
    This could be used to securely register the proxy pool/cluster with the TSP.

Volfile distribution could be carried out as follows (a rough sketch follows the list):

  • When a system outside either of the pools tries to mount the volume via the proxy nodes or the TSP with the 'mount_via_proxy' flag, it could be handed the 'proxy_client'-flavored volfile; otherwise it could be handed the 'direct_client' volfile, but only if the system is connecting directly to the TSP.
  • When a system within the proxy pool tries to mount the volume, it could be handed two volfiles: the 'proxy'-flavor volfile and the 'proxy_client'-flavor volfile.
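One way to read that decision logic, as a small C sketch: the 'mount_via_proxy' flag and the volfile flavors come from the proposal above, while the function and enum names are made up for illustration and are not an existing glusterd interface.

```c
/* Illustrative only -- not an existing glusterd API. Decide which volfile
 * flavor(s) to hand out, per the proposal above. */
enum volfile_flavor {
        VOLFILE_NONE,                    /* refuse: not entitled to a volfile */
        VOLFILE_DIRECT_CLIENT,           /* 'direct_client' flavor            */
        VOLFILE_PROXY_CLIENT,            /* 'proxy_client' flavor             */
        VOLFILE_PROXY_PLUS_PROXY_CLIENT, /* both 'proxy' and 'proxy_client'   */
};

enum volfile_flavor
pick_volfile(int requester_in_proxy_pool, int mount_via_proxy, int direct_to_tsp)
{
        if (requester_in_proxy_pool)  /* proxy-pool member gets both flavors    */
                return VOLFILE_PROXY_PLUS_PROXY_CLIENT;
        if (mount_via_proxy)          /* outside both pools, mounting via proxy */
                return VOLFILE_PROXY_CLIENT;
        if (direct_to_tsp)            /* connecting directly to the TSP         */
                return VOLFILE_DIRECT_CLIENT;
        return VOLFILE_NONE;
}
```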

Does this seem sensible?

@amarts @poornimag @pranithk @atinmu @nixpanic @obnoxxx

@poornimag

poornimag commented Sep 20, 2017

Sorry for breaking the continuity of the previous discussions. Summing up certain things from the previous comments, here is the status of the decided approaches and the work done so far:

Done so far:

  • 1 - Glusterd manages gfproxyd, enabling or disabling it based on a volume-set option.
    One gfproxyd per volume per node is automatically started when the gfproxy option is enabled on that volume.
  • 2 - AHA translator (by Facebook) to handle I/O continuity across restarts of the gfproxy daemon.
  • 3 - Failover to other gfproxy daemons on different nodes when required, without breaking I/O.
  • 4 - Restart of gfproxyd by glusterd on a client graph switch.

Design done and WIP:

  • 5 - Dynamic graph switch without restarting the gfproxyd (obsoletes 4).
  • 6 - Volume multiplexing in gfproxyd: allow gfproxyd to run multiple volume graphs in the same
    daemon. With this, we may be able to stick to one gfproxyd per node (based on performance analysis),
    which enables us to run gfproxyd on a standard port (say 24009), which in turn enables us to run
    gfproxyd outside the trusted storage pool (gfproxyd can still be managed by gluster within the
    trusted storage pool; on other nodes it is the admin's job to start the daemon, etc.).
  • 7 - Decrease the resource consumption of gfapi and FUSE when mounted as a thin client.

To Be Done:

  • 8 - Reimplement volgen in glusterd-2. Also implement interfaces for volume multiplexing in gfproxyd,
    gfproxyd status from glusterd, etc.
  • 9 - gfproxyd in an active-active/active-passive setup?

Of these, we hope to finish 8, 7 and 5 before the feature freeze for Glusterd 4.0 (Oct EOM?).

mscherer pushed a commit that referenced this issue Oct 10, 2017
Summary:
Adds a new server-side daemon called gfproxyd & a new FUSE client
called gfproxy-client

Updates: #242
BUG: 1428063
Change-Id: I83210098d3a381922bc64fed1110ae1b76e6519f
Tested-by: Shreyas Siravara <[email protected]>
Reviewed-by: Kevin Vigor <[email protected]>
Signed-off-by: Shreyas Siravara <[email protected]>
Signed-off-by: Poornima G <[email protected]>
@gluster-ant
Collaborator

A patch https://review.gluster.org/18048 has been posted that references this issue.
Commit message: gfproxyd: Let glusterd manage gfproxy daemon


mscherer pushed a commit that referenced this issue Oct 18, 2017
Updates: #242
BUG: 1428063
Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826
Signed-off-by: Poornima G <[email protected]>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Oct 31, 2017
Summary:
Adds a new server-side daemon called gfproxyd & a new FUSE client
called gfproxy-client

Updates: gluster#242
BUG: 1428063
Change-Id: I83210098d3a381922bc64fed1110ae1b76e6519f
Tested-by: Shreyas Siravara <[email protected]>
Reviewed-by: Kevin Vigor <[email protected]>
Signed-off-by: Shreyas Siravara <[email protected]>
Signed-off-by: Poornima G <[email protected]>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Oct 31, 2017
Updates: gluster#242
BUG: 1428063
Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826
Signed-off-by: Poornima G <[email protected]>
@gluster-ant
Collaborator

A patch https://review.gluster.org/19022 has been posted that references this issue.
Commit message: quiesce: add fallocate and seek fops

mscherer pushed a commit that referenced this issue Dec 29, 2017
quiesce is useful in a gfproxy setup: if the gfproxy machine goes down, the
fop can be replayed. Hence, to start with, only the fops supported by the
FUSE layer have been added. With this patch, no behavior change is
introduced (i.e., no volfile change etc.); it just keeps the translator up
to date so that we can consume it if required.

Updates #242

Change-Id: Id3bf204f2ccd42c3ac8f88d85836ecb855703e02
Signed-off-by: Amar Tumballi <[email protected]>
Signed-off-by: Poornima G <[email protected]>
@ShyamsundarR ShyamsundarR removed this from the Release 4.0 (STM) milestone Jan 16, 2018
mscherer pushed a commit that referenced this issue Jan 30, 2018
Updates: #242
Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69
Signed-off-by: Poornima G <[email protected]>
@gluster-ant
Collaborator

A patch https://review.gluster.org/19525 has been posted that references this issue.
Commit message: libgfapi: Add option for gfproxy mount

@ShyamsundarR ShyamsundarR added this to the Release 4.1 (LTM) milestone Mar 20, 2018
amarts added a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
quiesce is useful in a gfproxy setup: if the gfproxy machine goes down, the
fop can be replayed. Hence, to start with, only the fops supported by the
FUSE layer have been added. With this patch, no behavior change is
introduced (i.e., no volfile change etc.); it just keeps the translator up
to date so that we can consume it if required.

Updates gluster#242

Change-Id: Id3bf204f2ccd42c3ac8f88d85836ecb855703e02
Signed-off-by: Amar Tumballi <[email protected]>
Signed-off-by: Poornima G <[email protected]>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Updates: gluster#242
Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69
Signed-off-by: Poornima G <[email protected]>