
glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail #1614

Merged

merged 4 commits into gluster:devel on Nov 30, 2020
Conversation

mohit84
Contributor

@mohit84 mohit84 commented Oct 12, 2020

During the glusterd handshake, glusterd receives a volume dictionary
from the peer end and compares it against its own volume dictionary data.
If the options differ, it sets a key to record that volume options have
changed and calls the import synctask to delete/start the volume. In a
brick_mux environment with a high number of volumes (5k), the dict API calls
in the function glusterd_compare_friend_volume take a long time because the
function glusterd_handle_friend_req saves all peer volume data in a single
dictionary. Due to the time taken by glusterd_handle_friend_req, RPC requests
receive a call_bail from the peer end and gluster (CLI) is not able to show
volume status.

Solution: To optimize the code, the following changes were made:

  1. Populate a new, specific dictionary to save the peer end's
    version-specific data, so that the function does not take much time to
    decide whether the peer end has volume updates.
  2. If a volume's version differs, set the corresponding bit in status_arr
    instead of saving a key in a dictionary, which makes the operation faster.

Note: To validate the changes, the following procedure was used:

  1. Set up 5100 distributed volumes (3x1)
  2. Enable brick_mux
  3. Start all the volumes
  4. Kill all gluster processes on the 3rd node
  5. Run a loop to update a volume option on the 1st node:
    for i in {1..5100}; do gluster v set vol$i performance.open-behind off; done
  6. Start the glusterd process on the 3rd node
  7. Wait for the handshake to finish and check that there is no call_bail
    message in the logs

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal [email protected]

@mohit84
Contributor Author

mohit84 commented Oct 12, 2020

/run regression

Contributor

@xhernandez xhernandez left a comment

Is it possible that the problem is that dicts are not the right structure for what we need here ?

@@ -82,6 +82,164 @@ glusterd_big_locked_handler(rpcsvc_request_t *req, rpcsvc_actor actor_fn)
return ret;
}

static int32_t
glusterd_friend_dict_unserialize(char *orig_buf, int32_t size, dict_t **fill,
dict_t **peer_ver)
Contributor
There already exists an unserialize function in dict.c. This is mostly a copy&paste of that function with minor changes. This creates duplicated code and exposes a lot of dict internals. Could you do this without exposing dict internals in a more generic way ?

@@ -394,6 +394,7 @@ dict_key_count
dict_keys_join
dict_lookup
dict_new
get_new_data_from_pool
Contributor

If we need to expose internal structures outside dict, it probably means that dict implementation is not good enough or we are not using the right structure for our needs.

Also, exposing internals will make it harder to provide better implementations of dicts in the future.

data so use a specific dictionar to improve the friend update
performance
*/
if ((strstr(key, ".quota-cksum")) || (strstr(key, ".ckusm")) ||
Contributor

Suggested change
if ((strstr(key, ".quota-cksum")) || (strstr(key, ".ckusm")) ||
if ((strstr(key, ".quota-cksum")) || (strstr(key, ".cksum")) ||

*/
#if defined(GF_ENABLE_BRICKMUX)
if (ret) {
ret = _gf_false;
Contributor

Suggested change
ret = _gf_false;
ret = 0;

u_int dictlen;
dict_t *peer_data;
dict_t *peer_ver_data; // Dictionary to save peer version data
unsigned long status_arr[256]; // Array to save volume update status
Contributor

If we want to use a fixed number of bits, uint64_t or similar would be better. An unsigned long is 64 bits long on 64-bit machines but 32 bits long on 32-bit machines. I think using a type with an explicit size is better, especially since we have hardcoded the number of bits per word in line 5054.

@atinmu

atinmu commented Oct 13, 2020 via email

How urgent is this? I can do a review sometime this weekend and provide my comments.

@mohit84
Contributor Author

mohit84 commented Oct 13, 2020

You can take time.

@mohit84
Contributor Author

mohit84 commented Oct 14, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run brick-mux regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run brick-mux regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 31, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 2, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 2, 2020

Is it possible that the problem is that dicts are not the right structure for what we need here ?

Yes, it is happening because the current dict is not the right structure to store a huge number of key-value pairs.

@mykaul
Contributor

mykaul commented Nov 2, 2020

Is it possible that the problem is that dicts are not the right structure for what we need here ?

Yes it is happening because current dict is not a right structure to save huge key-value pair.

Most likely, but I believe if we can keep it encoded in the native format (as is done when we serialize/deserialize to/from XDR!), we can make it more efficient.

@mohit84
Contributor Author

mohit84 commented Nov 3, 2020

Is it possible that the problem is that dicts are not the right structure for what we need here ?

Yes it is happening because current dict is not a right structure to save huge key-value pair.

Most likely, but I believe if we can keep it encoded in the native format (as is done when we serialize/deserialize to/from XDR!), we can make it more efficient.

Yes, we can, but I believe a larger code change would be required. We don't use dict_to_xdr everywhere; in glusterd we use dict_allocate_and_(un)serialize to convert a dict to a buffer or a buffer to a dict. We can plan that for later; for the time being we can use this approach.

schaffung
schaffung previously approved these changes Nov 5, 2020
Member

@schaffung schaffung left a comment

+1

xlator_t *this = NULL;

this = THIS;
GF_ASSERT(this);
Member

We can remove this assertion, as THIS isn't NULL.

Member

I think this can be taken up separately rather than going for it in this patch.

Member

@amarts amarts left a comment

I am in agreement with @xhernandez here that dict may not be the correct data structure for this particular handshake at all.

Considering we are only looking for 5 keys in every volume, that's 25k entries for 5k volumes and at most 256KB -> 1MB of data; taking more than 600 seconds for that is not a great thing.

I am OK with this getting into the codebase, as I am personally of the opinion that glusterd itself needs to be re-engineered (not a new thing btw). Considering we are not spending a lot of effort on that, this is a quick fix in my opinion.

+1 (for this patch; if it is OK for the glusterd maintainers, merge it).

u_int dictlen;
dict_t *peer_data;
dict_t *peer_ver_data; // Dictionary to save peer version data
uint64_t status_arr[256]; // Array to save volume update status
Member

So, this now makes the assumption 16k (64x256) volumes are maximum with a single cluster. While that is a lot compared to where we are, we need to specifically call it out.

Contributor Author

So, this now makes the assumption 16k (64x256) volumes are maximum with a single cluster. While that is a lot compared to where we are, we need to specifically call it out.

In the function glusterd_add_volume_to_dict we save almost 40 key-value pairs per volume, so for 4k volumes the total will be around 160k keys. As we know, the current dictionary saves all the key-value pairs as a linked list, so comparing keys and fetching values takes time. IMO even with gfx_dict we can't reduce the time to access the keys; gfx_dict only saves the time to serialize/unserialize the dictionary, which is not much.
There are two main changes I made to optimize it:

  1. To decide whether a peer volume has changed we need only 5 keys, so I have
    saved those 5 keys in a specific dictionary; for 4k volumes in total we will have 20k keys.
  2. If the peer has some updates, instead of recording the status in a dictionary I have
    introduced a new array (status_arr); based on a single bit we can decide that a peer
    has an update and the volume needs to be imported.

After applying both changes I am able to process 5k volume updates during the handshake easily; otherwise it is not possible.

Contributor Author

@amarts Can you please give your vote on this.

@mykaul
Contributor

mykaul commented Nov 19, 2020

@amarts I think the problem is that we have a reasonable structure (gfx_dict , gfx_dict_pair and gfx_value), yet we somehow insist on converting to/from it, instead of using that format directly. That would have two benefits:

  1. No need for the conversion.
  2. A better implementation - seems more memory and CPU efficient (no need for this serialization from/to strings for every type).

I think this could be implemented within dict.c/h and of course xdr_to_dict().

@amarts
Member

amarts commented Nov 19, 2020

I think this could be implemented within dict.c/h and of course xdr_to_dict().

Agree, adding 'type' to dict was done for this specific reason. Happy if someone picks this up. Helpful even for lookup() calls.

@mykaul
Contributor

mykaul commented Nov 19, 2020

I think this could be implemented within dict.c/h and of course xdr_to_dict().

Agree, adding 'type' to dict was done for this specific reason. Happy if someone picks this up. Helpful even for lookup() calls.

Filed #1822 to track this idea.

@atinmu atinmu left a comment

I understand the intent of the change, but I am not clear on how you are able to determine the update flag from status_arr. I need some clarification.

GF_ASSERT(this);

if (!buf) {
gf_msg_callingfn("dict", GF_LOG_WARNING, EINVAL, LG_MSG_INVALID_ARG,

any specific reason why are you hardcoding "dict" instead of this->name?

Contributor Author

We don't use this->name in any other dict API; I have followed the same code convention.

ret = dict_set_int32n(peer_data, key, keylen, 0);
/*Set the status to ensure volume is updated on the peer
*/
arg->status_arr[(count / 64)] ^= 1UL << (count % 64);

What's the significance of '64' here?

Contributor Author

The array type is uint64_t, so we divide by 64 to pick the word and use the remainder to access a specific bit within that integer.

@mohit84
Contributor Author

mohit84 commented Nov 24, 2020

/run brick-mux regression

Contributor

@xhernandez xhernandez left a comment

I still think we are not using the right structure and procedure to handle this, but at least this is better than the previous approach.

Comment on lines 64 to 69
data_t *
get_new_data_from_pool(glusterfs_ctx_t *ctx)
{
data_t *data = mem_get(ctx->dict_data_pool);

if (!data)
return NULL;

GF_ATOMIC_INIT(data->refcount, 0);
data->is_static = _gf_false;

return data;
}

Contributor

This is exactly the same as get_new_data() but with an argument. To avoid code duplication, we should modify get_new_data() to simply call this function with the appropriate argument.

Comment on lines 3640 to 3648
value = get_new_data_from_pool(this->ctx);
if (!value) {
ret = -1;
goto out;
}
value->len = vallen;
value->data = gf_memdup(buf, vallen);
value->data_type = GF_DATA_TYPE_STR_OLD;
value->is_static = _gf_false;
Contributor

Instead of allocating and initializing a new data_t, why we don't simply add the same value to both dicts ? data_t is a ref counted object, so it shouldn't be a problem.

value->is_static = _gf_false;
buf += vallen;

ret = dict_addn(*fill, key, keylen, value);
Contributor

ret needs to be checked here.

@@ -61,6 +61,20 @@ get_new_data()
return data;
}

data_t *
Contributor

Suggested change
data_t *
static data_t *


ret = dict_addn(*fill, key, keylen, value);
for (j = 0; specific_key_arr[j]; j++) {
if (strstr(key, specific_key_arr[j])) {
Contributor

How are we sure that the specific_key_arr values are specific enough that we don't accidentally match other keys containing the string in unwanted places?

Comment on lines 106 to +108
dict = dict_new();
peer_ver = dict_new();
Contributor

Both dicts should be checked for errors.

@@ -3142,7 +3142,7 @@ glusterd_add_volume_to_dict(glusterd_volinfo_t *volinfo, dict_t *dict,
if (ret)
goto out;

snprintf(key, sizeof(key), "%s.ckusm", pfx);
snprintf(key, sizeof(key), "%s.cksum", pfx);
Contributor

On second thought, even though this name is incorrect, wouldn't changing it cause backward compatibility issues?

Comment on lines 5249 to 5252
if (!peer_data) {
gf_smsg(this->name, GF_LOG_ERROR, errno, GD_MSG_DICT_CREATE_FAIL, NULL);
goto out;
}
Contributor

I think this is not needed. If arg->peer_data is not valid, we would have failed much before getting here.

@@ -5265,7 +5270,7 @@ glusterd_import_friend_volumes_synctask(void *opaque)
conf->restart_bricks = _gf_true;

while (i <= count) {
Contributor

If count is very big, we should avoid checking each single volume. We should take advantage of arg->status_arr and efficiently skip entries that are not modified.

@@ -5491,8 +5495,12 @@ glusterd_compare_friend_data(dict_t *peer_data, int32_t *status, char *hostname)
goto out;
}

arg = GF_CALLOC(1, sizeof(*arg), gf_common_mt_char);
Contributor

We could allocate the space for the bitmap dynamically based on count, instead of using a fixed size array. You also need to check allocation errors.

@mohit84
Contributor Author

mohit84 commented Nov 25, 2020

/run regression

4 similar comments
@mohit84
Contributor Author

mohit84 commented Nov 25, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run brick-mux regression

@@ -30,6 +30,8 @@ struct dict_cmp {
gf_boolean_t (*value_ignore)(char *k);
};

static glusterfs_ctx_t *global_ctx = NULL;
Contributor

THIS->ctx is not specific to dicts. If you want to use it globally, you should put it in the right place. Probably globals.c would be a good place for it.

@@ -108,6 +108,7 @@ get_new_dict_full(int size_hint)
dict->free_pair.key = NULL;
dict->totkvlen = 0;
LOCK_INIT(&dict->lock);
global_ctx = THIS->ctx;
Contributor

Initialization should go to a glusterfs_ctx_t related function.

mask = bm &
(-bm); /* mask will contain the lowest bit set from bm. */
bm ^= mask;
ret = glusterd_import_friend_volume(peer_data, 0 + ffsll(mask) - 1,
Contributor

Suggested change
ret = glusterd_import_friend_volume(peer_data, 0 + ffsll(mask) - 1,
ret = glusterd_import_friend_volume(peer_data, i + ffsll(mask) - 1,

bm = arg->status_arr[i / 64];
while (bm != 0) {
mask = bm &
(-bm); /* mask will contain the lowest bit set from bm. */
Contributor

It would be better to put the comment above the line to avoid an ugly line split.

@@ -5229,12 +5229,13 @@ glusterd_import_friend_volumes_synctask(void *opaque)
{
int32_t ret = -1;
int32_t count = 0;
int i = 1;
int i = 0; /* Always start from 0 to access correct bitmap */
Contributor

I wouldn't change this. If you do that, a lot of other places that consider that '1' is the first volume will need to be adjusted.

dict_t *peer_ver_data; // Dictionary to save peer version data
uint64_t *status_arr; // Array to save volume update status
dict_t *peer_ver_data; // Dictionary to save peer version data
uint64_t status_arr[1]; // Array to save volume update status
Contributor

I would explicitly say that the real size of the array is dynamically allocated based on the number of volumes.

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

1 similar comment
@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run brick-mux regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/query regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/query full regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/query brick-mux regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run brick-mux regression

glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal <[email protected]>
Resolve various review comments

Fixes: #1613
Change-Id: I8b40e7899af4778d1d917958f535f4b50bb856d6
Signed-off-by: Mohit Agrawal <[email protected]>
Resolve the reviewer comments

Fixes: #1613
Change-Id: Id129304705c052c4b8e106a94461c01ac4649417
Signed-off-by: Mohit Agrawal <[email protected]>
Resolve the reviewer comments
Fixes: #1613
Signed-off-by: Mohit Agrawal <[email protected]>

Change-Id: Id03299052c047247d4c8a07dd046e62e0b80c21a
@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run brick-mux regression

@xhernandez xhernandez merged commit 12545d9 into gluster:devel Nov 30, 2020
csabahenk pushed a commit to csabahenk/glusterfs that referenced this pull request Mar 7, 2023
glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail (gluster#1614)

> Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
> Fixes: gluster#1613
> Signed-off-by: Mohit Agrawal <[email protected]>
> (Cherry pick from commit 12545d9)
> (Reviewed on upstream link gluster#1613)

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
BUG: 1898784
Signed-off-by: Mohit Agrawal <[email protected]>
Reviewed-on: https://code.engineering.redhat.com/gerrit/221193
Tested-by: RHGS Build Bot <[email protected]>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <[email protected]>
Development

Successfully merging this pull request may close these issues.

glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail
6 participants