
glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail #1614

Merged

merged 4 commits into gluster:devel on Nov 30, 2020
Conversation

mohit84
Contributor

@mohit84 mohit84 commented Oct 12, 2020

During the glusterd handshake, glusterd receives a volume dictionary
from the peer end and compares it against its own volume dictionary data.
If the options differ, it sets a key to record that volume options have
changed and calls the import synctask to delete/start the volume. In a
brick_mux environment with a high number of volumes (5k), the dict API calls
in the function glusterd_compare_friend_volume take a long time because the
function glusterd_handle_friend_req saves all peer volume data in a single
dictionary. Due to the time taken by glusterd_handle_friend_req, RPC requests
receive a call_bail from the peer end and gluster (CLI) is not able to show
volume status.

Solution: To optimize the code, the following changes were made:

  1. Populate a new, specific dictionary to save the peer end's
    version-specific data, so that the function does not take much time to
    decide whether the peer end has volume updates.
  2. If a volume's version differs, set the corresponding bit in status_arr
    instead of saving a key in a dictionary, which makes the operation faster.

Note: To validate the changes, the following procedure was used:

  1. Set up 5100 distributed volumes (3x1)
  2. Enable brick_mux
  3. Start all the volumes
  4. Kill all gluster processes on the 3rd node
  5. Run a loop to update a volume option on the 1st node:
    for i in {1..5100}; do gluster v set vol$i performance.open-behind off; done
  6. Start the glusterd process on the 3rd node
  7. Wait for the handshake to finish and check that there is no call_bail
    message in the logs

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal [email protected]

@mohit84
Contributor Author

mohit84 commented Oct 12, 2020

/run regression

Contributor

@xhernandez xhernandez left a comment

Is it possible that the problem is that dicts are not the right structure for what we need here ?

@@ -82,6 +82,164 @@ glusterd_big_locked_handler(rpcsvc_request_t *req, rpcsvc_actor actor_fn)
return ret;
}

static int32_t
glusterd_friend_dict_unserialize(char *orig_buf, int32_t size, dict_t **fill,
dict_t **peer_ver)
Contributor
There already exists an unserialize function in dict.c. This is mostly a copy&paste of that function with minor changes. This creates duplicated code and exposes a lot of dict internals. Could you do this without exposing dict internals in a more generic way ?

@@ -394,6 +394,7 @@ dict_key_count
dict_keys_join
dict_lookup
dict_new
get_new_data_from_pool
Contributor

If we need to expose internal structures outside dict, it probably means that dict implementation is not good enough or we are not using the right structure for our needs.

Also, exposing internals will make it harder to provide better implementations of dicts in the future.

data so use a specific dictionar to improve the friend update
performance
*/
if ((strstr(key, ".quota-cksum")) || (strstr(key, ".ckusm")) ||
Contributor

Suggested change
if ((strstr(key, ".quota-cksum")) || (strstr(key, ".ckusm")) ||
if ((strstr(key, ".quota-cksum")) || (strstr(key, ".cksum")) ||

*/
#if defined(GF_ENABLE_BRICKMUX)
if (ret) {
ret = _gf_false;
Contributor

Suggested change
ret = _gf_false;
ret = 0;

u_int dictlen;
dict_t *peer_data;
dict_t *peer_ver_data; // Dictionary to save peer version data
unsigned long status_arr[256]; // Array to save volume update status
Contributor

If we want to use a fixed number of bits, uint64_t or similar would be better. An unsigned long is 64 bits long on 64-bit machines but 32 bits long on 32-bit machines. I think using a type with an explicit size is better, especially since we have hardcoded the number of bits per word in line 5054.

@atinmu

atinmu commented Oct 13, 2020 via email

How urgent is this? I can do a review sometime this weekend and provide my comments.

@mohit84
Contributor Author

mohit84 commented Oct 13, 2020

You can take time.

@mohit84
Contributor Author

mohit84 commented Oct 14, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run brick-mux regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run brick-mux regression

@mohit84
Contributor Author

mohit84 commented Oct 30, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Oct 31, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 2, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 2, 2020

Is it possible that the problem is that dicts are not the right structure for what we need here ?

Yes, it is happening because the current dict is not the right structure to store a huge number of key-value pairs.

@mykaul
Contributor

mykaul commented Nov 2, 2020

Is it possible that the problem is that dicts are not the right structure for what we need here ?

Yes it is happening because current dict is not a right structure to save huge key-value pair.

Most likely, but I believe if we can keep it encoded in the native format (as is done when we serialize/deserialize to/from XDR!), we can make it more efficient.

@mohit84
Contributor Author

mohit84 commented Nov 3, 2020

Is it possible that the problem is that dicts are not the right structure for what we need here ?

Yes it is happening because current dict is not a right structure to save huge key-value pair.

Most likely, but I believe if we can keep it encoded in the native format (as is done when we serialize/deserialize to/from XDR!), we can make it more efficient.

Yes, we can, but I believe a larger code change would be required. We don't use dict_to_xdr everywhere; in glusterd we use dict_allocate_and_(un)serialize to convert a dict to a buffer or a buffer to a dict. We can plan that for later; for the time being we can use this approach.

schaffung
schaffung previously approved these changes Nov 5, 2020
Member

@schaffung schaffung left a comment

+1

xlator_t *this = NULL;

this = THIS;
GF_ASSERT(this);
Member

We can remove this assertion, as THIS isn't NULL.

Member

I think this can be taken up separately rather than going for it in this patch.

Member

@amarts amarts left a comment

I am in agreement with @xhernandez here that dict may not be the correct data structure for this particular handshake at all.

Considering we are only looking for 5 keys in every volume, that's 25k entries for 5k volumes and at most 256KB -> 1MB of data; taking more than 600 seconds for that is not a great thing.

I am OK with this getting into the codebase, as I am personally of the opinion that glusterd itself needs to be re-engineered (not a new thing btw). Considering we are not spending a lot of effort on that, this is a quick fix in my opinion.

+1 (for this patch; if it is OK for the glusterd maintainers, merge it).

u_int dictlen;
dict_t *peer_data;
dict_t *peer_ver_data; // Dictionary to save peer version data
uint64_t status_arr[256]; // Array to save volume update status
Member

So, this now makes the assumption 16k (64x256) volumes are maximum with a single cluster. While that is a lot compared to where we are, we need to specifically call it out.

Contributor Author

So, this now makes the assumption 16k (64x256) volumes are maximum with a single cluster. While that is a lot compared to where we are, we need to specifically call it out.

In the function glusterd_add_volume_to_dict we save almost 40 key-value pairs per volume, so for 4k volumes the total will be around 160k keys. As we know, the current dictionary saves all the key-value pairs as a linked list, so comparing keys and fetching values takes time. IMO even with gfx_dict we can't reduce the time to access the keys; gfx_dict only saves the time to serialize/unserialize the dictionary, which is not much.
There are two main changes I made to optimize it:

  1. To decide whether a peer volume has changed we need only 5 keys, so I have
    saved those 5 keys in a specific dictionary; for 4k volumes in total we will have 20k keys.
  2. If the peer has some updates, instead of recording the status in a dictionary I have
    introduced a new array (status_arr); based on a single bit we can decide that a peer
    has an update and the volume needs to be imported.

After applying both changes I am able to process 5k volume updates during the handshake easily; otherwise it is not possible.

Contributor Author

@amarts Can you please give your vote on this.

@mykaul
Contributor

mykaul commented Nov 19, 2020

@amarts I think the problem is that we have a reasonable structure (gfx_dict , gfx_dict_pair and gfx_value), yet we somehow insist on converting to/from it, instead of using that format directly. That would have two benefits:

  1. No need for the conversion.
  2. A better implementation - seems more memory and CPU efficient (no need for this serialization from/to strings for every type).

I think this could be implemented within dict.c/h and of course xdr_to_dict().

@amarts
Member

amarts commented Nov 19, 2020

I think this could be implemented within dict.c/h and of course xdr_to_dict().

Agree, adding 'type' to dict was done for this specific reason. Happy if someone picks this up. Helpful even for lookup() calls.

@mykaul
Contributor

mykaul commented Nov 19, 2020

I think this could be implemented within dict.c/h and of course xdr_to_dict().

Agree, adding 'type' to dict was done for this specific reason. Happy if someone picks this up. Helpful even for lookup() calls.

Filed #1822 to track this idea.

@atinmu atinmu left a comment

I understand the intent of the change, but I am not clear on how you are able to determine the update flag from status_arr. I need some clarification.

GF_ASSERT(this);

if (!buf) {
gf_msg_callingfn("dict", GF_LOG_WARNING, EINVAL, LG_MSG_INVALID_ARG,

any specific reason why are you hardcoding "dict" instead of this->name?

Contributor Author

We don't use this->name in any other dict API; I have followed the same code convention.

ret = dict_set_int32n(peer_data, key, keylen, 0);
/*Set the status to ensure volume is updated on the peer
*/
arg->status_arr[(count / 64)] ^= 1UL << (count % 64);

What's the significance of '64' here?

Contributor Author

The array type is uint64_t, so we divide by 64 to pick the word and use the remainder to access a specific bit within that integer.

@mohit84
Contributor Author

mohit84 commented Nov 24, 2020

/run brick-mux regression

Contributor

@xhernandez xhernandez left a comment

I still think we are not using the right structure and procedure to handle this, but at least this is better than the previous approach.

Comment on lines 64 to 69
data_t *
get_new_data_from_pool(glusterfs_ctx_t *ctx)
{
data_t *data = mem_get(ctx->dict_data_pool);

if (!data)
return NULL;

GF_ATOMIC_INIT(data->refcount, 0);
data->is_static = _gf_false;

return data;
}

Contributor

This is exactly the same as get_new_data() but with an argument. To avoid code duplication, we should modify get_new_data() to simply call this function with the appropriate argument.

Comment on lines 3640 to 3648
value = get_new_data_from_pool(this->ctx);
if (!value) {
ret = -1;
goto out;
}
value->len = vallen;
value->data = gf_memdup(buf, vallen);
value->data_type = GF_DATA_TYPE_STR_OLD;
value->is_static = _gf_false;
Contributor

Instead of allocating and initializing a new data_t, why we don't simply add the same value to both dicts ? data_t is a ref counted object, so it shouldn't be a problem.

value->is_static = _gf_false;
buf += vallen;

ret = dict_addn(*fill, key, keylen, value);
Contributor

ret needs to be checked here.

@@ -61,6 +61,20 @@ get_new_data()
return data;
}

data_t *
Contributor

Suggested change
data_t *
static data_t *


ret = dict_addn(*fill, key, keylen, value);
for (j = 0; specific_key_arr[j]; j++) {
if (strstr(key, specific_key_arr[j])) {
Contributor

How are we sure that the specific_key_arr values are specific enough that we don't accidentally match other keys containing the string in unwanted places?

Comment on lines 106 to +108
dict = dict_new();
peer_ver = dict_new();
Contributor

Both dicts should be checked for errors.

@@ -3142,7 +3142,7 @@ glusterd_add_volume_to_dict(glusterd_volinfo_t *volinfo, dict_t *dict,
if (ret)
goto out;

snprintf(key, sizeof(key), "%s.ckusm", pfx);
snprintf(key, sizeof(key), "%s.cksum", pfx);
Contributor

On second thought, even though this name is incorrect, wouldn't changing it cause backward compatibility issues?

Comment on lines 5249 to 5252
if (!peer_data) {
gf_smsg(this->name, GF_LOG_ERROR, errno, GD_MSG_DICT_CREATE_FAIL, NULL);
goto out;
}
Contributor

I think this is not needed. If arg->peer_data is not valid, we would have failed much before getting here.

@@ -5265,7 +5270,7 @@ glusterd_import_friend_volumes_synctask(void *opaque)
conf->restart_bricks = _gf_true;

while (i <= count) {
Contributor

If count is very big, we should avoid checking each single volume. We should take advantage of arg->status_arr and efficiently skip entries that are not modified.

@@ -5491,8 +5495,12 @@ glusterd_compare_friend_data(dict_t *peer_data, int32_t *status, char *hostname)
goto out;
}

arg = GF_CALLOC(1, sizeof(*arg), gf_common_mt_char);
Contributor

We could allocate the space for the bitmap dynamically based on count, instead of using a fixed size array. You also need to check allocation errors.

@mohit84
Contributor Author

mohit84 commented Nov 25, 2020

/run regression

4 similar comments
@mohit84
Contributor Author

mohit84 commented Nov 25, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run brick-mux regression

@@ -30,6 +30,8 @@ struct dict_cmp {
gf_boolean_t (*value_ignore)(char *k);
};

static glusterfs_ctx_t *global_ctx = NULL;
Contributor

THIS->ctx is not specific to dicts. If you want to use it globally, you should put it in the right place. Probably globals.c would be a good place for it.

@@ -108,6 +108,7 @@ get_new_dict_full(int size_hint)
dict->free_pair.key = NULL;
dict->totkvlen = 0;
LOCK_INIT(&dict->lock);
global_ctx = THIS->ctx;
Contributor

Initialization should go to a glusterfs_ctx_t related function.

mask = bm &
(-bm); /* mask will contain the lowest bit set from bm. */
bm ^= mask;
ret = glusterd_import_friend_volume(peer_data, 0 + ffsll(mask) - 1,
Contributor

Suggested change
ret = glusterd_import_friend_volume(peer_data, 0 + ffsll(mask) - 1,
ret = glusterd_import_friend_volume(peer_data, i + ffsll(mask) - 1,

bm = arg->status_arr[i / 64];
while (bm != 0) {
mask = bm &
(-bm); /* mask will contain the lowest bit set from bm. */
Contributor

It would be better to put the comment above the line to avoid an ugly line split.

@@ -5229,12 +5229,13 @@ glusterd_import_friend_volumes_synctask(void *opaque)
{
int32_t ret = -1;
int32_t count = 0;
int i = 1;
int i = 0; /* Always start from 0 to access correct bitmap */
Contributor

I wouldn't change this. If you do that, a lot of other places that consider that '1' is the first volume will need to be adjusted.

dict_t *peer_ver_data; // Dictionary to save peer version data
uint64_t *status_arr; // Array to save volume update status
dict_t *peer_ver_data; // Dictionary to save peer version data
uint64_t status_arr[1]; // Array to save volume update status
Contributor

I would explicitly say that the real size of the array is dynamically allocated based on the number of volumes.

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

1 similar comment
@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 26, 2020

/run brick-mux regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/query regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/query full regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/query brick-mux regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run brick-mux regression

glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
Fixes: #1613
Signed-off-by: Mohit Agrawal <[email protected]>
Resolve various review comments

Fixes: #1613
Change-Id: I8b40e7899af4778d1d917958f535f4b50bb856d6
Signed-off-by: Mohit Agrawal <[email protected]>
Resolve the reviewer comments

Fixes: #1613
Change-Id: Id129304705c052c4b8e106a94461c01ac4649417
Signed-off-by: Mohit Agrawal <[email protected]>
Resolve the reviewer comments
Fixes: #1613
Signed-off-by: Mohit Agrawal <[email protected]>

Change-Id: Id03299052c047247d4c8a07dd046e62e0b80c21a
@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run full regression

@mohit84
Contributor Author

mohit84 commented Nov 27, 2020

/run brick-mux regression

@xhernandez xhernandez merged commit 12545d9 into gluster:devel Nov 30, 2020
csabahenk pushed a commit to csabahenk/glusterfs that referenced this pull request Mar 7, 2023
glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail (gluster#1614)

> Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
> Fixes: gluster#1613
> Signed-off-by: Mohit Agrawal <[email protected]>
> (Cherry pick from commit 12545d9)
> (Reviewed on upstream link gluster#1613)

Change-Id: Ibad7c23988539cc369ecc39dea2ea6985470bee1
BUG: 1898784
Signed-off-by: Mohit Agrawal <[email protected]>
Reviewed-on: https://code.engineering.redhat.com/gerrit/221193
Tested-by: RHGS Build Bot <[email protected]>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <[email protected]>
Development

Successfully merging this pull request may close these issues.

glusterd[brick_mux]: Optimize friend handshake code to avoid call_bail
6 participants