[Bug] Ensure OpenSearch Dashboards stays available in large clusters #330

boktorbb · 2021-05-05T20:49:21Z

Problem Statement

For sufficiently large OpenSearch clusters, the health check that Dashboards sends out can fail, and Dashboards then becomes unavailable.

Root Cause of the issue

By default, Dashboards fans out healthcheck requests across the entire cluster. In large clusters, if any node is busy with heavy processing or ingestion and times out, the healthcheck fails and Dashboards becomes unavailable.
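
The failure mode can be illustrated with a simplified model. This is only a sketch, not the Dashboards implementation: the `NodeProbe` type, both function names, and the timeout value are hypothetical, and real probes are asynchronous requests rather than recorded durations.

```typescript
// Hypothetical record of how long one node took to answer a healthcheck probe.
interface NodeProbe {
  node: string;
  responseMs: number;
}

// Simplified model of a fan-out check: every node must answer within the timeout,
// so a single slow node is enough to fail the whole check.
function fanOutHealthcheckPasses(probes: NodeProbe[], timeoutMs: number): boolean {
  return probes.every((probe) => probe.responseMs <= timeoutMs);
}

// Lists the nodes that would cause the fan-out check to fail.
function slowNodes(probes: NodeProbe[], timeoutMs: number): string[] {
  return probes
    .filter((probe) => probe.responseMs > timeoutMs)
    .map((probe) => probe.node);
}
```

The more nodes a cluster has, the more likely it is that at least one of them exceeds the timeout at any given moment, which is why the problem shows up in large clusters.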

Proposed Dashboards solution

The proposal is to introduce a node attribute and use it for an optimized healthcheck.

Dashboards Configuration: Create a setting called optimized_healthcheck in opensearch_dashboards.yml that looks for the OpenSearch node attribute cluster_id. By default, optimized_healthcheck is null, which lets Dashboards continue fanning out healthcheck requests across all nodes. If the value is cluster_id, Dashboards switches to the logic outlined in the algorithm section below.

OpenSearch Configuration: cluster_id → a new node attribute, added by customers, that differentiates cluster instances.

cluster_id can take the form of an integer that is assigned to all OpenSearch nodes during cluster creation. It increments when new instances of the cluster are spun up.

Using this cluster_id we can follow a general algorithm:

Step 1: Aggregate the cluster_id values of all OpenSearch nodes.
Step 2: If all the nodes have the same cluster_id, retrieve nodes.info from the _local node only.
- Using _cluster/state/nodes to retrieve from each master node
Step 3: If the nodes have different cluster_id values, fan out the request to all the nodes.
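
The decision in the steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the merged implementation: the `NodeInfo` shape, the `HealthcheckTarget` type, and the `chooseHealthcheckTarget` function are hypothetical names, and the sketch assumes that a node missing the attribute should trigger the default fan-out.

```typescript
// Hypothetical shape for the per-node data gathered in Step 1.
interface NodeInfo {
  attributes?: Record<string, string>;
}

// '_local' means query only the local node; '_all' means fan out to every node.
type HealthcheckTarget = '_local' | '_all';

function chooseHealthcheckTarget(
  nodes: NodeInfo[],
  attributeName: string | null
): HealthcheckTarget {
  // Setting left at null: keep the default fan-out behavior.
  if (attributeName === null) return '_all';

  // Step 1: aggregate the attribute value from every node.
  const values = nodes.map((node) => node.attributes?.[attributeName]);

  // Assumption: with no nodes, or with a node lacking the attribute,
  // the cluster cannot be shown to be uniform, so fall back to fan-out.
  if (values.length === 0 || values.some((value) => value === undefined)) {
    return '_all';
  }

  // Step 2: one shared value means only the local node needs checking.
  // Step 3: differing values fall back to fanning out.
  return values.every((value) => value === values[0]) ? '_local' : '_all';
}
```

For example, two nodes that both report `cluster_id: '1'` would yield `'_local'`, while nodes reporting `'1'` and `'2'` would yield `'_all'`.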

Design Doc
Implementation

Ensures that Dashboards checks only the local OpenSearch node when the cluster_id node attribute is present and all nodes have the same cluster_id value; otherwise, it uses the default behavior.
Closes opensearch-project#330
Signed-off-by: Bishoy Boktor <[email protected]>

* Implement optimized healthcheck for Dashboards
  Ensures that Dashboards checks only the local OpenSearch node when the cluster_id node attribute is present and all nodes have the same cluster_id value; otherwise, it uses the default behavior.
  Closes #330
  Signed-off-by: Bishoy Boktor <[email protected]>
* Update optimizedHealthcheck setting to be configurable
  opensearch.optimizedHealthcheck is now a {string|undefined} setting that corresponds to the user's node attribute created in OpenSearch. The healthcheck now checks the node attribute path ending in the value of the setting.
  Signed-off-by: Bishoy Boktor <[email protected]>
* Simplify getNodeId logic and update documentation
  Simplifies the getNodeId code. Also updates the healthcheck param to healthcheckAttributeName.
  Signed-off-by: Bishoy Boktor <[email protected]>
* Update opensearch_dashboards.yml with setting example
  Signed-off-by: Bishoy Boktor <[email protected]>
* Update healthcheck setting name to optimizedHealthcheckId
  Signed-off-by: Bishoy Boktor <[email protected]>

boktorbb added the enhancement (New feature or request) and migration (Any plans, changes, or enhancements needed for migration) labels May 5, 2021

boktorbb added this to the 1.x release milestone May 5, 2021

mihirsoni changed the title ~~[Patch] Ensure OpenSearch Dashboards stays available in large clusters~~ [Bug] Ensure OpenSearch Dashboards stays available in large clusters May 6, 2021

boktorbb mentioned this issue Jun 8, 2021

Implement optimized healthcheck for Dashboards #463

Merged

5 tasks

tmarkley added the v1.0.0 label Jun 9, 2021

boktorbb closed this as completed in #463 Jun 11, 2021

ruanyl mentioned this issue Sep 17, 2024

Use local clusterState call during healthchecks #8187

Merged

7 tasks
