
[Bug] Ensure OpenSearch Dashboards stays available in large clusters #330

Closed
1 of 2 tasks
boktorbb opened this issue May 5, 2021 · 0 comments · Fixed by #463
Labels
enhancement (New feature or request) · migration (Any plans, changes, or enhancements needed for migration) · v1.0.0
Milestone

Comments

@boktorbb
Contributor

boktorbb commented May 5, 2021

Problem Statement

In sufficiently large OpenSearch clusters, the health check that Dashboards sends out can fail, and Dashboards then becomes unavailable.

Root Cause of the issue

By default, Dashboards fans out healthcheck requests across the entire cluster. In a large cluster, if any node is busy with heavy processing or ingestion and times out, the whole healthcheck fails and Dashboards becomes unavailable.
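To make the failure mode concrete, here is a minimal sketch that models each node's reply only by its latency. The `NodeReply` type, both function names, and the timeout value are hypothetical and chosen purely for illustration; the point is that a fan-out check is only as healthy as its slowest node.

```typescript
// Hypothetical model of one node's reply to a healthcheck request.
interface NodeReply {
  nodeId: string;
  latencyMs: number;
}

// Fan-out: the healthcheck passes only if every node answers within the timeout,
// so a single ingestion-heavy node is enough to fail it.
function fanOutHealthcheckPasses(replies: NodeReply[], timeoutMs: number): boolean {
  return replies.every((reply) => reply.latencyMs <= timeoutMs);
}

// Local-only: the healthcheck depends on a single node's reply.
function localHealthcheckPasses(local: NodeReply, timeoutMs: number): boolean {
  return local.latencyMs <= timeoutMs;
}

const replies: NodeReply[] = [
  { nodeId: 'node-1', latencyMs: 40 },
  { nodeId: 'node-2', latencyMs: 55 },
  { nodeId: 'node-3', latencyMs: 45000 }, // busy with heavy ingestion
];

console.log(fanOutHealthcheckPasses(replies, 30000)); // false
console.log(localHealthcheckPasses(replies[0], 30000)); // true
```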

Proposed Dashboards solution

The proposal is to give the nodes of a cluster instance a shared node attribute and use that attribute for an optimized healthcheck.

Dashboards configuration: Create a setting called optimized_healthcheck in opensearch_dashboards.yml that looks for the OpenSearch node attribute cluster_id. By default, optimized_healthcheck is null, so Dashboards continues fanning out healthcheck requests across all nodes. If the value is cluster_id, Dashboards switches to the logic outlined in the algorithm section below.
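A minimal sketch of how the proposed setting could select the healthcheck mode. The `HealthcheckSettings` interface and `selectHealthcheckMode` function are hypothetical. The proposal only specifies that null (the default) keeps fan-out and that `cluster_id` enables the optimized path, so treating any other value as fan-out is an assumption.

```typescript
// Hypothetical shape of the proposed opensearch_dashboards.yml setting.
interface HealthcheckSettings {
  optimized_healthcheck: string | null;
}

type HealthcheckMode = 'fan-out' | 'optimized';

function selectHealthcheckMode(settings: HealthcheckSettings): HealthcheckMode {
  // null is the proposed default and keeps the existing fan-out behavior.
  // Assumption: any value other than 'cluster_id' also falls back to fan-out.
  return settings.optimized_healthcheck === 'cluster_id' ? 'optimized' : 'fan-out';
}

console.log(selectHealthcheckMode({ optimized_healthcheck: null })); // fan-out
console.log(selectHealthcheckMode({ optimized_healthcheck: 'cluster_id' })); // optimized
```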

OpenSearch configuration: cluster_id is a new node attribute, added by customers, that differentiates cluster instances.

cluster_id can be an integer that is assigned to all OpenSearch nodes when the cluster is created. It increments whenever a new instance of the cluster is spun up.
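The assignment scheme can be sketched as follows, assuming a hypothetical provisioning step that knows the previous instance's cluster_id. OpenSearch reports custom node attributes as strings, so the integer is stored in string form here; `provisionClusterInstance` and its types are illustrative only.

```typescript
// Attributes a node would report; cluster_id is the proposed custom attribute.
interface NodeAttributes {
  cluster_id: string;
}

interface ClusterInstance {
  clusterId: number;
  nodes: NodeAttributes[];
}

// Hypothetical provisioning helper: every node of a new cluster instance gets
// the same cluster_id, one higher than the previous instance's.
function provisionClusterInstance(previousClusterId: number, nodeCount: number): ClusterInstance {
  const clusterId = previousClusterId + 1;
  const nodes = Array.from({ length: nodeCount }, () => ({ cluster_id: String(clusterId) }));
  return { clusterId, nodes };
}

const first = provisionClusterInstance(0, 3);
const second = provisionClusterInstance(first.clusterId, 2);
console.log(first.nodes.map((n) => n.cluster_id)); // [ '1', '1', '1' ]
console.log(second.nodes.map((n) => n.cluster_id)); // [ '2', '2' ]
```

Nodes drawn from both instances would report mixed values, which the algorithm below treats as a reason to fan out.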

Using this cluster_id, the healthcheck can follow a general algorithm:

Step 1: Aggregate the cluster_id values of all OpenSearch nodes.
Step 2: If all nodes have the same cluster_id, retrieve nodes.info from the _local node only.
- Use _cluster/state/nodes to retrieve each node's cluster_id from the master node.
Step 3: If the nodes have different cluster_id values, fan out the request to all nodes.
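The three steps can be sketched as a pure function over a `_cluster/state/nodes` response body, reduced here to the fields the check uses. The function name `chooseHealthcheckTarget` is hypothetical. Returning `'_local'` means "query nodes.info on the local node only", while `null` means "fan out to all nodes". Treating an empty node list, or any node missing the attribute, as a fan-out case is an assumption.

```typescript
// Minimal subset of a _cluster/state/nodes response body (assumed shape).
interface ClusterStateNodes {
  nodes?: Record<string, { attributes?: Record<string, string> }>;
}

// Returns '_local' when every node shares one cluster_id, otherwise null (fan out).
function chooseHealthcheckTarget(
  state: ClusterStateNodes,
  attributeName: string = 'cluster_id',
): '_local' | null {
  const nodes = Object.values(state.nodes ?? {});
  if (nodes.length === 0) {
    return null; // nothing to compare, keep the default behavior
  }

  // Step 1: aggregate the distinct cluster_id values (undefined if a node lacks one).
  const clusterIds = new Set(nodes.map((node) => node.attributes?.[attributeName]));

  // Step 2: a single, defined value means nodes.info can target _local only.
  if (clusterIds.size === 1 && !clusterIds.has(undefined)) {
    return '_local';
  }

  // Step 3: mixed or missing values, so fan out to all nodes.
  return null;
}

const sameInstance: ClusterStateNodes = {
  nodes: { a: { attributes: { cluster_id: '1' } }, b: { attributes: { cluster_id: '1' } } },
};
const mixed: ClusterStateNodes = {
  nodes: { a: { attributes: { cluster_id: '1' } }, b: { attributes: { cluster_id: '2' } } },
};
console.log(chooseHealthcheckTarget(sameInstance)); // _local
console.log(chooseHealthcheckTarget(mixed)); // null
```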


  • Design Doc
  • Implementation
@boktorbb boktorbb added the enhancement and migration labels May 5, 2021
@boktorbb boktorbb added this to the 1.x release milestone May 5, 2021
@mihirsoni mihirsoni changed the title [Patch] Ensure OpenSearch Dashboards stays available in large clusters [Bug] Ensure OpenSearch Dashboards stays available in large clusters May 6, 2021
boktorbb pushed a commit to boktorbb/OpenSearch-Dashboards that referenced this issue Jun 8, 2021
Ensures that Dashboards checks only the local OpenSearch node when
the cluster_id node attribute is present and all nodes have the same
cluster_id value; otherwise, it uses the default behavior.

Closes opensearch-project#330

Signed-off-by: Bishoy Boktor <[email protected]>
@tmarkley tmarkley added the v1.0.0 label Jun 9, 2021
boktorbb added a commit that referenced this issue Jun 11, 2021
* Implement optimized healthcheck for Dashboards

Ensures that Dashboards checks only the local OpenSearch node when
the cluster_id node attribute is present and all nodes have the same
cluster_id value; otherwise, it uses the default behavior.

Closes #330

Signed-off-by: Bishoy Boktor <[email protected]>

* Update optimizedHealthcheck setting to be configurable

opensearch.optimizedHealthcheck is now a {string|undefined} setting that
corresponds to the user's node attribute created in OpenSearch.
Healthcheck will now check the node attribute path ending in the value
of the setting.

Signed-off-by: Bishoy Boktor <[email protected]>

* Simplify getNodeId logic and update documentation

Simplifies getNodeId code. Also, updates healthcheck param to
healthcheckAttributeName.

Signed-off-by: Bishoy Boktor <[email protected]>

* Update opensearch_dashboards.yml with setting example

Signed-off-by: Bishoy Boktor <[email protected]>

* Update healthcheck setting name to optimizedHealthcheckId

Signed-off-by: Bishoy Boktor <[email protected]>
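Per the commit messages above, the final setting, opensearch.optimizedHealthcheckId, is a {string|undefined} value naming the node attribute, and the healthcheck reads the attribute path ending in that name. Below is a sketch of building the corresponding request, assuming the lookup uses OpenSearch's standard `filter_path` response-filtering parameter. The helper name is hypothetical and this is not the merged code.

```typescript
// Hypothetical helper: turns the optimizedHealthcheckId setting into the
// _cluster/state/nodes request path used to read each node's attribute.
function clusterStateNodesPath(optimizedHealthcheckId: string | undefined): string | undefined {
  if (!optimizedHealthcheckId) {
    return undefined; // setting not configured: keep the default fan-out healthcheck
  }
  // Keep only each node's attribute whose name matches the setting value.
  const filterPath = `nodes.*.attributes.${optimizedHealthcheckId}`;
  return `/_cluster/state/nodes?filter_path=${encodeURIComponent(filterPath)}`;
}

console.log(clusterStateNodesPath('cluster_id'));
// /_cluster/state/nodes?filter_path=nodes.*.attributes.cluster_id
console.log(clusterStateNodesPath(undefined)); // undefined
```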
boktorbb added a commit that referenced this issue Jun 11, 2021
kavilla pushed a commit that referenced this issue Jun 21, 2021

2 participants