Fix inconsistent cluster state and ensure weigh away exception check only for data nodes #6327

Merged
merged 5 commits into from
Feb 21, 2023

Conversation

anshu1106 (Contributor) commented Feb 15, 2023

Description

This PR fixes an inconsistent cluster state when the cluster health API is called with the ensure_weighed_in param.
It also restricts the weighed-away health check to data nodes.
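
To illustrate the intended behavior, here is a minimal, self-contained sketch of a weighed-away check that is applied only to data nodes. It is not the code from this PR: `WeighedInCheckSketch`, `Node`, `check`, and the zone-to-weight map are hypothetical stand-ins for the real OpenSearch classes, and the sketch assumes that a weight of 0 for the node's zone means the node is weighed away.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins; these are NOT the OpenSearch classes.
class WeighedInCheckSketch {

    /** Minimal node model: id, awareness attribute value (zone), and whether it holds data. */
    static final class Node {
        final String id;
        final String zone;
        final boolean dataNode;

        Node(String id, String zone, boolean dataNode) {
            this.id = id;
            this.zone = zone;
            this.dataNode = dataNode;
        }
    }

    /**
     * Returns a failure message if the health request should be rejected for the
     * local node, or an empty Optional if the request may proceed.
     */
    static Optional<String> check(boolean ensureWeighedIn, Node localNode, Map<String, Double> zoneWeights) {
        // The check applies only when requested and only to data nodes.
        if (!ensureWeighedIn || !localNode.dataNode) {
            return Optional.empty();
        }
        // Assumption: weight 0 means "weighed away"; zones without a weight count as weighed in.
        double weight = zoneWeights.getOrDefault(localNode.zone, 1.0);
        return weight == 0.0 ? Optional.of("local node is weighed away") : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("zone-a", 1.0, "zone-b", 0.0);
        // Data node in a weighed-away zone: rejected.
        System.out.println(check(true, new Node("n1", "zone-b", true), weights));
        // Non-data node in the same zone: not rejected.
        System.out.println(check(true, new Node("n2", "zone-b", false), weights));
    }
}
```

In this sketch a non-data node in a weighed-away zone still gets a normal health response, which is the behavior the description asks for.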

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…is done only on data nodes

Signed-off-by: Anshu Agarwal <[email protected]>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testCancellation
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

listener.onFailure(new NodeWeighedAwayException("local node is weighed away"));
return;
DiscoveryNode localNode = currentState.getNodes().getLocalNode();
if (localNode.isDataNode()) {
Collaborator

Should it be made more generic based on roles?

Contributor Author

@anshu1106 anshu1106 Feb 16, 2023

We can do that, but can we say that all data nodes have DATA_ROLE? And warm nodes don't have DATA_ROLE.

Contributor Author

Added a TODO to take this up later.
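
For reference, the role-based variant raised in this thread could look roughly like the sketch below. It is only an illustration of the idea, not code from this PR: `RoleBasedCheckSketch`, `ROLES_SUBJECT_TO_CHECK`, and `subjectToWeighedAwayCheck` are hypothetical names, and roles are modeled as plain strings. Which roles belong in the set (for example, whether warm nodes should be included) is exactly the open question above.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: decide by role set instead of a single isDataNode() flag.
class RoleBasedCheckSketch {

    // Assumption: only the "data" role is subject to the weighed-away check for now.
    static final Set<String> ROLES_SUBJECT_TO_CHECK = Set.of("data");

    /** True if the node has at least one role that the weighed-away check applies to. */
    static boolean subjectToWeighedAwayCheck(Set<String> nodeRoles) {
        return !Collections.disjoint(nodeRoles, ROLES_SUBJECT_TO_CHECK);
    }

    public static void main(String[] args) {
        System.out.println(subjectToWeighedAwayCheck(Set.of("data", "ingest")));
        System.out.println(subjectToWeighedAwayCheck(Set.of("cluster_manager")));
    }
}
```

Keeping the roles in one set means a later change only has to extend that set, not the check itself.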

Signed-off-by: Anshu Agarwal <[email protected]>
Signed-off-by: Anshu Agarwal <[email protected]>
@anshu1106 anshu1106 force-pushed the cluster-health-state-fix branch from dc1981d to 3e6d718 Compare February 16, 2023 14:02

codecov-commenter commented Feb 16, 2023

Codecov Report

Merging #6327 (68d8a0c) into main (6bb9e3e) will increase coverage by 0.06%.
The diff coverage is 22.22%.

@@             Coverage Diff              @@
##               main    #6327      +/-   ##
============================================
+ Coverage     70.68%   70.75%   +0.06%     
- Complexity    58994    59018      +24     
============================================
  Files          4800     4800              
  Lines        282427   282430       +3     
  Branches      40717    40719       +2     
============================================
+ Hits         199641   199822     +181     
+ Misses        66378    66171     -207     
- Partials      16408    16437      +29     
Impacted Files Coverage Δ
...n/cluster/health/TransportClusterHealthAction.java 44.00% <0.00%> (-0.23%) ⬇️
...ensearch/cluster/routing/WeightedRoutingUtils.java 76.47% <100.00%> (+3.13%) ⬆️
...luster/routing/allocation/RoutingExplanations.java 41.37% <0.00%> (-58.63%) ⬇️
.../admin/cluster/reroute/ClusterRerouteResponse.java 55.00% <0.00%> (-45.00%) ⬇️
...cluster/routing/allocation/RerouteExplanation.java 65.00% <0.00%> (-35.00%) ⬇️
...ensearch/client/indices/DetailAnalyzeResponse.java 20.54% <0.00%> (-34.25%) ⬇️
...nsearch/index/shard/IndexShardClosedException.java 66.66% <0.00%> (-33.34%) ⬇️
...arch/search/aggregations/pipeline/SimpleModel.java 38.46% <0.00%> (-30.77%) ⬇️
...ter/coordination/CoordinationStateTestCluster.java 73.78% <0.00%> (-20.74%) ⬇️
...n/java/org/opensearch/test/rest/yaml/Features.java 60.00% <0.00%> (-20.00%) ⬇️
... and 475 more

CHANGELOG.md Outdated
@@ -122,6 +122,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [Segment Replication] Fix for peer recovery ([#5344](https://github.com/opensearch-project/OpenSearch/pull/5344))
- Fix weighted shard routing state across search requests([#6004](https://github.com/opensearch-project/OpenSearch/pull/6004))
- [Segment Replication] Fix bug where inaccurate sequence numbers are sent during replication ([#6122](https://github.com/opensearch-project/OpenSearch/pull/6122))
- [Weighted Routing] Fix inconsistent cluster state and ensure weigh away exception check only for data nodes ([#6327](https://github.com/opensearch-project/OpenSearch/pull/6327))
Collaborator

Let's skip this, as it is not a user-facing change.

Signed-off-by: Anshu Agarwal <[email protected]>
@anshu1106 anshu1106 force-pushed the cluster-health-state-fix branch from f0bc61d to 68d8a0c Compare February 21, 2023 07:17
@Bukhtawar Bukhtawar merged commit 95142c6 into opensearch-project:main Feb 21, 2023
@Bukhtawar Bukhtawar added the backport 2.x Backport to 2.x branch label Feb 21, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 21, 2023
…only for data nodes (#6327)

* Fix inconsistent cluster state and ensure weigh away exception check is done only on data nodes

Signed-off-by: Anshu Agarwal <[email protected]>
(cherry picked from commit 95142c6)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Feb 21, 2023
Labels
backport 2.x Backport to 2.x branch skip-changelog
4 participants