Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip overriding routing table when it already contains entries with remote recovery source #9962

Merged

Conversation

linuxpi
Copy link
Collaborator

@linuxpi linuxpi commented Sep 11, 2023

Description

  • After Cluster Manager recovery or a fresh cluster bootstrap, we trigger the GatewayService.RecoverStateUpdateTask which performs various mutations to the cluster state.
  • One of the mutations blindly overrides the routing table in the cluster state. This makes sense as after recovery or new cluster bootstrap, routing table should be generated from scratch.
  • But for Remote Store and Remote Cluster State enabled clusters, we auto restore previous valid cluster state from Remote(Only IndexMetadata in 2.10). So routing tables are generated as part of restore flow with remote store recovery source.
  • To avoid GatewayService.RecoverStateUpdateTask overriding the pre-generated routing table entries for remote store clusters we have skipped the routing table override.
  • If we did not skip this, the new routing table would be created with recovery source as ExistingStoreRecoverySource which looks on local disk for recovery.

Related Issues

Resolves #9921

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@linuxpi linuxpi force-pushed the fix-auto-restore-index-red branch 2 times, most recently from db2a2b1 to a0f8c87 Compare September 11, 2023 02:55
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 11, 2023

Compatibility status:

Checks if related components are compatible with change 742ff75

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@linuxpi linuxpi changed the title skip overriding routing table when cluster state already contains rem… skip overriding routing table when cluster state already contains remote recovery source Sep 11, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@linuxpi linuxpi force-pushed the fix-auto-restore-index-red branch from 98a6bf3 to 7d21740 Compare September 11, 2023 13:43
@linuxpi linuxpi changed the title skip overriding routing table when cluster state already contains remote recovery source skip overriding routing table when it already contains entries with remote recovery source Sep 11, 2023
@linuxpi linuxpi marked this pull request as ready for review September 11, 2023 13:45
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@linuxpi
Copy link
Collaborator Author

linuxpi commented Sep 15, 2023

Test Failures

org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries
org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob
org.opensearch.remotestore.RemoteRestoreSnapshotIT.testRestoreShallowSnapshotRepositoryOverriden
org.opensearch.remotestore.RemoteStoreRestoreIT.testRateLimitedRemoteDownloads -- #9887
org.opensearch.snapshots.CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled

Ran locally as well with 10 iterations. all tests are passing. Only testCloneAfterRepoShallowSettingDisabled failed once out of 10
Tests with failures:

  • org.opensearch.snapshots.CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled {seed=[D5F195028D9B00DD:B154F260AC51D7B4]}

10 tests completed, 1 failed

@linuxpi linuxpi force-pushed the fix-auto-restore-index-red branch from b39d173 to 257afe3 Compare September 15, 2023 10:59
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteStoreStatsIT.testStatsOnRemoteStoreRestore

@shwetathareja
Copy link
Member

What we can do is instead of populating the routing table during remote state recovery, we can just populate the Metadata and populate the RoutingTable in ClusterStateUpdaters.updateRoutingTable with RemoteStoreRecoverySource. This is similar to how we are doing it for local disk state recovery as well.

In case you do this how will you identify it is local shard vs remote shard as you are doing it based on source? is there other way to identify the same?

@linuxpi
Copy link
Collaborator Author

linuxpi commented Sep 15, 2023

What we can do is instead of populating the routing table during remote state recovery, we can just populate the Metadata and populate the RoutingTable in ClusterStateUpdaters.updateRoutingTable with RemoteStoreRecoverySource. This is similar to how we are doing it for local disk state recovery as well.

In case you do this how will you identify it is local shard vs remote shard as you are doing it based on source? is there other way to identify the same?

We would identify just based on index setting and the check for recovery source will be removed. Current code is more safe IMO. But we should think about the other approach and handle this for remote store more holistically by testing other scenarios as well like recovery from local disk etc

Created an issue for testing and fix the local disk recovery for Remote Indices #10071 .

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: bansvaru <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: bansvaru <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testShardRoutingWithNetworkDisruption_FailOpenEnabled

@shwetathareja shwetathareja merged commit 5fdd418 into opensearch-project:main Sep 16, 2023
@shwetathareja shwetathareja added backport 2.x Backport to 2.x branch backport 2.10 Backport to 2.10 branch labels Sep 16, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 16, 2023
…emote recovery source (#9962)

* skip overriding routing table when it already contains entries with remote recovery source

Signed-off-by: bansvaru <[email protected]>
(cherry picked from commit 5fdd418)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 16, 2023
…emote recovery source (#9962)

* skip overriding routing table when it already contains entries with remote recovery source

Signed-off-by: bansvaru <[email protected]>
(cherry picked from commit 5fdd418)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shwetathareja pushed a commit that referenced this pull request Sep 18, 2023
…emote recovery source (#9962) (#10084)

* skip overriding routing table when it already contains entries with remote recovery source

(cherry picked from commit 5fdd418)

Signed-off-by: bansvaru <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Sep 20, 2023
…emote recovery source (opensearch-project#9962)

* skip overriding routing table when it already contains entries with remote recovery source

Signed-off-by: bansvaru <[email protected]>
sachinpkale pushed a commit that referenced this pull request Sep 22, 2023
…emote recovery source (#9962) (#10083)

* skip overriding routing table when it already contains entries with remote recovery source


(cherry picked from commit 5fdd418)

Signed-off-by: bansvaru <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…emote recovery source (opensearch-project#9962)

* skip overriding routing table when it already contains entries with remote recovery source

Signed-off-by: bansvaru <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
vikasvb90 pushed a commit to vikasvb90/OpenSearch that referenced this pull request Oct 10, 2023
…emote recovery source (opensearch-project#9962)

* skip overriding routing table when it already contains entries with remote recovery source

Signed-off-by: bansvaru <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…emote recovery source (opensearch-project#9962)

* skip overriding routing table when it already contains entries with remote recovery source

Signed-off-by: bansvaru <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport 2.10 Backport to 2.10 branch bug Something isn't working Cluster Manager skip-changelog v2.10.0
Projects
None yet
5 participants