[BUG] [Remote Store] Shard routing table has wrong number of replicas while restoring an index with >= 1 replicas from remote store #8479

Closed
BhumikaSaini-Amazon opened this issue Jul 6, 2023 · 1 comment · Fixed by #8951
Assignees: sachinpkale
Labels: bug (Something isn't working), Storage:Durability (Issues and PRs related to the durability framework), Storage (Issues and PRs relating to data and metadata storage)

@BhumikaSaini-Amazon
Contributor

Describe the bug
An IllegalStateException is thrown due to a mismatch between the replica count in the index metadata and the shard routing entries rebuilt while restoring from the remote store. Because the check runs inside an assert, it does not block the restore at runtime (where assertions are typically disabled), but it fails the (new) integration tests for scenarios where the remote store-enabled index has >= 1 replicas. This behaviour is due to the following assert in AllocationService#buildResult:

private ClusterState buildResult(ClusterState oldState, RoutingAllocation allocation) {
final RoutingTable oldRoutingTable = oldState.routingTable();
final RoutingNodes newRoutingNodes = allocation.routingNodes();
final RoutingTable newRoutingTable = new RoutingTable.Builder().updateNodes(oldRoutingTable.version(), newRoutingNodes).build();
final Metadata newMetadata = allocation.updateMetadataWithRoutingChanges(newRoutingTable);
assert newRoutingTable.validate(newMetadata); // validates the routing table is coherent with the cluster state metadata
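
For context, the IndexRoutingTable.validate call in the stack trace below compares the number of replica routing entries for each shard against numberOfReplicas in the index metadata. A minimal sketch of that kind of check (a simplification for illustration, not the actual OpenSearch implementation) looks like:

// Simplified illustration of the consistency check that produces the
// IllegalStateException seen in the logs: each shard's routing table must
// contain exactly one routing entry per configured replica.
static void checkReplicaCount(int shardId, int replicasInMetadata, int replicaEntriesInRoutingTable) {
    if (replicaEntriesInRoutingTable != replicasInMetadata) {
        throw new IllegalStateException("Shard [" + shardId + "] routing table has wrong number of replicas, expected ["
            + replicasInMetadata + "], got [" + replicaEntriesInRoutingTable + "]");
    }
}

During the restore from remote store, the routing table appears to be rebuilt with only primary entries, so an index with one replica ends up with expected [1], got [0].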

To Reproduce
Steps to reproduce the behavior:

  1. Create a remote store-enabled index with >= 1 replicas.
  2. Index some data.
  3. Turn the index red by terminating the nodes housing the primary/replica shards.
  4. Bring up new nodes to host the primary/replica shards (skip this step if enough other nodes remain in the cluster).
  5. Close the red index.
  6. Trigger restore from remote store for the red index.
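
A rough sketch of how these steps could map onto an integration test, assuming OpenSearchIntegTestCase/RemoteStoreIT-style helpers (INDEX_NAME is a placeholder, the remote store repository and cluster setup are omitted, and exact helper names may differ across versions):

// Hypothetical reproduction sketch; assumes remote store is already enabled for the index.
createIndex(INDEX_NAME, Settings.builder()
    .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
    .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 1)  // step 1: >= 1 replicas
    .build());
client().prepareIndex(INDEX_NAME).setSource("field", "value").get();  // step 2: index some data
internalCluster().stopRandomDataNode();                               // step 3: stop the nodes holding
internalCluster().stopRandomDataNode();                               //         the primary and replica
internalCluster().startDataOnlyNodes(2);                              // step 4: replacement data nodes
client().admin().indices().prepareClose(INDEX_NAME).get();            // step 5: close the red index
client().admin().cluster().restoreRemoteStore(                        // step 6: trigger the restore
    new RestoreRemoteStoreRequest().indices(INDEX_NAME), PlainActionFuture.newFuture());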

Logs similar to the following would be seen:

[2023-07-04T22:55:08,087][WARN ][o.o.s.RestoreService     ] [node_t0] failed to restore from remote store
java.lang.IllegalStateException: Shard [0] routing table has wrong number of replicas, expected [1], got [0]
	at org.opensearch.cluster.routing.IndexRoutingTable.validate(IndexRoutingTable.java:147) ~[classes/:?]
	at org.opensearch.cluster.routing.RoutingTable.validate(RoutingTable.java:184) ~[classes/:?]
	at org.opensearch.cluster.routing.allocation.AllocationService.buildResult(AllocationService.java:179) ~[classes/:?]
	at org.opensearch.cluster.routing.allocation.AllocationService.buildResultAndLogHealthChange(AllocationService.java:167) ~[classes/:?]
	at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:511) ~[classes/:?]
	at org.opensearch.snapshots.RestoreService$1.execute(RestoreService.java:276) ~[classes/:?]
	at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:65) ~[classes/:?]
	at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:874) ~[classes/:?]
	at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:424) ~[classes/:?]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:295) [classes/:?]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:206) [classes/:?]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204) [classes/:?]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242) [classes/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:795) [classes/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [classes/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
[2023-07-04T22:55:41,356][INFO ][o.o.r.RemoteStoreIT      ] 

Expected behavior
The restore flow should gracefully handle indices with replication enabled.

Additional context
This behaviour is failing the new integration tests being added for the restore-from-remote-store flow. One way to resolve it is to explicitly set the replica count to 0 in the index metadata before the restore, i.e. changing

IndexMetadata updatedIndexMetadata = IndexMetadata.builder(currentIndexMetadata)
    .state(IndexMetadata.State.OPEN)
    .version(1 + currentIndexMetadata.getVersion())
    .mappingVersion(1 + currentIndexMetadata.getMappingVersion())
    .settingsVersion(1 + currentIndexMetadata.getSettingsVersion())
    .aliasesVersion(1 + currentIndexMetadata.getAliasesVersion())
    .build();

to

IndexMetadata updatedIndexMetadata = IndexMetadata.builder(currentIndexMetadata)
    .state(IndexMetadata.State.OPEN)
    .version(1 + currentIndexMetadata.getVersion())
    .mappingVersion(1 + currentIndexMetadata.getMappingVersion())
    .settingsVersion(1 + currentIndexMetadata.getSettingsVersion())
    .aliasesVersion(1 + currentIndexMetadata.getAliasesVersion())
    .numberOfReplicas(0)
    .build();

However, this would require manual intervention after the restore to recover the replication configuration. We also need to analyze whether this could have any cascading effects.
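
For illustration, that manual step could be an update-settings call that re-applies the original replica count once the restore completes. A minimal sketch, assuming a Client handle; the helper name restoreReplicaCount and its parameters are placeholders for this example:

import org.opensearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.opensearch.client.Client;
import org.opensearch.common.settings.Settings;

// Hypothetical helper: re-apply the replica count that was zeroed out for the restore.
static void restoreReplicaCount(Client client, String indexName, int originalReplicaCount) {
    UpdateSettingsRequest request = new UpdateSettingsRequest(indexName)
        .settings(Settings.builder()
            .put("index.number_of_replicas", originalReplicaCount)
            .build());
    client.admin().indices().updateSettings(request).actionGet();
}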

@BhumikaSaini-Amazon BhumikaSaini-Amazon added bug Something isn't working untriaged labels Jul 6, 2023
@gbbafna gbbafna added Storage:Durability Issues and PRs related to the durability framework and removed untriaged labels Jul 6, 2023
@sachinpkale sachinpkale self-assigned this Jul 24, 2023
@Bukhtawar Bukhtawar added the Storage Issues and PRs relating to data and metadata storage label Jul 27, 2023
@sachinpkale
Member

sachinpkale commented Jul 28, 2023

I have started working on this bugfix.
