[BUG] [Remote Store] Shard routing table has wrong number of replicas while restoring an index with >= 1 replicas from remote store #8479

Closed
BhumikaSaini-Amazon opened this issue Jul 6, 2023 · 1 comment · Fixed by #8951
Assignees: sachinpkale
Labels: bug (Something isn't working), Storage:Durability (Issues and PRs related to the durability framework), Storage (Issues and PRs relating to data and metadata storage)

@BhumikaSaini-Amazon
Contributor

Describe the bug
An IllegalStateException is thrown due to a mismatch between the replica count in the index metadata and the shard routing entries rebuilt while restoring from the remote store. Because the check runs inside an assert, it does not block the restore at runtime (where assertions are typically disabled), but it fails the (new) integration tests for scenarios where the remote store-enabled index has >= 1 replicas. This behaviour is due to the following assert in AllocationService#buildResult:

private ClusterState buildResult(ClusterState oldState, RoutingAllocation allocation) {
final RoutingTable oldRoutingTable = oldState.routingTable();
final RoutingNodes newRoutingNodes = allocation.routingNodes();
final RoutingTable newRoutingTable = new RoutingTable.Builder().updateNodes(oldRoutingTable.version(), newRoutingNodes).build();
final Metadata newMetadata = allocation.updateMetadataWithRoutingChanges(newRoutingTable);
assert newRoutingTable.validate(newMetadata); // validates the routing table is coherent with the cluster state metadata
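
For context, the IndexRoutingTable.validate call in the stack trace below compares the number of replica routing entries for each shard against numberOfReplicas in the index metadata. A minimal sketch of that kind of check (a simplification for illustration, not the actual OpenSearch implementation) looks like:

// Simplified illustration of the consistency check that produces the
// IllegalStateException seen in the logs: each shard's routing table must
// contain exactly one routing entry per configured replica.
static void checkReplicaCount(int shardId, int replicasInMetadata, int replicaEntriesInRoutingTable) {
    if (replicaEntriesInRoutingTable != replicasInMetadata) {
        throw new IllegalStateException("Shard [" + shardId + "] routing table has wrong number of replicas, expected ["
            + replicasInMetadata + "], got [" + replicaEntriesInRoutingTable + "]");
    }
}

During the restore from remote store, the routing table appears to be rebuilt with only primary entries, so an index with one replica ends up with expected [1], got [0].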

To Reproduce
Steps to reproduce the behavior:

  1. Create a remote store-enabled index with >= 1 replicas.
  2. Index some data.
  3. Turn the index red by terminating the nodes housing the primary/replica shards.
  4. Bring up new nodes to host the primary/replica shards (skip this step if enough other nodes remain in the cluster).
  5. Close the red index.
  6. Trigger restore from remote store for the red index.
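
A rough sketch of how these steps could map onto an integration test, assuming OpenSearchIntegTestCase/RemoteStoreIT-style helpers (INDEX_NAME is a placeholder, the remote store repository and cluster setup are omitted, and exact helper names may differ across versions):

// Hypothetical reproduction sketch; assumes remote store is already enabled for the index.
createIndex(INDEX_NAME, Settings.builder()
    .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
    .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 1)  // step 1: >= 1 replicas
    .build());
client().prepareIndex(INDEX_NAME).setSource("field", "value").get();  // step 2: index some data
internalCluster().stopRandomDataNode();                               // step 3: stop the nodes holding
internalCluster().stopRandomDataNode();                               //         the primary and replica
internalCluster().startDataOnlyNodes(2);                              // step 4: replacement data nodes
client().admin().indices().prepareClose(INDEX_NAME).get();            // step 5: close the red index
client().admin().cluster().restoreRemoteStore(                        // step 6: trigger the restore
    new RestoreRemoteStoreRequest().indices(INDEX_NAME), PlainActionFuture.newFuture());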

Logs similar to the following would be seen:

[2023-07-04T22:55:08,087][WARN ][o.o.s.RestoreService     ] [node_t0] failed to restore from remote store
java.lang.IllegalStateException: Shard [0] routing table has wrong number of replicas, expected [1], got [0]
	at org.opensearch.cluster.routing.IndexRoutingTable.validate(IndexRoutingTable.java:147) ~[classes/:?]
	at org.opensearch.cluster.routing.RoutingTable.validate(RoutingTable.java:184) ~[classes/:?]
	at org.opensearch.cluster.routing.allocation.AllocationService.buildResult(AllocationService.java:179) ~[classes/:?]
	at org.opensearch.cluster.routing.allocation.AllocationService.buildResultAndLogHealthChange(AllocationService.java:167) ~[classes/:?]
	at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:511) ~[classes/:?]
	at org.opensearch.snapshots.RestoreService$1.execute(RestoreService.java:276) ~[classes/:?]
	at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:65) ~[classes/:?]
	at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:874) ~[classes/:?]
	at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:424) ~[classes/:?]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:295) [classes/:?]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:206) [classes/:?]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204) [classes/:?]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242) [classes/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:795) [classes/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [classes/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
[2023-07-04T22:55:41,356][INFO ][o.o.r.RemoteStoreIT      ] 

Expected behavior
The restore flow should gracefully handle indices with replication enabled.

Additional context
This behaviour is failing the new integration tests being added for the restore-from-remote-store flow. One way to resolve it is to explicitly set the replica count to 0 in the index metadata before the restore, i.e. changing

IndexMetadata updatedIndexMetadata = IndexMetadata.builder(currentIndexMetadata)
    .state(IndexMetadata.State.OPEN)
    .version(1 + currentIndexMetadata.getVersion())
    .mappingVersion(1 + currentIndexMetadata.getMappingVersion())
    .settingsVersion(1 + currentIndexMetadata.getSettingsVersion())
    .aliasesVersion(1 + currentIndexMetadata.getAliasesVersion())
    .build();

to

IndexMetadata updatedIndexMetadata = IndexMetadata.builder(currentIndexMetadata)
    .state(IndexMetadata.State.OPEN)
    .version(1 + currentIndexMetadata.getVersion())
    .mappingVersion(1 + currentIndexMetadata.getMappingVersion())
    .settingsVersion(1 + currentIndexMetadata.getSettingsVersion())
    .aliasesVersion(1 + currentIndexMetadata.getAliasesVersion())
    .numberOfReplicas(0)
    .build();

However, this would require manual intervention after the restore to recover the replication configuration. We also need to analyze whether this could have any cascading effects.
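
For illustration, that manual step could be an update-settings call that re-applies the original replica count once the restore completes. A minimal sketch, assuming a Client handle; the helper name restoreReplicaCount and its parameters are placeholders for this example:

import org.opensearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.opensearch.client.Client;
import org.opensearch.common.settings.Settings;

// Hypothetical helper: re-apply the replica count that was zeroed out for the restore.
static void restoreReplicaCount(Client client, String indexName, int originalReplicaCount) {
    UpdateSettingsRequest request = new UpdateSettingsRequest(indexName)
        .settings(Settings.builder()
            .put("index.number_of_replicas", originalReplicaCount)
            .build());
    client.admin().indices().updateSettings(request).actionGet();
}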

@BhumikaSaini-Amazon BhumikaSaini-Amazon added bug Something isn't working untriaged labels Jul 6, 2023
@gbbafna gbbafna added Storage:Durability Issues and PRs related to the durability framework and removed untriaged labels Jul 6, 2023
@sachinpkale sachinpkale self-assigned this Jul 24, 2023
@Bukhtawar Bukhtawar added the Storage Issues and PRs relating to data and metadata storage label Jul 27, 2023
@sachinpkale
Member

sachinpkale commented Jul 28, 2023

I have started working on this bugfix.
