[CI] SnapshotStressTestsIT testRandomActivities failing #109143
Labels:
:Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs)
medium-risk (an open issue or test failure that is a medium risk to future releases)
Team:Distributed (Obsolete) (meta label for the distributed team, now replaced by Distributed Indexing/Coordination)
>test-failure (triaged test failures from CI)
DaveCTurner added the :Distributed Coordination/Snapshot/Restore, >test-failure and medium-risk labels on May 29, 2024
elasticsearchmachine added the Team:Distributed (Obsolete) label on May 29, 2024
Pinging @elastic/es-distributed (Team:Distributed)
Repro:

```diff
diff --git a/server/src/test/java/org/elasticsearch/snapshots/SnapshotsServiceTests.java b/server/src/test/java/org/elasticsearch/snapshots/SnapshotsServiceTests.java
index 56a28b11edf..bcc7a23bbec 100644
--- a/server/src/test/java/org/elasticsearch/snapshots/SnapshotsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/snapshots/SnapshotsServiceTests.java
@@ -401,6 +401,70 @@ public class SnapshotsServiceTests extends ESTestCase {
         assertIsNoop(updatedClusterState, completeShardClone);
     }
 
+    public void testPauseForNodeRemovalWithQueuedShards() throws Exception {
+        final var repoName = "test-repo";
+        final var snapshot1 = snapshot(repoName, "snap-1");
+        final var snapshot2 = snapshot(repoName, "snap-2");
+        final var indexName = "index-1";
+        final var shardId = new ShardId(index(indexName), 0);
+        final var repositoryShardId = new RepositoryShardId(indexId(indexName), 0);
+        final var nodeId = uuid();
+
+        final var runningEntry = snapshotEntry(
+            snapshot1,
+            Collections.singletonMap(indexName, repositoryShardId.index()),
+            Map.of(shardId, initShardStatus(nodeId))
+        );
+
+        final var queuedEntry = snapshotEntry(
+            snapshot2,
+            Collections.singletonMap(indexName, repositoryShardId.index()),
+            Map.of(shardId, SnapshotsInProgress.ShardSnapshotStatus.UNASSIGNED_QUEUED)
+        );
+
+        final var initialState = stateWithSnapshots(
+            ClusterState.builder(ClusterState.EMPTY_STATE)
+                .nodes(DiscoveryNodes.builder().add(DiscoveryNodeUtils.create(nodeId)).localNodeId(nodeId).masterNodeId(nodeId).build())
+                .routingTable(
+                    RoutingTable.builder()
+                        .add(
+                            IndexRoutingTable.builder(shardId.getIndex())
+                                .addShard(TestShardRouting.newShardRouting(shardId, nodeId, true, ShardRoutingState.STARTED))
+                        )
+                        .build()
+                )
+                .build(),
+            repoName,
+            runningEntry,
+            queuedEntry
+        );
+
+        final var updatedState = applyUpdates(
+            initialState,
+            new SnapshotsService.ShardSnapshotUpdate(
+                snapshot1,
+                shardId,
+                null,
+                new SnapshotsInProgress.ShardSnapshotStatus(
+                    nodeId,
+                    SnapshotsInProgress.ShardState.PAUSED_FOR_NODE_REMOVAL,
+                    runningEntry.shards().get(shardId).generation()
+                ),
+                ActionTestUtils.assertNoFailureListener(t -> {})
+            )
+        );
+
+        assertEquals(
+            SnapshotsInProgress.ShardState.PAUSED_FOR_NODE_REMOVAL,
+            SnapshotsInProgress.get(updatedState).snapshot(snapshot1).shards().get(shardId).state()
+        );
+
+        assertEquals(
+            SnapshotsInProgress.ShardState.QUEUED,
+            SnapshotsInProgress.get(updatedState).snapshot(snapshot2).shards().get(shardId).state()
+        );
+    }
+
     public void testSnapshottingIndicesExcludesClones() {
         final String repoName = "test-repo";
         final String indexName = "index";
```
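The repro test above pins down the expected behaviour: pausing the shard in the running snapshot `snap-1` must leave the same shard in the queued snapshot `snap-2` as `QUEUED`. As a hedged illustration only, the following self-contained Java sketch models one shard's entries across an ordered queue of snapshots. The class, the `ShardState` enum and the rule that only a successful completion starts the next queued entry are assumptions made for this sketch; it is not the `SnapshotsService` implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of one shard's entries across an ordered queue of
// snapshots. It is NOT the Elasticsearch SnapshotsService implementation.
public class ShardQueueModel {

    enum ShardState { QUEUED, INIT, PAUSED_FOR_NODE_REMOVAL, SUCCESS }

    // Per-snapshot entry for the single shard being modelled.
    static final class Entry {
        final String snapshot;
        ShardState state;

        Entry(String snapshot, ShardState state) {
            this.snapshot = snapshot;
            this.state = state;
        }
    }

    // Entries in snapshot start order: earlier snapshots come first.
    final List<Entry> entries = new ArrayList<>();

    private int indexOf(String snapshot) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).snapshot.equals(snapshot)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown snapshot " + snapshot);
    }

    ShardState stateOf(String snapshot) {
        return entries.get(indexOf(snapshot)).state;
    }

    // Applies an update addressed to one snapshot's shard.
    // Assumed rule for this sketch: only a completed shard (SUCCESS) lets the next
    // QUEUED entry start; a pause leaves every later entry untouched.
    void applyUpdate(String snapshot, ShardState newState) {
        int idx = indexOf(snapshot);
        entries.get(idx).state = newState;
        if (newState == ShardState.SUCCESS) {
            for (int i = idx + 1; i < entries.size(); i++) {
                if (entries.get(i).state == ShardState.QUEUED) {
                    entries.get(i).state = ShardState.INIT; // start the next queued snapshot
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        ShardQueueModel model = new ShardQueueModel();
        model.entries.add(new Entry("snap-1", ShardState.INIT));
        model.entries.add(new Entry("snap-2", ShardState.QUEUED));

        // Mirrors the repro: pause snap-1's shard, then check snap-2 is still queued.
        model.applyUpdate("snap-1", ShardState.PAUSED_FOR_NODE_REMOVAL);
        System.out.println(model.stateOf("snap-1")); // PAUSED_FOR_NODE_REMOVAL
        System.out.println(model.stateOf("snap-2")); // QUEUED
    }
}
```

In this model the failure reported here would correspond to `applyUpdate` also writing `PAUSED_FOR_NODE_REMOVAL` into the later `snap-2` entry, which is exactly what the second `assertEquals` in the repro rules out.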
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue on May 29, 2024
elasticsearchmachine pushed a commit that referenced this issue on May 31, 2024
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue on May 31, 2024
elasticsearchmachine pushed a commit that referenced this issue on May 31, 2024
Found while investigating #108907, but this looks like a different failure: a shard ends up marked as `PAUSED_FOR_NODE_REMOVAL` in two separate snapshots.
Copious logs: testoutput-2024-05-28T23:17:03.715Z-fail.tar.gz
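To make the reported symptom concrete, here is a minimal, hypothetical Java sketch that scans a simplified view of in-progress snapshots (snapshot name to shard id to state) and reports any shard that more than one snapshot holds as `PAUSED_FOR_NODE_REMOVAL`. The map layout, the `ShardState` enum and the shard-id strings are assumptions for illustration; this is not an Elasticsearch API and not how `SnapshotStressTestsIT` checks its invariants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical invariant check over a simplified view of in-progress snapshots;
// the types here are illustrative, not Elasticsearch classes.
public class PausedShardCheck {

    enum ShardState { QUEUED, INIT, PAUSED_FOR_NODE_REMOVAL, SUCCESS }

    // snapshots: snapshot name -> (shard id -> state).
    // Returns shard id -> names of the snapshots in which that shard is
    // PAUSED_FOR_NODE_REMOVAL, keeping only shards paused in two or more snapshots.
    static Map<String, List<String>> shardsPausedInMultipleSnapshots(
        Map<String, Map<String, ShardState>> snapshots
    ) {
        Map<String, List<String>> pausedIn = new TreeMap<>();
        for (Map.Entry<String, Map<String, ShardState>> snapshot : snapshots.entrySet()) {
            for (Map.Entry<String, ShardState> shard : snapshot.getValue().entrySet()) {
                if (shard.getValue() == ShardState.PAUSED_FOR_NODE_REMOVAL) {
                    pausedIn.computeIfAbsent(shard.getKey(), k -> new ArrayList<>()).add(snapshot.getKey());
                }
            }
        }
        // Drop shards that are paused in at most one snapshot.
        pausedIn.values().removeIf(names -> names.size() < 2);
        return pausedIn;
    }

    public static void main(String[] args) {
        // TreeMap keeps snapshot names in a stable, sorted iteration order.
        Map<String, Map<String, ShardState>> snapshots = new TreeMap<>();
        snapshots.put("snap-1", Map.of("[index-1][0]", ShardState.PAUSED_FOR_NODE_REMOVAL));
        snapshots.put("snap-2", Map.of("[index-1][0]", ShardState.PAUSED_FOR_NODE_REMOVAL));
        System.out.println(shardsPausedInMultipleSnapshots(snapshots)); // {[index-1][0]=[snap-1, snap-2]}
    }
}
```

A non-empty result corresponds to the state described in this report; with `snap-2` left as `QUEUED`, as the repro test expects, the method returns an empty map.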