Abort snapshots on a node that leaves the cluster #21084

Merged
abeyad merged 9 commits into elastic:master from cancel_snapshot_on_shard_close on Oct 26, 2016

Conversation

@abeyad abeyad commented Oct 23, 2016

Previously, if a node left the cluster (for example, due to a long GC) during a
snapshot, the master node would mark the snapshot as failed, but the node itself
could continue snapshotting the data on its shards to the repository. If the node
rejoined the cluster, the master might assign it to hold the replica shard (where it
held the primary before getting kicked off the cluster). The initialization of the
replica shard would then repeatedly fail with a ShardLockObtainFailedException until
the snapshot thread finally finished and relinquished the lock on the Store.

This commit resolves the situation by ensuring that when a shard is removed
from a node (such as when a node rejoins the cluster and realizes it no longer
holds the active shard copy), any snapshotting of the removed shards is aborted.
In the scenario above, when the node rejoins the cluster, it will see in the cluster
state that it no longer holds the primary shard, so IndicesClusterStateService
will remove the shard, thereby causing any snapshots of that shard to be aborted.

Closes #20876

@ywelsch ywelsch left a comment

I wonder if the better change would be to treat aborting snapshots in the same way as we abort outgoing peer recoveries of a primary: by registering an IndexEventListener listening for beforeIndexShardClosed and cancelling recoveries / abort ongoing snapshots at that time. This ensures that snapshots are aborted whenever we close the shard, simplifying the logic here. WDYT?
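
For illustration, a minimal sketch of the listener-based approach suggested here, assuming the beforeIndexShardClosed hook named in the comment; the SnapshotShardsService collaborator is a real class, but its abortSnapshotsForShard method and the exact hook signature are assumptions for this sketch, not necessarily how the PR wires it up:

import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.shard.IndexEventListener;
import org.elasticsearch.index.shard.IndexShard;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.snapshots.SnapshotShardsService;

public class AbortSnapshotOnShardClose implements IndexEventListener {

    private final SnapshotShardsService snapshotShardsService;

    public AbortSnapshotOnShardClose(SnapshotShardsService snapshotShardsService) {
        this.snapshotShardsService = snapshotShardsService;
    }

    @Override
    public void beforeIndexShardClosed(ShardId shardId, @Nullable IndexShard indexShard, Settings indexSettings) {
        // The shard is about to be closed on this node (e.g. the node no longer holds
        // this shard copy), so abort any snapshot that is still running against it.
        snapshotShardsService.abortSnapshotsForShard(shardId); // hypothetical abort helper
    }
}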

@@ -402,6 +402,10 @@ public String toString() {
for (DiscoveryNode node : this) {
sb.append(node).append(',');
}
if (sb.length() > 1) {

ywelsch (Contributor):

maybe simpler to replace
for (DiscoveryNode node : this) { sb.append(node).append(','); }
by sb.append(Strings.collectionToDelimitedString(this, ","));
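
For reference, a before/after sketch of this suggestion, assuming the helper referred to is org.elasticsearch.common.Strings.collectionToDelimitedString(Iterable, String):

// Current loop in the diff hunk above:
for (DiscoveryNode node : this) {
    sb.append(node).append(',');
}

// Suggested replacement (joins the nodes with "," between elements, no trailing delimiter):
sb.append(Strings.collectionToDelimitedString(this, ","));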

abeyad (Author):

done

abeyad commented Oct 24, 2016

@ywelsch the PR has been updated to use the beforeIndexShardClosed callback, the test has been made much more lightweight (doesn't use network disruptions to simulate the shard removal situation), and the PR description has been updated to reflect the new approach.

@ywelsch ywelsch left a comment

The change is good but the test needs a bit more work (I'm not sure it's testing the right thing).

String nodeWithPrimary = clusterState.nodes().get(indexRoutingTable.shard(0).primaryShard().currentNodeId()).getName();
assertNotNull("should be at least one node with a primary shard", nodeWithPrimary);
IndicesService indicesService = internalCluster().getInstance(IndicesService.class, nodeWithPrimary);
indicesService.deleteIndex(resolveIndex(index), "trigger shard removal");

ywelsch (Contributor):

removeIndex might be good enough here.
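
That is, something along these lines, assuming removeIndex takes the same (Index, String reason) arguments as the deleteIndex call in the snippet above:

// Removing the index from the node (without deleting its data) is enough to
// close the shard and trigger the snapshot-abort path under test.
indicesService.removeIndex(resolveIndex(index), "trigger shard removal");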

if (snapshotsInProgress != null && snapshotsInProgress.entries().size() > 0) {
assertEquals(State.SUCCESS, snapshotsInProgress.entries().get(0).state());
}
}, 10, TimeUnit.SECONDS);

ywelsch (Contributor):

assertBusy uses 10 seconds by default, no need to specify it here again
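
In other words, the same wait with the explicit timeout dropped, relying on the default the reviewer mentions:

assertBusy(() -> {
    SnapshotsInProgress snapshotsInProgress =
        client().admin().cluster().prepareState().get().getState().custom(SnapshotsInProgress.TYPE);
    if (snapshotsInProgress != null && snapshotsInProgress.entries().size() > 0) {
        assertEquals(State.SUCCESS, snapshotsInProgress.entries().get(0).state());
    }
}); // default assertBusy timeout (10 seconds) applies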

abeyad (Author):

done

Settings.builder().put("number_of_shards", numPrimaries).put("number_of_replicas", numReplicas)));

logger.info("--> indexing some data");
Client client = client();

ywelsch (Contributor):

why not use a random client every time?
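
Roughly what the reviewer is suggesting: call client() at each use instead of caching one Client up front, so requests go through a randomly chosen client. The document count and field values below are placeholders, not the test's actual values:

logger.info("--> indexing some data");
final int numDocs = randomIntBetween(50, 100); // placeholder count
for (int i = 0; i < numDocs; i++) {
    // client() is invoked per request rather than cached in a local variable
    client().prepareIndex(index, "doc", Integer.toString(i))
            .setSource("field", "value" + i)
            .get();
}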

abeyad (Author):

done

SnapshotsInProgress snapshotsInProgress =
client.admin().cluster().prepareState().get().getState().custom(SnapshotsInProgress.TYPE);
if (snapshotsInProgress != null && snapshotsInProgress.entries().size() > 0) {
assertEquals(State.SUCCESS, snapshotsInProgress.entries().get(0).state());

ywelsch (Contributor):

I think this will succeed even without the change in this PR? I'm not sure what is exactly tested here.

abeyad (Author):

Without the change here, the snapshot stalls forever and the test times out, because the snapshot is never aborted. This asserts that we abort the snapshot, bringing the snapshotting to a successful conclusion.

@@ -2490,4 +2493,80 @@ public void testGetSnapshotsRequest() throws Exception {
waitForCompletion(repositoryName, inProgressSnapshot, TimeValue.timeValueSeconds(60));
}

/**
* This test ensures that if a node that holds a primary that is being snapshotted leaves the cluster,
* when it returns, the node aborts the snapshotting on the now removed shard.

ywelsch (Contributor):

the description does not match what the test does.

abeyad (Author):

fixed

logger.info("--> waiting for snapshot to be in progress on all nodes");
assertBusy(() -> {
for (String node : internalCluster().nodesInclude(index)) {
final Client nodeClient = client(node);

ywelsch (Contributor):

why use this particular client?

abeyad (Author):

I made a mistake here: this only ensures the snapshot cluster state update has reached the master. I changed it to use internalCluster().clusterService(node).state() instead, to ensure each node knows that the snapshot is in progress.
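
A sketch of the adjusted wait described in this reply, reading each node's local cluster state so the assertion checks what every node holding the index actually knows; the exact assertions are illustrative:

assertBusy(() -> {
    for (String node : internalCluster().nodesInclude(index)) {
        // Use the node's own view of the cluster state, not a cluster state request
        // that the master answers, so we know this particular node has seen the snapshot.
        SnapshotsInProgress snapshotsInProgress =
            internalCluster().clusterService(node).state().custom(SnapshotsInProgress.TYPE);
        assertNotNull(snapshotsInProgress);
        assertFalse(snapshotsInProgress.entries().isEmpty());
    }
});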

}
}, 10, TimeUnit.SECONDS);

// Pick a node with a primary shard and remove the shard from the node

ywelsch (Contributor):

Pick node with THE primary shard

abeyad (Author):

done

abeyad commented Oct 25, 2016

@ywelsch I pushed 651bdbf, which I think improves the test a lot. Instead of the throttling, it uses the mock repository with blocks, then unblocks after removing the shard, triggering the snapshot abort.
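
Schematically, the blocked mock repository pattern being described, assuming the test registers the mock repository plugin and uses the block/unblock helpers from the snapshot test base class; repository settings and names here are illustrative:

// Register a repository of type "mock", which can be told to block I/O on the data nodes.
assertAcked(client().admin().cluster().preparePutRepository(repo)
    .setType("mock")
    .setSettings(Settings.builder().put("location", randomRepoPath())));

blockAllDataNodes(repo); // snapshot threads on the data nodes will hang on the block

// Kick off the snapshot without waiting; it stalls on the blocked repository.
client().admin().cluster().prepareCreateSnapshot(repo, snapshot)
    .setWaitForCompletion(false)
    .setIndices(index)
    .get();

// ... remove the shard from the node holding the primary (see the test) ...

unblockAllDataNodes(repo); // releasing the block lets the abort run and the snapshot finish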

@ywelsch ywelsch left a comment

LGTM. I left two small suggestions. Thanks @abeyad

logger.info("--> waiting for snapshot to complete");
waitForCompletion(repo, snapshot, TimeValue.timeValueSeconds(10));

// make sure snapshot is aborted and the aborted shard was marked as failed
assertBusy(() -> {

ywelsch (Contributor):

no assertBusy needed with waitForCompletion above?

assertEquals(State.SUCCESS, snapshotsInProgress.entries().get(0).state());
}
}, 10, TimeUnit.SECONDS);
List<SnapshotInfo> snapshotInfos = client().admin().cluster().prepareGetSnapshots(repo).setSnapshots(snapshot).get().getSnapshots();

ywelsch (Contributor):

waitForCompletion returns SnapshotInfo
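
That is, the re-fetch via prepareGetSnapshots can be replaced by using the returned value directly; the assertions below only illustrate the shape of SnapshotInfo:

SnapshotInfo snapshotInfo = waitForCompletion(repo, snapshot, TimeValue.timeValueSeconds(10));
// The aborted shard should surface as a shard failure on the completed snapshot.
assertTrue(snapshotInfo.state().completed());
assertFalse(snapshotInfo.shardFailures().isEmpty());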

abeyad commented Oct 26, 2016

I pushed abd71f5, which uses the SnapshotInfo returned from waitForCompletion. Thanks for the review @ywelsch!

@abeyad abeyad merged commit c88452d into elastic:master Oct 26, 2016
@abeyad abeyad deleted the cancel_snapshot_on_shard_close branch October 26, 2016 14:04
abeyad pushed a commit that referenced this pull request Oct 26, 2016

abeyad commented Oct 26, 2016

5.x commit: 1d278d2

ywelsch pushed a commit that referenced this pull request Nov 22, 2016

ywelsch commented Nov 22, 2016

5.0 commit: 22ee78c

Successfully merging this pull request may close these issues:

[bug report] LockObtainFailedException throws under pressure (#20876)

3 participants