Limit concurrent snapshot file restores in recovery per node #79316
… recoveries Today we limit the max number of concurrent snapshot file restores per recovery. This works well when the default node_concurrent_recoveries is used (which is 2). When this limit is increased, it is possible to exhaust the underlying repository connection pool, affecting other workloads. This commit adds a new setting `indices.recovery.max_concurrent_snapshot_file_downloads_per_node` that limits the max number of snapshot file downloads per node during recoveries. When a recovery starts on the target node, it tries to acquire a permit; if the permit is granted, the recovery is allowed to download snapshot files. The outcome is communicated to the source node in the StartRecoveryRequest. This is a rather conservative approach, since a recovery that gets a permit might not recover any snapshot files, while a concurrent recovery that doesn't get a permit could have taken advantage of recovering from a snapshot. Closes elastic#79044
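To make the permit mechanism described above concrete, here is a minimal sketch of a per-node throttler using only the JDK. It is not the code from this PR: the class name `SnapshotDownloadPermits`, the nested `Permit` interface, and the `permitsInUse()` accessor are hypothetical, and it only assumes what the description states, namely that a recovery either gets its permits up front or proceeds without using snapshot files.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-node snapshot download throttler; not the Elasticsearch class.
final class SnapshotDownloadPermits {

    // Handle returned to a recovery; closing it gives the permits back to the node.
    interface Permit extends AutoCloseable {
        @Override
        void close();
    }

    private final int maxPermitsPerNode;
    private final AtomicInteger inUse = new AtomicInteger();

    SnapshotDownloadPermits(int maxPermitsPerNode) {
        if (maxPermitsPerNode < 1) {
            throw new IllegalArgumentException("maxPermitsPerNode must be positive");
        }
        this.maxPermitsPerNode = maxPermitsPerNode;
    }

    // Non-blocking: returns a Permit, or null when the node-wide budget is exhausted.
    Permit tryAcquire(int permits) {
        if (permits < 1) {
            throw new IllegalArgumentException("permits must be positive");
        }
        while (true) {
            final int current = inUse.get();
            if (current + permits > maxPermitsPerNode) {
                return null; // caller falls back to recovering from the source node
            }
            if (inUse.compareAndSet(current, current + permits)) {
                final AtomicBoolean released = new AtomicBoolean();
                // Releasing is idempotent so a double close cannot free permits twice.
                return () -> {
                    if (released.compareAndSet(false, true)) {
                        inUse.addAndGet(-permits);
                    }
                };
            }
        }
    }

    int permitsInUse() {
        return inUse.get();
    }
}
```

A recovery would call `tryAcquire` once when it starts, record whether the result was non-null as the flag sent to the source node, and close the handle when the recovery finishes or fails.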
Pinging @elastic/es-distributed (Team:Distributed)
Looks good, I left some small comments & suggestions.
@@ -161,7 +162,7 @@ | |||
private volatile TimeValue internalActionRetryTimeout; | |||
private volatile TimeValue internalActionLongTimeout; | |||
private volatile boolean useSnapshotsDuringRecovery; | |||
private volatile int maxConcurrentSnapshotFileDownloads; | |||
private volatile int getMaxConcurrentSnapshotFileDownloads; |
nit: Looks like a rename refactoring was a bit overzealous here?
🤦
@@ -138,9 +141,17 @@ public void beforeIndexShardClosed(ShardId shardId, @Nullable IndexShard indexSh | |||
} | |||
|
|||
public void startRecovery(final IndexShard indexShard, final DiscoveryNode sourceNode, final RecoveryListener listener) { | |||
final Releasable snapshotFileDownloadsPermit = | |||
recoverySnapshotFileDownloadsThrottler.tryAcquire(recoverySettings.getMaxConcurrentSnapshotFileDownloads()); |
If we fail to acquire permits then we should log a warning, indicating that the user should reduce `cluster.routing.allocation.node_concurrent_recoveries` to be at most `indices.recovery.max_concurrent_snapshot_file_downloads_per_node` / `indices.recovery.max_concurrent_snapshot_file_downloads`.
Relatedly, it doesn't make sense for `indices.recovery.max_concurrent_snapshot_file_downloads_per_node` to be less than `indices.recovery.max_concurrent_snapshot_file_downloads`; should we validate that?
Also, this change would let us respect `indices.recovery.use_snapshots` on the target, simply by not even trying to acquire permits if `indices.recovery.use_snapshots` is false.
(Also, the Javadoc for `indices.recovery.use_snapshots` indicates that it defaults to `false` but it actually defaults to `true`.)
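The arithmetic behind this comment can be sketched as follows, assuming each recovery asks for the full per-recovery limit when it starts: the number of recoveries that can hold permits at once is the per-node limit divided by the per-recovery limit, and a per-node limit below the per-recovery limit would mean no recovery could ever get permits. The class and method names are hypothetical, and the numbers in the usage note are illustrative rather than the settings' defaults.

```java
// Hypothetical helper illustrating the relationship between the two limits; not Elasticsearch code.
final class SnapshotDownloadLimits {

    private SnapshotDownloadLimits() {}

    // Rejects combinations where a single recovery could never acquire its permits.
    static void validate(int perRecoveryLimit, int perNodeLimit) {
        if (perRecoveryLimit < 1) {
            throw new IllegalArgumentException("per-recovery limit must be positive");
        }
        if (perNodeLimit < perRecoveryLimit) {
            throw new IllegalArgumentException(
                "per-node limit [" + perNodeLimit + "] must be at least the per-recovery limit [" + perRecoveryLimit + "]");
        }
    }

    // Upper bound on concurrent recoveries that can all use snapshot files on one node.
    static int maxRecoveriesWithPermits(int perRecoveryLimit, int perNodeLimit) {
        validate(perRecoveryLimit, perNodeLimit);
        return perNodeLimit / perRecoveryLimit; // integer division rounds down
    }
}
```

For example, with an illustrative per-recovery limit of 5 and per-node limit of 25, `maxRecoveriesWithPermits(5, 25)` gives 5, so setting `node_concurrent_recoveries` above 5 would leave some recoveries without permits.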
Addressed in ed9c4ef
import org.elasticsearch.core.Releasable; | ||
import org.elasticsearch.core.Releasables; | ||
|
||
public class RecoverySnapshotFileDownloadsThrottler { |
Could we fold most of this class into `RecoverySettings`? I think it'd be ok just to have a `RecoverySettings#tryAcquireSnapshotDownloadPermits` method, or if you prefer you can expose a wrapper like we do with `rateLimiter()`.
Addressed in ed9c4ef
@@ -31,6 +31,7 @@ | |||
private Store.MetadataSnapshot metadataSnapshot; | |||
private boolean primaryRelocation; | |||
private long startingSeqNo; | |||
private boolean hasPermitsToDownloadSnapshotFiles; |
nit: let's just call this `canDownloadSnapshotFiles`, there may be other reasons it can't (e.g. `indices.recovery.use_snapshots` is false)
Addressed in ed9c4ef
@elasticmachine run elasticsearch-ci/bwc
I left a couple of comments/questions about respecting the use of snapshots on the source node too and everything else is just tiny things.
@@ -127,7 +126,7 @@ | |||
|
|||
public RecoverySourceHandler(IndexShard shard, RecoveryTargetHandler recoveryTarget, ThreadPool threadPool, | |||
StartRecoveryRequest request, int fileChunkSizeInBytes, int maxConcurrentFileChunks, | |||
int maxConcurrentOperations, int maxConcurrentSnapshotFileDownloads, boolean useSnapshots, | |||
int maxConcurrentOperations, int maxConcurrentSnapshotFileDownloads, |
Hmm I sort of see that it doesn't make sense to use the setting on the source node, but in the BwC case we treat the target as if it can use snapshots, is this safe?
if (snapshotFileDownloadsPermit == null) { | ||
logger.warn(String.format(Locale.ROOT, | ||
"Unable to acquire permit to use snapshot files during recovery, this recovery will recover from the source node. " + | ||
"[%s] should have the same value as [%s]/[%s]", |
The limit is only an upper bound, so you could have fewer concurrent recoveries. I'd also suggest just saying the number rather than naming the settings, since otherwise folk will just increase `max_concurrent_snapshot_file_downloads_per_node` and run into bigger problems when they run out of HTTP connections.
"[%s] should have the same value as [%s]/[%s]", | |
"Ensure snapshot files can be used during recovery by setting [%s] to be no greater than [%d]", |
@@ -67,6 +67,8 @@ | |||
private final IndexShard indexShard; | |||
private final DiscoveryNode sourceNode; | |||
private final SnapshotFilesProvider snapshotFilesProvider; | |||
@Nullable |
@Nullable | |
@Nullable // if we're not downloading files from snapshots in this recovery |
@@ -119,5 +132,8 @@ public void writeTo(StreamOutput out) throws IOException { | |||
metadataSnapshot.writeTo(out); | |||
out.writeBoolean(primaryRelocation); | |||
out.writeLong(startingSeqNo); | |||
if (out.getVersion().onOrAfter(RecoverySettings.SNAPSHOT_FILE_DOWNLOAD_THROTTLING_SUPPORTED_VERSION)) { | |||
out.writeBoolean(canDownloadSnapshotFiles); | |||
} |
Is it safe to drop this value no matter whether it's true or false when dealing with an older node? I worry that we might have some trouble from this lenience, plus the fact that it defaults to `true` if missing and that we no longer care about the setting on the source node.
That's a fair point, maybe we should keep the check for `indices.recovery.use_snapshots` in the source node too? That way we would keep the current behaviour in a mixed-version cluster.
Yes I think that'd be best.
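The concern in this thread can be illustrated with a small JDK-only sketch of version-gated serialization. It is not the Elasticsearch wire code: plain integers stand in for node versions, `FLAG_SUPPORTED_VERSION` and all method names are hypothetical, and the only behaviour taken from the discussion is that the flag is skipped for older streams, treated as `true` when missing, and combined with the source node's own `indices.recovery.use_snapshots` check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model of the version-gated flag; integers stand in for real node versions.
final class CanDownloadFlagWire {

    static final int FLAG_SUPPORTED_VERSION = 2; // assumed value, for illustration only

    private CanDownloadFlagWire() {}

    // The flag is only written when the stream version understands it.
    static void write(DataOutputStream out, int streamVersion, boolean canDownload) throws IOException {
        if (streamVersion >= FLAG_SUPPORTED_VERSION) {
            out.writeBoolean(canDownload);
        }
    }

    // On an older stream the flag is absent and is treated as true.
    static boolean read(DataInputStream in, int streamVersion) throws IOException {
        return streamVersion >= FLAG_SUPPORTED_VERSION ? in.readBoolean() : true;
    }

    // Serializes and deserializes the flag at one stream version.
    static boolean roundTrip(boolean canDownload, int streamVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(out, streamVersion, canDownload);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in, streamVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Keeping the source-side setting check means a defaulted flag cannot enable snapshots on its own.
    static boolean sourceUsesSnapshots(boolean sourceUseSnapshotsSetting, boolean flagFromRequest) {
        return sourceUseSnapshotsSetting && flagFromRequest;
    }
}
```

In this model a `false` flag sent over an old-version stream comes back as `true`, which is the lenience the reviewer points out; the source-side check is what preserves the existing behaviour in a mixed-version cluster.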
@@ -714,12 +714,17 @@ protected Node(final Environment initialEnvironment, | |||
clusterService | |||
); | |||
final RecoveryPlannerService recoveryPlannerService = new SnapshotsRecoveryPlannerService(shardSnapshotsService); | |||
final SnapshotFilesProvider snapshotFilesProvider = | |||
new SnapshotFilesProvider(repositoryService); | |||
final SnapshotFilesProvider snapshotFilesProvider = new SnapshotFilesProvider(repositoryService); |
I think we can revert these changes now, they're only whitespace/import reordering right?
indicesClusterStateService = new IndicesClusterStateService( | ||
settings, | ||
indicesService, | ||
clusterService, | ||
threadPool, | ||
new PeerRecoveryTargetService(threadPool, transportService, recoverySettings, clusterService, snapshotFilesProvider), |
Likewise here, this change isn't needed any more.
LGTM
@elasticmachine run elasticsearch-ci/part-1
Thanks David!
💔 Backport failed
You can use sqren/backport to manually backport by running |
Today we limit the max number of concurrent snapshot file restores per recovery. This works well when the default node_concurrent_recoveries is used (which is 2). When this limit is increased, it is possible to exhaust the underlying repository connection pool, affecting other workloads. This commit adds a new setting `indices.recovery.max_concurrent_snapshot_file_downloads_per_node` that allows to limit the max number of snapshot file downloads per node during recoveries. When a recovery starts in the target node it tries to acquire a permit that allows it to download snapshot files when it is granted. This is communicated to the source node in the StartRecoveryRequest. This is a rather conservative approach since it is possible that a recovery that gets a permit to use snapshot files doesn't recover any snapshot file while there's a concurrent recovery that doesn't get a permit could take advantage of recovering from a snapshot. Closes elastic#79044 Backport of elastic#79316
I just noticed a couple test failures that could be related:
These don't reproduce for me locally.
Today we limit the max number of concurrent snapshot file restores per recovery. This works well when the default `node_concurrent_recoveries` is used (which is 2). When this limit is increased, it is possible to exhaust the underlying repository connection pool, affecting other workloads.
This commit adds a new setting `max_concurrent_snapshot_file_downloads_per_node` that limits the max number of snapshot file downloads per node during recoveries. When a recovery starts on the target node, it tries to acquire a permit; if the permit is granted, the recovery is allowed to download snapshot files. The outcome is communicated to the source node in the StartRecoveryRequest. This is a rather conservative approach, since a recovery that gets a permit might not recover any snapshot files, while a concurrent recovery that doesn't get a permit could have taken advantage of recovering from a snapshot. This should cover most cases and protect the rest of the workloads that use the same repository when `node_concurrent_recoveries` is larger than the default.
Closes #79044