Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[7.x] Limit concurrent snapshot file restores in recovery per node #79379

Merged

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Oct 18, 2021

Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.

This commit adds a new setting
indices.recovery.max_concurrent_snapshot_file_downloads_per_node that
allows to limit the max number of snapshot file downloads per node
during recoveries. When a recovery starts in the target node it tries
to acquire a permit that allows it to download snapshot files when it is
granted. This is communicated to the source node in the
StartRecoveryRequest. This is a rather conservative approach since it is
possible that a recovery that gets a permit to use snapshot files
doesn't recover any snapshot file while there's a concurrent recovery
that doesn't get a permit could take advantage of recovering from a
snapshot.

Closes #79044
Backport of #79316

Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.

This commit adds a new setting
`indices.recovery.max_concurrent_snapshot_file_downloads_per_node` that
allows to limit the max number of snapshot file downloads per node
during recoveries. When a recovery starts in the target node it tries
to acquire a permit that allows it to download snapshot files when it is
granted. This is communicated to the source node in the
StartRecoveryRequest. This is a rather conservative approach since it is
possible that a recovery that gets a permit to use snapshot files
doesn't recover any snapshot file while there's a concurrent recovery
that doesn't get a permit could take advantage of recovering from a
snapshot.

Closes elastic#79044
Backport of elastic#79316
@fcofdez
Copy link
Contributor Author

fcofdez commented Oct 18, 2021

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample
Transient docker error

@fcofdez
Copy link
Contributor Author

fcofdez commented Oct 19, 2021

@elasticmachine update branch

@fcofdez
Copy link
Contributor Author

fcofdez commented Oct 19, 2021

@elasticmachine update branch

@fcofdez fcofdez merged commit 3bd8055 into elastic:7.x Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants