More Snapshot Resiliency Testing #39504
Closed
original-brownbear
wants to merge
74
commits into
elastic:master
from
original-brownbear:more-snapshot-resiliency-testing
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Mar 5, 2019
* Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for elastic#39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`
original-brownbear
added a commit
that referenced
this pull request
Mar 5, 2019
* Use threadpool's time in `ClusterApplierService` to allow for deterministic tests * This is a part of/requirement for #39504
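A minimal sketch of why reading time from an injected source helps deterministic tests. It does not reproduce `ClusterApplierService`; `TimedApplier`, `timeTask`, and the fake clock are hypothetical names used only for illustration. The idea is that a component which asks an injected clock for the time, instead of the system clock, reports exactly the elapsed time the test dictates.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical illustration, not the actual Elasticsearch classes.
class InjectedClockSketch {

    /** A component that measures task duration using whatever clock it is given. */
    static final class TimedApplier {
        private final LongSupplier relativeTimeMillis;

        TimedApplier(LongSupplier relativeTimeMillis) {
            this.relativeTimeMillis = relativeTimeMillis;
        }

        /** Runs the task and returns the elapsed time according to the injected clock. */
        long timeTask(Runnable task) {
            long start = relativeTimeMillis.getAsLong();
            task.run();
            return relativeTimeMillis.getAsLong() - start;
        }
    }

    /** Uses a fake clock that only moves when the test advances it. */
    public static long measureWithFakeClock() {
        AtomicLong fakeNow = new AtomicLong(1000);
        TimedApplier applier = new TimedApplier(fakeNow::get);
        // The "task" advances the fake clock by 250 ms; no real time passes.
        return applier.timeTask(() -> fakeNow.addAndGet(250));
    }

    public static void main(String[] args) {
        System.out.println("elapsed millis: " + measureWithFakeClock());
    }
}
```

In production the same constructor could be handed a real time source, so only the test wiring changes.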
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Mar 5, 2019
* Use threadpool's time in `ClusterApplierService` to allow for deterministic tests * This is a part of/requirement for elastic#39504
original-brownbear
added a commit
that referenced
this pull request
Mar 5, 2019
original-brownbear
added a commit
that referenced
this pull request
Mar 5, 2019
* Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for #39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Mar 5, 2019
* Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for elastic#39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`
original-brownbear
added a commit
that referenced
this pull request
Mar 5, 2019
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Mar 7, 2019
* Dependency of elastic#39504
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Mar 29, 2019
* Expand the successful snapshot test case to also include restoring the snapshot * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in elastic#39504
original-brownbear
added a commit
that referenced
this pull request
Apr 3, 2019
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshot * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in #39504
original-brownbear
added a commit
that referenced
this pull request
Apr 6, 2019
This is a dependency of #39504. Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to be async, we enable `DeterministicTaskQueue`-based tests to run indexing operations. This was previously impossible because we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update triggers a new task in the `write` queue instead. This significantly increases the coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues in distributed state machines. The logical change is effectively all in `TransportShardBulkAction`; the rest of the changes mechanically move the caller code and tests to async and pass the `ActionListener` down. Since the move to async would have added more parameters to the `private static` steps in this logic, I decided to inline and DRY up the logic (between delete and update) as much as I could instead of passing the listener and wait-consumer down through all of them.
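The motivation above can be sketched with a toy single-threaded queue. This is an illustration under stated assumptions, not the real `DeterministicTaskQueue` or `TransportShardBulkAction`: `SimpleTaskQueue`, `updateMapping`, and the event strings are hypothetical. When every task runs on one thread, a write task that blocked until the mapping update finished would never return, because the update task could not run. Passing a callback lets the write step resume as a new queued task instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of callback-style execution on a single-threaded queue.
class AsyncMappingUpdateSketch {

    /** Toy stand-in for a deterministic task queue: tasks run one at a time on the caller's thread. */
    static final class SimpleTaskQueue {
        private final Queue<Runnable> tasks = new ArrayDeque<>();

        void schedule(Runnable task) {
            tasks.add(task);
        }

        /** Runs queued tasks in FIFO order, including tasks scheduled while running. */
        void runAllTasks() {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
        }
    }

    /** Simulates the mapping update as its own task and invokes the callback when it is done. */
    static void updateMapping(SimpleTaskQueue queue, List<String> events, Runnable onDone) {
        queue.schedule(() -> {
            events.add("update: mapping applied");
            onDone.run();
        });
    }

    /** Runs a write that needs a mapping update first, without ever blocking. */
    public static List<String> runIndexingOperation() {
        SimpleTaskQueue queue = new SimpleTaskQueue();
        List<String> events = new ArrayList<>();

        queue.schedule(() -> {
            events.add("write: needs mapping update");
            // Instead of waiting here, hand over a callback that re-queues the write.
            updateMapping(queue, events,
                () -> queue.schedule(() -> events.add("write: resumed after mapping update")));
        });

        queue.runAllTasks();
        return events;
    }

    public static void main(String[] args) {
        runIndexingOperation().forEach(System.out::println);
    }
}
```

Because the order of execution depends only on the order of scheduling, the same run produces the same sequence of events every time, which is the property the tests rely on.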
Closing here since all of this is now part of other non-draft PRs.
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Apr 11, 2019
This is a dependency of elastic#39504. Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to be async, we enable `DeterministicTaskQueue`-based tests to run indexing operations. This was previously impossible because we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update triggers a new task in the `write` queue instead. This significantly increases the coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues in distributed state machines. The logical change is effectively all in `TransportShardBulkAction`; the rest of the changes mechanically move the caller code and tests to async and pass the `ActionListener` down. Since the move to async would have added more parameters to the `private static` steps in this logic, I decided to inline and DRY up the logic (between delete and update) as much as I could instead of passing the listener and wait-consumer down through all of them.
original-brownbear
added a commit
that referenced
this pull request
Apr 11, 2019
This is a dependency of #39504. Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to be async, we enable `DeterministicTaskQueue`-based tests to run indexing operations. This was previously impossible because we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update triggers a new task in the `write` queue instead. This significantly increases the coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues in distributed state machines. The logical change is effectively all in `TransportShardBulkAction`; the rest of the changes mechanically move the caller code and tests to async and pass the `ActionListener` down. Since the move to async would have added more parameters to the `private static` steps in this logic, I decided to inline and DRY up the logic (between delete and update) as much as I could instead of passing the listener and wait-consumer down through all of them.
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this pull request
Apr 25, 2019
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshot * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in elastic#39504
original-brownbear
added a commit
that referenced
this pull request
Apr 26, 2019
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshot * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in #39504
gurkankaymak
pushed a commit
to gurkankaymak/elasticsearch
that referenced
this pull request
May 27, 2019
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshot * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in elastic#39504
gurkankaymak
pushed a commit
to gurkankaymak/elasticsearch
that referenced
this pull request
May 27, 2019
This is a dependency of elastic#39504. Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to be async, we enable `DeterministicTaskQueue`-based tests to run indexing operations. This was previously impossible because we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update triggers a new task in the `write` queue instead. This significantly increases the coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues in distributed state machines. The logical change is effectively all in `TransportShardBulkAction`; the rest of the changes mechanically move the caller code and tests to async and pass the `ActionListener` down. Since the move to async would have added more parameters to the `private static` steps in this logic, I decided to inline and DRY up the logic (between delete and update) as much as I could instead of passing the listener and wait-consumer down through all of them.
Labels
:Distributed Coordination/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
>refactoring
>test
Issues or PRs that are addressing/adding tests
v7.2.0
v8.0.0-alpha1
WIP
This PR contains all the steps needed to simulate issues with eventually consistent blob stores like AWS S3 in `org.elasticsearch.snapshots.SnapshotResiliencyTests`, so that various failure scenarios can be reproduced in a deterministic fashion. As a prerequisite, index and search operations had to be executable via the deterministic task queue. This required removing all blocking logic from the bulk request execution, which was done in this PR.
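A rough sketch of what a deterministic simulation of an eventually consistent blob store could look like. This is not the mock used in `SnapshotResiliencyTests`, and it does not claim to model the exact guarantees of AWS S3; the class and method names (`writeBlob`, `listBlobs`, `makeConsistent`) are assumptions made for illustration. Writes are held back from listings until the test explicitly lets the store converge, so a stale listing can be reproduced on demand rather than by chance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a blob store whose listings lag behind writes.
class EventuallyConsistentBlobStoreSketch {
    // Blobs that list operations can already see.
    private final Map<String, byte[]> visible = new LinkedHashMap<>();
    // Blobs that were written but are not yet visible to list operations.
    private final Map<String, byte[]> pending = new LinkedHashMap<>();

    /** Accepts the write but keeps it hidden from listings for now. */
    public void writeBlob(String name, byte[] data) {
        pending.put(name, data.clone());
    }

    /** Returns only the blob names that have become visible so far. */
    public Set<String> listBlobs() {
        return new TreeSet<>(visible.keySet());
    }

    /** Called by the test to decide exactly when pending writes become visible. */
    public void makeConsistent() {
        visible.putAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        EventuallyConsistentBlobStoreSketch store = new EventuallyConsistentBlobStoreSketch();
        store.writeBlob("index-0", new byte[] {1, 2, 3});
        System.out.println("before convergence: " + store.listBlobs());
        store.makeConsistent();
        System.out.println("after convergence: " + store.listBlobs());
    }
}
```

Keeping the convergence step under the test's control is the design choice that matters here: combined with a deterministic task queue, a failing interleaving can be replayed exactly.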
--- WIP more description incoming ---