Make Transport Shard Bulk Action Async (#39793) #41112
Merged
This is a dependency of #39504
Motivation:

By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update will trigger a new task in the `write` queue instead.

This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines.

The logical change is effectively all in `TransportShardBulkAction`; the rest of the changes simply move the caller code and tests to being async mechanically and pass the `ActionListener` down.

Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and DRY up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.

Backport of #39793 #41006 #40923 #40940 (the initial PR to `master` was causing test failures that have been resolved in the subsequent PRs in this list, so I squashed them all into one to not break `7.x`).
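
To illustrate the motivation, here is a minimal sketch of why the non-blocking form works on a single-threaded, deterministic queue. It is not the Elasticsearch code: `SimpleTaskQueue`, `updateMapping`, and the log messages are hypothetical stand-ins, and the real change passes an `ActionListener` rather than a `Runnable`. The point is only that the task needing a mapping update returns immediately and its continuation is scheduled as a new task, so a test harness that runs every task on one thread never waits on itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class AsyncMappingUpdateSketch {

    /** Hypothetical single-threaded queue that runs tasks in FIFO order. */
    static final class SimpleTaskQueue {
        private final Deque<Runnable> tasks = new ArrayDeque<>();

        void schedule(Runnable task) {
            tasks.addLast(task);
        }

        void runAllTasks() {
            while (!tasks.isEmpty()) {
                tasks.removeFirst().run();
            }
        }
    }

    /** Hypothetical mapping update: runs as its own task, then invokes the callback. */
    static void updateMapping(SimpleTaskQueue queue, List<String> log, Runnable onDone) {
        queue.schedule(() -> {
            log.add("update: mapping applied");
            onDone.run();
        });
    }

    /** Runs the scenario and returns the order in which the steps executed. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SimpleTaskQueue queue = new SimpleTaskQueue();

        // A "write" task that discovers it needs a mapping update.
        queue.schedule(() -> {
            log.add("write: mapping update required");
            // Instead of blocking here, schedule the rest of the work as a
            // new "write" task once the mapping update has completed.
            updateMapping(queue, log,
                () -> queue.schedule(() -> log.add("write: indexed after mapping update")));
        });

        // An unrelated task queued behind it is not held up by the update.
        queue.schedule(() -> log.add("write: other request"));

        queue.runAllTasks();
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Had the first task blocked until the update finished, the single thread draining the queue would never reach the update task, which is the situation the description says made `DeterministicTaskQueue` based indexing tests impossible before this change.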