Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
ReplicationOperation should fail gracefully (#115341)
Problem: finishAsFailed could be called asynchronously in the middle of operations like runPostReplicationActions which try to sync the translog. finishAsFailed immediately triggers the failure of the resultListener which releases the index shard primary operation permit. This means that runPostReplicationActions may try to sync the translog without an operation permit. Solution: We refactor the infrastructure of ReplicationOperation regarding pendingActions and the resultListener, by replacing them with a RefCountingListener. This way, if there are async failures, they are aggregated, and the result listener is called once, after all mid-way operations are done. For the specific error we got in issue #97183, this means that a call to onNoLongerPrimary (which can happen if we fail to fail a replica shard or mark it as stale) will not immediately release the primary operation permit and the assertion in the translog sync will be honored. Fixes #97183
- Loading branch information