Compute shardStateId before addAbortListener #100809

DaveCTurner
Contributor

It is not valid to call SnapshotIndexCommit#indexCommit() if the
snapshot is aborted, so we must compute shardStateId before adding the
abort listener.

Closes #99477

@DaveCTurner added the `>test`, `:Distributed Coordination/Snapshot/Restore`, `v8.11.1`, `v8.12.0`, and `v8.10.5` labels Oct 13, 2023
@DaveCTurner requested a review from ywangd October 13, 2023 08:59
@elasticsearchmachine added the `Team:Distributed (Obsolete)` label Oct 13, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor Author

This problem was introduced in #96442 and AFAICT it only matters for tests - in practice it's ok to call getShardStateId() with a closed IndexCommit since we're only looking at its userData here. But it seems better to respect the refcounting rather than find some way to bypass it.

Comment on lines +386 to 388
final var shardStateId = getShardStateId(indexShard, snapshotIndexCommit.indexCommit()); // not aborted so indexCommit() ok
snapshotStatus.addAbortListener(makeAbortListener(indexShard.shardId(), snapshot, snapshotIndexCommit));
snapshotStatus.ensureNotAborted();
Member

I need some help to understand why this fix works. The change is to call getShardStateId before snapshotStatus.addAbortListener, and the goal is to make sure snapshotIndexCommit has not been released when the indexCommit() method is called on it.

It seems the fix would work if snapshotStatus.addAbortListener(...) somehow immediately calls the added listener, which in turn releases snapshotIndexCommit. That would mean snapshotStatus is already aborted. But the next line, snapshotStatus.ensureNotAborted(), says it must not be aborted already. So I am confused.

Contributor Author

The issue is that an abort may happen concurrently, in between calling snapshotStatus.ensureNotAborted() and snapshotIndexCommit.indexCommit().
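
To make the window concrete, here is a minimal, self-contained sketch of the two orderings. It is not the Elasticsearch code: `CommitRef`, `Status`, `listenerFirst` and `readFirst` are hypothetical stand-ins, the "listener runs immediately if already aborted" behaviour is an assumption, and the concurrent abort is simulated deterministically with a callback rather than a second thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-ins only; none of these names come from Elasticsearch. */
class AbortOrderingSketch {

    /** Stand-in for a ref-counted commit holder: reading after release is an error. */
    static final class CommitRef {
        private final AtomicBoolean released = new AtomicBoolean();
        private final String shardStateId;

        CommitRef(String shardStateId) {
            this.shardStateId = shardStateId;
        }

        String indexCommit() {
            if (released.get()) {
                throw new IllegalStateException("commit already released");
            }
            return shardStateId;
        }

        void release() {
            released.set(true);
        }
    }

    /** Stand-in for the snapshot status: abort() runs every registered listener. */
    static final class Status {
        private final List<Runnable> listeners = new ArrayList<>();
        private boolean aborted;

        synchronized void addAbortListener(Runnable listener) {
            if (aborted) {
                listener.run(); // assumption: late listeners are notified immediately
            } else {
                listeners.add(listener);
            }
        }

        synchronized void abort() {
            if (!aborted) {
                aborted = true;
                listeners.forEach(Runnable::run);
            }
        }

        synchronized void ensureNotAborted() {
            if (aborted) {
                throw new IllegalStateException("aborted");
            }
        }
    }

    /** Listener registered first: an abort after the check releases the commit before the read. */
    static String listenerFirst(Status status, CommitRef commit, Runnable concurrentAbort) {
        status.addAbortListener(commit::release);
        status.ensureNotAborted();
        concurrentAbort.run();       // simulated abort landing inside the window
        return commit.indexCommit(); // throws: the listener has already released the commit
    }

    /** Read first: no listener exists yet, so an abort cannot release the commit before the read. */
    static String readFirst(Status status, CommitRef commit, Runnable concurrentAbort) {
        String shardStateId = commit.indexCommit();
        status.addAbortListener(commit::release);
        status.ensureNotAborted();
        concurrentAbort.run();       // same simulated abort; the value is already captured
        return shardStateId;
    }

    public static void main(String[] args) {
        Status s1 = new Status();
        CommitRef c1 = new CommitRef("state-1");
        try {
            listenerFirst(s1, c1, s1::abort);
        } catch (IllegalStateException e) {
            System.out.println("listener first: " + e.getMessage());
        }

        Status s2 = new Status();
        CommitRef c2 = new CommitRef("state-1");
        System.out.println("read first: " + readFirst(s2, c2, s2::abort));
    }
}
```

In the sketch the only thing that ever releases the commit is the abort listener, so reading before the listener is registered closes the window without needing any extra locking.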

Member

Ah right. I always need a prompt for concurrency. Thanks!

Member

@ywangd ywangd left a comment

LGTM

@DaveCTurner DaveCTurner merged commit 8c9136e into elastic:main Oct 13, 2023
@DaveCTurner DaveCTurner deleted the 2023/10/13/SnapshotShardsService-shardStateId-before-abort branch October 13, 2023 12:56
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 13, 2023
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 13, 2023
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.11, 8.10

elasticsearchmachine pushed a commit that referenced this pull request Oct 13, 2023
elasticsearchmachine pushed a commit that referenced this pull request Oct 13, 2023
@DaveCTurner DaveCTurner restored the 2023/10/13/SnapshotShardsService-shardStateId-before-abort branch June 17, 2024 06:17
Development

Successfully merging this pull request may close these issues.

[CI] SnapshotStressTestsIT testRandomActivities failing
3 participants