[CI] SnapshotStressTestsIT testRandomActivities failing #101028

Closed
ywangd opened this issue Oct 18, 2023 · 1 comment · Fixed by #101497
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs medium-risk An open issue or test failure that is a medium risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@ywangd
Member

ywangd commented Oct 18, 2023

This NPE is different from the ones reported in #99477 and #99516.

CI Link

https://gradle-enterprise.elastic.co/s/v7icocftg32uc/tests/task/:server:internalClusterTest/details/org.elasticsearch.snapshots.SnapshotStressTestsIT/testRandomActivities%20%7Bseed=%5B32DA52B8E1546052:7737AEA21D8C5249%5D%7D?top-execution=1

Repro line

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.SnapshotStressTestsIT.testRandomActivities {seed=[32DA52B8E1546052:7737AEA21D8C5249]}" -Dtests.seed=32DA52B8E1546052 -Dtests.locale=cs -Dtests.timezone=America/El_Salvador -Druntime.java=21

Does it reproduce?

No

Applicable branches

main

Failure history

https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.snapshots.SnapshotStressTestsIT&tests.test=testRandomActivities%20%7Bseed%3D%5B32DA52B8E1546052:7737AEA21D8C5249%5D%7D

Failure excerpt

WARNING: Uncaught exception in thread: Thread[#108,elasticsearch[node_s0][masterService#updateTask][T#1],5,TGRP-SnapshotStressTestsIT]
java.lang.AssertionError: java.lang.AssertionError: java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.SnapshotsInProgress$Entry.failure()" because "entry" is null
	at __randomizedtesting.SeedInfo.seed([32DA52B8E1546052]:0)
	at org.elasticsearch.snapshots.SnapshotsService.finalizeSnapshotEntry(SnapshotsService.java:1447)
	at org.elasticsearch.snapshots.SnapshotsService.runNextQueuedOperation(SnapshotsService.java:1535)
	at org.elasticsearch.snapshots.SnapshotsService.lambda$finalizeSnapshotEntry$28(SnapshotsService.java:1428)
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:177)
	at org.elasticsearch.repositories.FinalizeSnapshotContext.onResponse(FinalizeSnapshotContext.java:116)
	at org.elasticsearch.repositories.FinalizeSnapshotContext.onResponse(FinalizeSnapshotContext.java:28)
	at org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)
	at org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:310)
	at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:230)
	at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:259)
	at org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:173)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$finalizeSnapshot$19(BlobStoreRepository.java:1684)
	at org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:236)
	at org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:310)
	at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:230)
	at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:259)
	at org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:173)
	at org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:95)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$9.clusterStateProcessed(BlobStoreRepository.java:2690)
	at org.elasticsearch.cluster.service.MasterService$UnbatchedExecutor.lambda$execute$0(MasterService.java:552)
	at org.elasticsearch.cluster.service.MasterService$ExecutionResult.onPublishSuccess(MasterService.java:927)
	at org.elasticsearch.cluster.service.MasterService$4.onResponse(MasterService.java:374)
	at org.elasticsearch.cluster.service.MasterService$4.onResponse(MasterService.java:369)
	at org.elasticsearch.action.ActionListenerImplementations$RunAfterActionListener.onResponse(ActionListenerImplementations.java:260)
	at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:330)
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:32)
	at org.elasticsearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:50)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.AssertionError: java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.SnapshotsInProgress$Entry.failure()" because "entry" is null
	... 32 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.SnapshotsInProgress$Entry.failure()" because "entry" is null
	at org.elasticsearch.snapshots.SnapshotsService.finalizeSnapshotEntry(SnapshotsService.java:1311)
	... 31 more

@ywangd ywangd added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI low-risk An open issue or test failure that is a low risk to future releases labels Oct 18, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 18, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@volodk85 volodk85 added medium-risk An open issue or test failure that is a medium risk to future releases and removed low-risk An open issue or test failure that is a low risk to future releases labels Oct 23, 2023
@ywangd ywangd self-assigned this Oct 26, 2023
ywangd added a commit to ywangd/elasticsearch that referenced this issue Oct 29, 2023
When a snapshot is completed as a SnapshotsInProgress entry in the cluster
state and is also queued for the next operations, it can be finalized
twice if the entry in the cluster state is processed first. This PR fixes
this by starting finalization only if the snapshot is *not* already in
endingSnapshots.

The PR also adds a specific test case for the double finalization issue
(which manifests as an NPE).

Resolves: elastic#101028
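The double finalization described in the commit message can be illustrated with a minimal Java sketch. It is a heavily simplified, hypothetical model, not the actual SnapshotsService code: only the name `endingSnapshots` comes from the fix, while `DoubleFinalizationSketch`, `completeEntry`, `finalizeUnguarded`, and `finalizeGuarded` are illustrative names. It assumes a completed snapshot can be finalized by two triggers (its cluster state entry and a queued operation), and that whichever runs second finds its entry already gone.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race; NOT the real SnapshotsService.
// Only the name endingSnapshots is taken from the fix.
public class DoubleFinalizationSketch {

    // Stand-in for completed SnapshotsInProgress entries in the cluster state:
    // snapshot name -> final state.
    private final Map<String, String> completedEntries = new ConcurrentHashMap<>();

    // Snapshots whose finalization has already started. For simplicity this
    // sketch never clears it; real code would track it until finalization ends.
    private final Set<String> endingSnapshots = ConcurrentHashMap.newKeySet();

    public void completeEntry(String snapshot) {
        completedEntries.put(snapshot, "SUCCESS");
    }

    // Unguarded: every trigger runs finalization. The second call finds no
    // entry, so dereferencing it throws a NullPointerException, analogous to
    // "Cannot invoke ...Entry.failure() because entry is null".
    public String finalizeUnguarded(String snapshot) {
        String entry = completedEntries.remove(snapshot);
        return entry.toLowerCase();
    }

    // Guarded: Set.add returns false if the snapshot is already ending, so
    // only the first trigger starts finalization; later ones are no-ops.
    public boolean finalizeGuarded(String snapshot) {
        if (endingSnapshots.add(snapshot) == false) {
            return false;
        }
        String entry = completedEntries.remove(snapshot);
        return entry != null;
    }

    public static void main(String[] args) {
        DoubleFinalizationSketch guarded = new DoubleFinalizationSketch();
        guarded.completeEntry("snap-1");
        System.out.println(guarded.finalizeGuarded("snap-1")); // first trigger finalizes
        System.out.println(guarded.finalizeGuarded("snap-1")); // second trigger is skipped

        DoubleFinalizationSketch unguarded = new DoubleFinalizationSketch();
        unguarded.completeEntry("snap-1");
        unguarded.finalizeUnguarded("snap-1");
        try {
            unguarded.finalizeUnguarded("snap-1");
        } catch (NullPointerException e) {
            System.out.println("second unguarded finalization threw NPE");
        }
    }
}
```

The sketch relies on `Set.add` on a concurrent key set being an atomic check-and-insert, so two racing triggers cannot both pass the guard.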
ywangd added a commit that referenced this issue Oct 30, 2023
ywangd added a commit to ywangd/elasticsearch that referenced this issue Oct 30, 2023
elasticsearchmachine pushed a commit that referenced this issue Oct 30, 2023