SimpleBlocksIT.testAddBlockWhileDeletingIndices failing #116071

Closed
kingherc opened this issue Nov 1, 2024 · 3 comments · Fixed by #116074
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. low-risk An open issue or test failure that is a low risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@kingherc
Contributor

kingherc commented Nov 1, 2024

CI Link

https://gradle-enterprise.elastic.co/s/uz5p3xwiwfsqq

Repro line

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.blocks.SimpleBlocksIT.testAddBlockWhileDeletingIndices" -Dtests.seed=18DD19966E2CF499 -Dtests.locale=dyo-SN -Dtests.timezone=Asia/Ust-Nera -Druntime.java=23

Does it reproduce?

Didn't try

Applicable branches

main

Failure history

No response

Failure excerpt

Likely introduced by PR #115341

REPRODUCE WITH: ./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.blocks.SimpleBlocksIT.testAddBlockWhileDeletingIndices" -Dtests.seed=18DD19966E2CF499 -Dtests.locale=dyo-SN -Dtests.timezone=Asia/Ust-Nera -Druntime.java=23

SimpleBlocksIT > testAddBlockWhileDeletingIndices FAILED
    java.lang.AssertionError: [org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/ChannelActionListener{TaskTransportChannel{task=160}{DirectResponseChannel{req=91}{indices:admin/block/add[s][p]}}}/org.elasticsearch.action.support.replication.TransportReplicationAction$$Lambda/0x00007f63e3a6ebc8@72916952] org.elasticsearch.ElasticsearchException: executed already
        at __randomizedtesting.SeedInfo.seed([18DD19966E2CF499]:0)
        at org.elasticsearch.action.ActionListener$3.assertFirstRun(ActionListener.java:393)
        at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:409)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onFailure(TransportReplicationAction.java:553)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.handleException(TransportReplicationAction.java:547)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:541)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:443)
        at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257)
        at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$38(IndexShard.java:3585)
        at org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:219)
        at org.elasticsearch.index.shard.IndexShard.lambda$asyncBlockOperations$39(IndexShard.java:3597)
        at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257)
        at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400)
        at org.elasticsearch.index.shard.IndexShardOperationPermits$1.doRun(IndexShardOperationPermits.java:119)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/opt/buildkite-agent/.gradle/wrapper/dists/gradle-8.10.2-all/7iv73wktx1xtkvlq19urqw1wm/gradle-8.10.2/lib/plugins/gradle-testing-base-infrastructure-8.10.2.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release
@kingherc kingherc added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >test-failure Triaged test failures from CI needs:triage Requires assignment of a team area label Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Nov 1, 2024
@elasticsearchmachine elasticsearchmachine added the needs:risk Requires assignment of a risk label (low, medium, blocker) label Nov 1, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Nov 1, 2024
@kingherc
Contributor Author

kingherc commented Nov 1, 2024

I'm unsure whether this means that onFailure might be called twice, and whether that has any meaningful negative repercussions, so I'm assigning low risk for now, but I will try to handle it now.

@kingherc kingherc added the low-risk An open issue or test failure that is a low risk to future releases label Nov 1, 2024
@elasticsearchmachine elasticsearchmachine removed the needs:risk Requires assignment of a risk label (low, medium, blocker) label Nov 1, 2024
@kingherc
Contributor Author

kingherc commented Nov 1, 2024

Found that exceptions thrown from inside execute() can escape:

  1> org.elasticsearch.index.shard.IndexShardClosedException: CurrentState[CLOSED] operation only allowed when not closed
  1>    at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:2411) ~[main/:?]
  1>    at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:2405) ~[main/:?]
  1>    at org.elasticsearch.index.shard.IndexShard.getReplicationGroup(IndexShard.java:3004) ~[main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.getReplicationGroup(TransportReplicationAction.java:1204) ~[main/:?]
  1>    at org.elasticsearch.action.support.replication.ReplicationOperation.checkActiveShardCount(ReplicationOperation.java:482) ~[main/:?]
  1>    at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:118) ~[main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:538) ~[main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:443) ~[main/:?]
  1>    at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257) ~[main/:?]
  1>    at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$38(IndexShard.java:3585) ~[main/:?]
  1>    at org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:219) ~[main/:?]
  1>    at org.elasticsearch.index.shard.IndexShard.lambda$asyncBlockOperations$39(IndexShard.java:3597) ~[main/:?]
  1>    at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257) ~[main/:?]
  1>    at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400) ~[main/:?]
  1>    at org.elasticsearch.index.shard.IndexShardOperationPermits$1.doRun(IndexShardOperationPermits.java:119) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:1575) ~[?:?]

Will open a fix.

kingherc added a commit to kingherc/elasticsearch that referenced this issue Nov 1, 2024
We introduce ActionListener.run() in order to ensure that the
RefCountingListener introduced by PR elastic#115341 is the single point
that is failed upon exceptions, and that no exception escapes through
the ReplicationOperation.execute() method.

Fixes elastic#116071
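
A minimal, self-contained sketch of the failure mode and of the approach described in the commit message above. It does not use the Elasticsearch classes: `Listener`, `OnCloseListener`, `run`, `executeLeaky` and `executeGuarded` are hypothetical stand-ins for `ActionListener`, the `RefCountingListener`, `ActionListener.run()` and `ReplicationOperation.execute()`. The exact order of completions in the real code is an assumption; the sketch only shows how an escaping exception can complete a listener twice, and how routing the exception to a single point avoids that.

```java
import java.util.function.Consumer;

final class ListenerRunSketch {
    // Hypothetical, simplified stand-in for ActionListener.
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    // Counts completions instead of asserting, so a double completion is observable.
    static final class CountingListener implements Listener<String> {
        int completions = 0;
        @Override public void onResponse(String value) { completions++; }
        @Override public void onFailure(Exception e) { completions++; }
    }

    // Hypothetical stand-in for a ref-counting listener: it completes the delegate
    // exactly once when closed, with a failure if one was recorded.
    static final class OnCloseListener implements AutoCloseable {
        private final Listener<String> delegate;
        private Exception failure;
        OnCloseListener(Listener<String> delegate) { this.delegate = delegate; }
        void fail(Exception e) { failure = e; }
        @Override public void close() {
            if (failure == null) delegate.onResponse("done"); else delegate.onFailure(failure);
        }
    }

    interface CheckedRunnable { void run() throws Exception; }

    // Hypothetical helper in the spirit of ActionListener.run(): any exception
    // thrown by the action is handed to the failure sink instead of escaping.
    static void run(Consumer<Exception> failureSink, CheckedRunnable action) {
        try {
            action.run();
        } catch (Exception e) {
            failureSink.accept(e);
        }
    }

    // Leaky variant: close() completes the listener, then the exception escapes.
    static void executeLeaky(Listener<String> listener) {
        try (OnCloseListener refs = new OnCloseListener(listener)) {
            throw new IllegalStateException("shard closed");
        }
    }

    // Guarded variant: the exception is recorded on the single failure point.
    static void executeGuarded(Listener<String> listener) {
        try (OnCloseListener refs = new OnCloseListener(listener)) {
            run(refs::fail, () -> { throw new IllegalStateException("shard closed"); });
        }
    }

    // The caller fails the listener if execute throws, as in the first stack trace.
    static int completionsAfter(Consumer<Listener<String>> execute) {
        CountingListener listener = new CountingListener();
        try {
            execute.accept(listener);
        } catch (Exception e) {
            listener.onFailure(e);
        }
        return listener.completions;
    }

    public static void main(String[] args) {
        System.out.println("leaky:   " + completionsAfter(ListenerRunSketch::executeLeaky));
        System.out.println("guarded: " + completionsAfter(ListenerRunSketch::executeGuarded));
    }
}
```

In the leaky variant the listener is completed once by `close()` and a second time by the caller's catch block, which is the kind of double completion that an "executed already" assertion guards against. In the guarded variant nothing escapes, so the listener is completed once.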
jfreden pushed a commit to jfreden/elasticsearch that referenced this issue Nov 4, 2024
We introduce ActionListener.run() in order to ensure that the
RefCountingListener introduced by PR elastic#115341 is the single point
that is failed upon exceptions, and that no exception escapes through
the ReplicationOperation.execute() method.

Fixes elastic#116071
Fixes elastic#116081
Fixes elastic#116073