Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix testRecoveryIsCancelledAfterDeletingTheIndex #76644

Conversation

henningandersen
Copy link
Contributor

Closes #76560

@henningandersen henningandersen added >test-failure Triaged test failures from CI :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v8.0.0 v7.15.0 labels Aug 18, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Aug 18, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@henningandersen henningandersen merged commit 179a3fc into elastic:master Aug 23, 2021
@@ -496,6 +490,12 @@ public void testRecoveryIsCancelledAfterDeletingTheIndex() throws Exception {
}
);

assertAcked(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the difference between these two snippets is that we have to register a send behaviour before updating indices settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if we update settings first, we risk the recovery starting or completing before we add the send behavior. That makes the recoverSnapshotFileRequestReceived.await() a few lines down never complete, failing the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had an inadvertent race between the call to addRequestHandlingBehavior and the end of the recovery triggered by the update settings call - if the recovery completed before we got around to adding the request handler then we'd never count down the recoverSnapshotFileRequestReceived latch. The fix is to hook into the transport service before we even start the recovery.

wjp719 added a commit to wjp719/elasticsearch that referenced this pull request Aug 24, 2021
* master: (21 commits)
  [Test] More robust assertions for sorting and pagination (elastic#76654)
  [Test] Fix filename check on Windows (elastic#76807)
  Upgrade build scan plugin to 3.6.4 (elastic#76784)
  Remove keystore initial_md5sum (elastic#76835)
  Don't export docker images on assemble (elastic#76817)
  Fix testMasterStatsOnSuccessfulUpdate (elastic#76844)
  AwaitsFix for elastic#76840
  Make Releasing Aggregation Buffers Safer (elastic#76741)
  Re-enable BWC tests after backport of elastic#76771 (elastic#76839)
  Dispatch large bulk requests to write thread  (elastic#76736)
  Disable BWC tests for elastic#76771
  Pull down beats artifacts when performing release tests
  Add timing stats to publication process (elastic#76771)
  Fix BanFailureLoggingTests some more (elastic#76668)
  Mention "warn threshold" in master service slowlog (elastic#76815)
  Fix DockerTests.test010Install
  Re-enable tests affected by elastic#75097 (elastic#76814)
  Fix testRecoveryIsCancelledAfterDeletingTheIndex (elastic#76644)
  Test fix -WildcardFieldMapperTests bad test data. (elastic#76819)
  Updating supported version after backporting the feature (elastic#76794)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI v7.15.1 v7.16.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SnapshotBasedIndexRecoveryIT testRecoveryIsCancelledAfterDeletingTheIndex failing
6 participants