[Transform] prevent concurrent state persistence when indexer gets triggered during shutdown #69551

Merged

hendrikmuhs

When the indexer is shutting down, it sets the state before it persists
the latest state. Between these 2 stages a new run might get triggered and
run into a race condition where a new state persist runs while the old one
has not finished yet. This change prevents the trigger if the indexer is
in the described intermediate state.

fixes #67121
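
The guard described above can be illustrated with a minimal sketch. This is not the code from the PR: the class `SketchIndexer`, the `PERSISTING` state and all method names are hypothetical, chosen only to show how a trigger can be rejected while the previous run is between "state set" and "state persisted".

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified indexer. Names are illustrative and are not the Elasticsearch API.
class SketchIndexer {
    // PERSISTING models the intermediate stage: the run is over, but its state is not yet saved.
    enum State { STARTED, INDEXING, PERSISTING }

    private final AtomicReference<State> state = new AtomicReference<>(State.STARTED);

    /** Returns true if a new run was started, false if the trigger was ignored. */
    boolean maybeTrigger() {
        // Only an idle indexer may start; INDEXING and PERSISTING both reject the trigger.
        return state.compareAndSet(State.STARTED, State.INDEXING);
    }

    /** Stage 1 of shutdown: the run has finished, the state persist is now pending. */
    void finishRun() {
        state.set(State.PERSISTING);
    }

    /** Stage 2 of shutdown: the persist completed, so new triggers are allowed again. */
    void onStatePersisted() {
        state.compareAndSet(State.PERSISTING, State.STARTED);
    }
}
```

Keeping the intermediate stage in the same atomic reference as the other states means the check and the transition happen in one compare-and-set, so a trigger cannot slip in between them.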

@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Member

@benwtrent benwtrent left a comment

Good catch

@hendrikmuhs hendrikmuhs merged commit 122ae34 into elastic:master Feb 25, 2021
@hendrikmuhs hendrikmuhs deleted the transform-index-trigger-race branch February 25, 2021 09:35
hendrikmuhs pushed a commit that referenced this pull request Feb 25, 2021
…iggered during shutdown (#69551)

When the indexer is shutting down, it sets the state before it persists
the latest state. Between these 2 stages a new run might get triggered and
run into a race condition where a new state persist runs while the old one
has not finished yet. This change prevents the trigger if the indexer is
in the described intermediate state.

fixes #67121
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Mar 11, 2021
…n the

process of shutting down, when the test thread triggers it again.

relates elastic#69551
fixes elastic#70297
hendrikmuhs pushed a commit that referenced this pull request Mar 15, 2021
…mIndexerFailureHandlingTests (#70326)

Fix a race condition in the test: the indexer thread might still be in the
process of shutting down when the test thread triggers it again.

relates #69551
fixes #70297
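
One common way to avoid this kind of test race is to have the test thread wait until the indexer has finished shutting down before it triggers again. The sketch below shows a generic polling helper for that purpose. It is an assumption about the approach, not the change made in #70326, and `TestWait.waitUntil` is a hypothetical helper.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper, not part of the Elasticsearch test framework.
class TestWait {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(5); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        // One last check so a condition that became true right at the deadline is not missed.
        return condition.getAsBoolean();
    }
}
```

A test would call something like `TestWait.waitUntil(() -> indexerIsIdle(), 5000)` before triggering the indexer again, where `indexerIsIdle()` stands in for whatever state check the test has available.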
hendrikmuhs pushed a commit that referenced this pull request Mar 22, 2021
shouldStopAtCheckpoint tells the transform to stop at the next checkpoint. If
this API is called while a checkpoint is finishing, it can cause a race condition
in state persistence. This is similar to #69551, but in a different place.

With this change, _stop?shouldStopAtCheckpoint=true does not call doSaveState
if the indexer is shutting down. It still ensures the job stops after the indexer
has shut down. In addition, the change fixes a logging problem and adds error
handling in case of a timeout during _stop?shouldStopAtCheckpoint=true. Some
logic has been moved from the task to the indexer.

fixes #70416
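
A minimal sketch of the behavior described in this commit message, under stated assumptions: `CheckpointStopSketch`, its states and its methods are hypothetical, and only the name `doSaveState` is taken from the text. The point is that the stop request skips its own persist while the indexer is shutting down, and the shutdown path then persists once and honors the request.

```java
// Hypothetical, simplified model. Only the name doSaveState comes from the commit message.
class CheckpointStopSketch {
    enum State { STARTED, SHUTTING_DOWN, STOPPED }

    private State state = State.STARTED;
    private boolean stopAtCheckpoint = false;
    private int saveStateCalls = 0;

    /** The indexer has finished a checkpoint and starts shutting down. */
    synchronized void beginShutdown() {
        state = State.SHUTTING_DOWN;
    }

    /** Models a stop request that should take effect at the next checkpoint. */
    synchronized void setShouldStopAtCheckpoint() {
        stopAtCheckpoint = true;
        if (state == State.SHUTTING_DOWN) {
            // The shutdown path persists state itself; a second persist here would race with it.
            return;
        }
        doSaveState();
    }

    /** The shutdown path persists once, then stops the job if a stop was requested. */
    synchronized void onShutdownComplete() {
        doSaveState();
        state = stopAtCheckpoint ? State.STOPPED : State.STARTED;
    }

    private void doSaveState() {
        saveStateCalls++; // stand-in for the real asynchronous persist
    }

    synchronized int saveStateCalls() { return saveStateCalls; }

    synchronized boolean isStopped() { return state == State.STOPPED; }
}
```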
hendrikmuhs pushed a commit that referenced this pull request Apr 6, 2021
…71343)

Development

Successfully merging this pull request may close these issues.

[ML] TransformIT » testStopWaitForCheckpoint test fails