[ML] Fix possible race condition starting datafeed #51646

Merged


henningandersen
Contributor

Datafeeds being closed while starting could result in an NPE. This was
handled like any other failure, which masked the NPE. However, this
conflicts with the changes in #50886.

Related to #50886 and #51302

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@@ -520,7 +520,12 @@ private void runTask(TransportStartDatafeedAction.DatafeedTask task) {
         // a context with sufficient permissions would coincidentally be in force in some single node
         // tests, leading to bugs not caught in CI due to many tests running in single node test clusters.
         try (ThreadContext.StoredContext ignore = threadPool.getThreadContext().stashContext()) {
-            innerRun(runningDatafeedsOnThisNode.get(task.getAllocationId()), task.getDatafeedStartTime(), task.getEndTime());
+            Holder holder = runningDatafeedsOnThisNode.get(task.getAllocationId());
Contributor Author

@henningandersen henningandersen Jan 29, 2020


Putting in a sleep(100) before this line provokes the NPE.
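
For readers without the Elasticsearch sources at hand, the failure mode can be reproduced in isolation. The following is only a minimal sketch, not the actual datafeed code: `DatafeedRaceSketch`, its `Holder` class, the `runningDatafeeds` map, and `runUnguarded` are hypothetical stand-ins, and removing the map entry before the lookup plays the role that `sleep(100)` plays in the real code (it gives the stop request time to win).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; these are not the Elasticsearch classes.
class DatafeedRaceSketch {
    static final class Holder {
        final String datafeedId;
        Holder(String datafeedId) { this.datafeedId = datafeedId; }
    }

    // Assumed shape: running datafeeds keyed by allocation id.
    static final Map<Long, Holder> runningDatafeeds = new ConcurrentHashMap<>();

    // Mirrors the pre-fix shape: the lookup result is dereferenced without a null check.
    static String runUnguarded(long allocationId) {
        Holder holder = runningDatafeeds.get(allocationId);
        return "running " + holder.datafeedId; // throws NPE if the entry was already removed
    }

    // Simulates a stop request completing before the start task looks up its holder.
    static boolean stopBeforeStartThrowsNpe() {
        runningDatafeeds.put(1L, new Holder("feed-1"));
        runningDatafeeds.remove(1L); // the concurrent stop wins the race
        try {
            runUnguarded(1L);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE provoked: " + stopBeforeStartThrowsNpe());
    }
}
```

Making the ordering explicit, rather than relying on a sleep, keeps the sketch deterministic.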

Contributor

@droberts195 droberts195 left a comment


LGTM if you could just change two words.

Thanks for fixing this. Maybe it will help with occasional weird failures we get during test cleanup.

+            if (holder != null) {
+                innerRun(holder, task.getDatafeedStartTime(), task.getEndTime());
+            } else {
+                logger.warn("Datafeed [{}] was closed while being opened", task.getDatafeedId());
Contributor


We use the terms “started” and “stopped” with datafeeds instead of “opened” and “closed”, so please could you change those two words in this message.
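
As a rough illustration of the guard under review, with the wording this comment asks for, here is a self-contained sketch. It is not the merged code: `GuardedStartSketch`, the string-valued `runningDatafeeds` map, the `warnings` list (standing in for `logger.warn`), and the boolean return value are all assumptions made so the example can run on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for illustration only; not the Elasticsearch implementation.
class GuardedStartSketch {
    // Assumed shape: allocation id mapped to a placeholder holder value.
    static final Map<Long, String> runningDatafeeds = new ConcurrentHashMap<>();
    // Collects warnings in place of a real logger so the behaviour can be inspected.
    static final List<String> warnings = new ArrayList<>();

    // Returns true if the datafeed was run, false if it had already been stopped.
    static boolean runTask(long allocationId, String datafeedId) {
        String holder = runningDatafeeds.get(allocationId);
        if (holder != null) {
            // The real code would call innerRun(holder, ...) here.
            return true;
        } else {
            // Uses "stopped"/"started", the terms datafeeds use, per the review comment.
            warnings.add("Datafeed [" + datafeedId + "] was stopped while being started");
            return false;
        }
    }

    public static void main(String[] args) {
        runningDatafeeds.put(1L, "holder-1");
        System.out.println(runTask(1L, "feed-1")); // entry present, so the task runs
        runningDatafeeds.remove(1L);               // simulate a stop winning the race
        System.out.println(runTask(1L, "feed-1")); // entry gone, so a warning is recorded
        System.out.println(warnings);
    }
}
```

Reading the map once into a local variable is what makes the check safe: the null test and the later use see the same value even if another thread removes the entry in between.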

@henningandersen
Contributor Author

Thanks David.

@henningandersen henningandersen merged commit f891a0d into elastic:master Jan 30, 2020
@henningandersen
Contributor Author

The test that failed in CI was MachineLearningLicensingTests.testAutoCloseJobWithDatafeed.

henningandersen added a commit that referenced this pull request Jan 30, 2020