-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Fix possible race condition when starting datafeed #51302
[ML] Fix possible race condition when starting datafeed #51302
Conversation
The ID of the datafeed's associated job was being obtained frequently by looking up the datafeed task in a map that was being modified in other threads. This could lead to NPEs if the datafeed stopped running at an unexpected time. This change reduces the number of places where a datafeed's associated job ID is looked up to avoid the possibility of failures when the datafeed's task is removed from the map of running tasks during multi-step operations in other threads. Fixes elastic#51285
Pinging @elastic/ml-core (:ml) |
I think #49467 may be caused by the same race |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I would have spotted in NPE in the logs when triaging #49467. That test timed out waiting for the datafeed to stop, my theory is the NPE took down the worker thread that is vital in stopping the DF so a stopped status was never reached. |
The ID of the datafeed's associated job was being obtained frequently by looking up the datafeed task in a map that was being modified in other threads. This could lead to NPEs if the datafeed stopped running at an unexpected time. This change reduces the number of places where a datafeed's associated job ID is looked up to avoid the possibility of failures when the datafeed's task is removed from the map of running tasks during multi-step operations in other threads. Fixes #51285
Datafeeds being closed while starting could result in and NPE. This was handled as any other failure, masking out the NPE. However, this conflicts with the changes in elastic#50886. Related to elastic#50886 and elastic#51302
The ID of the datafeed's associated job was being obtained
frequently by looking up the datafeed task in a map that
was being modified in other threads. This could lead to
NPEs if the datafeed stopped running at an unexpected time.
This change reduces the number of places where a datafeed's
associated job ID is looked up to avoid the possibility of
failures when the datafeed's task is removed from the map
of running tasks during multi-step operations in other
threads.
Fixes #51285