[ML] Datafeed does not start when allow_lazy_open is enabled #53763
Comments
Pinging @elastic/ml-core (:ml)
There are two bits to this. Obviously it's a bug that the datafeed doesn't start once resource is available and that's relatively easy to fix. Having the state being elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/MlTasks.java Line 135 in 8bbbe28
Lines 97 to 101 in 20d4bc8
So:
We should probably discuss offline whether introducing
I have no desire to introduce To clarify scope and intention of this ticket - it is to ensure that jobs can lazy open. So, let's fix the bug.
I think we might need to make I will make the change on Monday.
It is possible for ML jobs to open lazily if the "allow_lazy_open" option in the job config is set to true. Such jobs wait in the "opening" state until a node has sufficient capacity to run them. This commit fixes the bug that prevented datafeeds for jobs lazily waiting assignment from being started. The state of such datafeeds is "starting", and they can be stopped by the stop datafeed API while in this state with or without force. Relates elastic#53763
It is possible for ML jobs to open lazily if the "allow_lazy_open" option in the job config is set to true. Such jobs wait in the "opening" state until a node has sufficient capacity to run them. This commit fixes the bug that prevented datafeeds for jobs lazily waiting assignment from being started. The state of such datafeeds is "starting", and they can be stopped by the stop datafeed API while in this state with or without force. Fixes #53763
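The behaviour described in the commit message can be summarised as a small state mapping. The sketch below is a toy model only, not the Elasticsearch implementation: the function name `expected_datafeed_state` is hypothetical, and the handling of job states other than "opening" and "opened" is an assumption made for illustration.

```python
def expected_datafeed_state(job_state: str, start_requested: bool) -> str:
    """Toy model of the datafeed state expected after the fix.

    Hypothetical helper for illustration; it does not mirror the
    actual Elasticsearch code.
    """
    if not start_requested:
        return "stopped"
    if job_state == "opening":
        # The job is lazily waiting for a node with enough capacity,
        # so the datafeed waits alongside it instead of failing.
        return "starting"
    if job_state == "opened":
        return "started"
    # Assumption: a start request against any other job state is rejected.
    raise ValueError(f"cannot start datafeed while job is {job_state!r}")


# A lazily opening job: the datafeed waits, then runs once the job is assigned.
for state in ("opening", "opened"):
    print(state, "->", expected_datafeed_state(state, start_requested=True))
```

Before the fix, the first case reported "stopped" and stayed that way even after the job reached "opened".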
Found in 7.7.0-SNAPSHOT
"build_hash" : "2f0aca992bb8c91c17603050807891cad2e41483", "build_date" : "2020-03-16T02:52:34.086738Z",
"xpack.ml.max_machine_memory_percent" : 16
I have a script that creates 16 jobs in succession. Each job requires 2GB model memory.
The first 3 jobs open and the datafeeds start.
The 4th job returns opened:false and the datafeed fails to start with the following:
In the job list, the job state is "opening" and the datafeed state is "stopped". No errors are visible.
As one of the first 3 jobs completes, one of the "opening" jobs transitions its state to "opened". However, the datafeed remains "stopped".
These are the job messages for a job that was lazy opening.
Expected behavior would be for the datafeed to be "starting" and for it to start once resource became available (which would happen when one of the other jobs closed, in this scenario).
Once jobs have completed, I can manually start the datafeed on one of the "opened" jobs and it will complete without on-screen errors. (I cannot start one of the "opening" jobs, which is to be expected.)
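The reporter's script is not included in the issue. As a rough sketch of what such a reproduction might look like, the following builds the sequence of ML REST requests for a batch of lazily opening jobs without sending them. The job IDs, datafeed IDs, index name, detector, and bucket span are all hypothetical. Only the 2GB model memory limit, the job count of 16, and `allow_lazy_open` come from the report.

```python
def build_requests(n_jobs=16, index="my-index", memory="2gb"):
    """Return (method, path, body) tuples for creating, opening and
    starting n_jobs anomaly detection jobs with lazy opening enabled.

    All names and the detector config are placeholders for illustration.
    """
    requests = []
    for i in range(1, n_jobs + 1):
        job_id = f"lazy-job-{i}"            # hypothetical job ID
        feed_id = f"datafeed-{job_id}"      # hypothetical datafeed ID
        job_body = {
            "analysis_config": {
                "bucket_span": "15m",       # assumed value
                "detectors": [{"function": "count"}],
            },
            "data_description": {"time_field": "@timestamp"},
            "analysis_limits": {"model_memory_limit": memory},
            "allow_lazy_open": True,
        }
        feed_body = {"job_id": job_id, "indices": [index]}
        requests.append(("PUT", f"/_ml/anomaly_detectors/{job_id}", job_body))
        requests.append(("PUT", f"/_ml/datafeeds/{feed_id}", feed_body))
        requests.append(("POST", f"/_ml/anomaly_detectors/{job_id}/_open", None))
        requests.append(("POST", f"/_ml/datafeeds/{feed_id}/_start", None))
    return requests


for method, path, _ in build_requests(n_jobs=1):
    print(method, path)
```

Sending these in order against a cluster with limited ML memory should leave the later jobs in "opening", which is the condition under which the datafeed start was failing.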