Make GetTrainedModelsStatsAction cancellable #87931
Comments
Pinging @elastic/ml-core (Team:ML)
Based on the Slack thread that went with this I think it's actually
Hm. Sorry, I think I made a mistake translating what I saw in the thread dump to the REST action I mentioned in Slack. The threads in question were here:
I'm not sure that this is due to Metricbeat now. Still, +1 on adding support for cancellation to all the things.
OK, I saw But it looks like it really is
@benwtrent as well as making the action cancellable, it looks like we need a more efficient way to list which ingest pipelines a particular trained model is used in. At the moment the code instantiates every processor, looks for inference processors, takes the ID and throws everything else away. But instantiating a Grok processor seems to be very expensive, so maybe we need a specialist method at a lower level that can search for inference processors in an ingest pipeline without instantiating every other processor.
Arg, when I first wrote this code, I was extracting from opaque maps. Then I figured "Ya know, we could build the ingest pipelines and just extract from there and not iterate maps...". Welp, time to go back and find that old code... 🤦
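For illustration, the "extract from opaque maps" idea discussed above could look roughly like the sketch below. It walks a pipeline definition as plain nested maps and collects the model IDs of inference processors, so no Grok (or any other) processor is ever constructed. The class and method names are hypothetical and this is not the code from the eventual fix. The sketch also assumes the usual pipeline layout: a top-level "processors" list, optional "on_failure" lists, a "model_id" field on inference processors, and a single nested "processor" inside foreach.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Elasticsearch: scans raw pipeline config maps.
final class InferenceProcessorScan {

    private InferenceProcessorScan() {}

    /** Collects the model IDs referenced by inference processors in one pipeline definition. */
    static Set<String> modelIds(Map<String, Object> pipelineConfig) {
        Set<String> ids = new LinkedHashSet<>();
        collect(pipelineConfig.get("processors"), ids);
        collect(pipelineConfig.get("on_failure"), ids);
        return ids;
    }

    // Each list element is expected to be a single-entry map: processor type -> processor body.
    private static void collect(Object processors, Set<String> ids) {
        if (!(processors instanceof List)) {
            return; // absent or malformed section: nothing to scan
        }
        for (Object entry : (List<?>) processors) {
            if (!(entry instanceof Map)) {
                continue;
            }
            for (Map.Entry<?, ?> typeAndBody : ((Map<?, ?>) entry).entrySet()) {
                if (!(typeAndBody.getValue() instanceof Map)) {
                    continue;
                }
                Map<?, ?> body = (Map<?, ?>) typeAndBody.getValue();
                Object modelId = body.get("model_id");
                if ("inference".equals(typeAndBody.getKey()) && modelId instanceof String) {
                    ids.add((String) modelId);
                }
                // Assumption: foreach wraps exactly one nested processor under "processor".
                Object nested = body.get("processor");
                if (nested != null) {
                    collect(List.of(nested), ids);
                }
                // Processors may carry their own on_failure handlers.
                collect(body.get("on_failure"), ids);
            }
        }
    }
}
```

The cost of this scan depends only on the size of the pipeline definitions, not on how expensive each processor is to build. The trade-off is that it has to know which processor types can nest other processors.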
This change makes all the trained model APIs cancellable, and addresses the handful of APIs that rely on our abstract resource structure. closes: #87931
Previously, the get trained model stats API would build every pipeline defined in cluster state. This is problematic when MANY pipelines are defined, especially if those pipelines take some time to parse (consider GROK). This improvement is part of fixing: #87931
I encountered a Cloud cluster with an overworked master due (partly) to processing multiple calls to GET /_ml/anomaly_detectors/_all/_stats originating from an external Metricbeat monitoring process. Metricbeat imposes a 10s timeout after which it closes the HTTP connection and tries again. However, GetTrainedModelsStatsAction does not notice if the client connection closes (i.e. the REST handler does not use RestCancellableNodeClient and the resulting transport task is not a CancellableTask), so it carries on wastefully processing the request even after the client timeout.

Relates #55550
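To make the cancellation point concrete, here is a minimal plain-Java sketch of the pattern being asked for: long-running work checks a cancellation flag between units of work, and something (for example a connection-close listener) sets that flag. The Task class below is a hypothetical stand-in, not Elasticsearch's CancellableTask, and the wiring from the HTTP channel to cancel() is assumed rather than shown.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch of cooperative cancellation; names are illustrative only.
final class CancellableStatsSketch {

    /** Stand-in for a cancellable task: a flag that another thread may set. */
    static final class Task {
        private final AtomicBoolean cancelled = new AtomicBoolean();

        void cancel() {
            cancelled.set(true); // e.g. called when the client closes the connection
        }

        boolean isCancelled() {
            return cancelled.get();
        }
    }

    /**
     * Applies {@code work} to each pipeline ID, checking the task before every unit of work.
     * Returns the number of IDs processed, or throws once the task has been cancelled.
     */
    static int process(List<String> pipelineIds, Task task, Consumer<String> work) {
        int done = 0;
        for (String id : pipelineIds) {
            if (task.isCancelled()) {
                throw new CancellationException("cancelled after " + done + " pipelines");
            }
            work.accept(id);
            done++;
        }
        return done;
    }
}
```

With a check like this, a request abandoned after the client's timeout stops at the next check instead of running to completion on the master.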