-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Transform] scripted group_by fails for continuos transform #57332
Comments
Pinging @elastic/ml-core (:ml/Transform) |
) disable optimizations when using scripts in group_by, when scripts using scripts we can not predict the outcome and we have no query counterpart. Other optimizations for other group_by's are not affected. fixes #57332
) disable optimizations when using scripts in group_by, when scripts using scripts we can not predict the outcome and we have no query counterpart. Other optimizations for other group_by's are not affected. fixes #57332
leftover from a previous solution of elastic#57332, which has been discarded relates elastic#60724 fixes elastic#60794
FWIW: I tested the possibility to backport this change to Without leaking any information about release dates, the next patch release I therefore dropped the idea of a backport. |
After revisiting the backport again, I manually backported the important bits to |
Affected versions: 7.7 - 7.9.1 (fixed >= 7.9.2)
Continuous transform minimize the amount of updates by querying only changed buckets, the logic uses a combination of query features for that. For terms it relies on terms query.
If a script is used in
group_by
instead of a field, this change detection logic causes the transform to fail (after it switches to continuous mode, not directly after start but after checkpoint 1) as it expects a field:Because scripts offer freedom to build a bucket key, we can't use them to detect changes on them. We could construct a script query, however this would be very expensive.
Mitigation: don't use scripts in
group_by
together with continuous transform. Scripts in queries are very expensive, so independent of change detection it's highly recommended to not use scripts in production, but only in the development/data exploration phase.Possible solutions
Update: We finally decided to go with option B, keeping the other ideas for documentation.
## A Disallow scripted group_by in continuous modeThis would be easiest, however if you have only 1 scripted group_by butn
non scriptedgroup_by
's this would limit functionality unnecessary.B Disable change detection for scripted group_by
This would let continuous mode do a full rerun if all
group_by
's are using a script. On larger scale this leads to performance problems. If there are other non-scriptedgroup_by
's or the amount of data is small this might still be an acceptable solution.B.2: For this solution we could also consider using_update
instead of index.## C Implement change detection based on scripted queryI am not 100% sure this is possible. This solution would use a script query instead of a terms query. This solution might not be better than solution B: disabling change detection.General solution remarks
Because data might be small A would limit functionality, I therefore tend towards B. It would be possible to disallow certain combinations like only disallow if no
group_by
implements change detection, but again this sounds like to much of a restriction. Instead we should warn the user about potential problems. Solution C might be good in the long run, but takes significant more time to verify, implement and test, so B seems to be the best short term solution.The user that reported the issue made a workaround for #48243, therefore supporting missing_bucket should be prioritized.
Update
If transform can not apply any change detection a warning (job message + log) is raised regarding performance.
The text was updated successfully, but these errors were encountered: