[ML] Automatically map interval to fixed/calendar interval in datafeed aggs #51606
Comments
Pinging @elastic/ml-core (:ml) |
I think we can do this unconditionally regardless of minimum node version. If we have a mixed cluster version, the |
For wire serialization this is true, but the extra problem ML is adding is to store the aggregations in a document in an index that might be read by an older node in a mixed version cluster. Usually aggregation DSL is a transient thing that only exists for the duration of one search so the wire serialization mappings are enough. We have lenient parsing when loading configs from indices, so parsing the config on load will not cause an error. But the older node will then see a |
Isn't this already a problem then?
I thought this type of situation was why our NOTE: With the automatic re-write, this will probably be exacerbated. To your point. |
This is not a hard requirement. The docs just say:
Also, even if that advice is followed, when there are multiple master-eligible nodes the actual master node at some time during the process might be on a version ahead of some other master-eligible nodes.
There's an assumption that users don't use new features of a particular version until the whole cluster is on that version. I agree there could be problems if this assumption isn't followed. Basically, what I am saying is that our internal upgrade code should follow that same assumption: don't start adding new syntax into configs until the whole cluster is on the version that supports that syntax. If a user got themselves into a mess by trying to use new version functionality in a mixed version cluster, we could justifiably push back and say they should have waited until the whole cluster is upgraded, and that they need to complete their upgrade before our functionality will work again. But if code we write that runs automatically breaks their mixed version cluster, then they have a more justified grievance against us. |
Current plan:
As for updating existing datafeeds that are not in use, I am not 100% sure of the best way to go about this. The options I can think of are:
|
…terval fields (#52538) `interval` was deprecated back in 7.2.0. It was replaced with `calendar_interval` and `fixed_interval`. This change automatically rewrites datafeed configurations that contain the old `interval` field. The rewrite occurs when the configuration is read from the index. It is NOT then written back to the index. This PR also re-enables the long disabled BWC tests for datafeeds. Partially addresses: #51606
This creates an auto update service. The first automatic update is rewriting datafeed aggregations if they exist. This is necessary as `date_histogram` is deprecating and removing the `interval` key. To aid users who have the `interval` key defined, we can automatically upgrade these aggregation definitions to the appropriate `fixed_interval` or `calendar_interval`. closes #51606
Since #33727 went into 7.2, `interval` is deprecated in date histograms. New date histograms should use `fixed_interval` or `calendar_interval` instead. In 8.x it is likely that date histograms containing `interval` will be considered invalid.
ML datafeed configs are persisted and are likely to contain `interval` rather than `fixed_interval` or `calendar_interval`; they definitely will if they were first created in a version earlier than 7.2. Such datafeed configs will be invalid in 8.0, so prior to 8.0 we must do something about this, but there is no reason we cannot start sooner.
The best course of action would seem to be to silently rewrite datafeed configs that contain a `date_histogram` aggregation with an `interval` setting to have whichever of `fixed_interval` or `calendar_interval` preserves the current behaviour.
We can do this unconditionally providing the oldest node in the cluster is on version 7.2 or above.
Rollup already does this silent behaviour-preserving rewriting, although rollup configs are stored in cluster state, which makes this considerably easier than for the ML configs that are stored in an index.
The code that does the translation for rollups is in:
elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/rollup/job/DateHistogramGroupConfig.java
Lines 65 to 85 in bd01cce
and:
elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/rollup/job/DateHistogramGroupConfig.java
Lines 167 to 174 in bd01cce
That should be considered as the spec for how to do a silent behaviour-preserving rewrite.
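To make the idea of a behaviour-preserving rewrite concrete, here is a minimal, self-contained sketch. It is not the rollup code referenced above. It assumes that a legacy `interval` value naming a single calendar unit (such as `1d` or `month`) maps to `calendar_interval`, that any other string maps to `fixed_interval`, and that a bare number means milliseconds. The class name `IntervalRewriteSketch`, the `CALENDAR_UNITS` set and the `rewrite` method are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class IntervalRewriteSketch {

    // Assumption: unit strings that the legacy `interval` key treats as calendar-aware.
    private static final Set<String> CALENDAR_UNITS = Set.of(
        "year", "1y", "quarter", "1q", "month", "1M", "week", "1w",
        "day", "1d", "hour", "1h", "minute", "1m", "second", "1s");

    /**
     * Returns a copy of a date_histogram body with `interval` replaced by
     * `calendar_interval` or `fixed_interval`. Bodies without `interval`
     * are returned unchanged.
     */
    static Map<String, Object> rewrite(Map<String, Object> dateHistogram) {
        if (!dateHistogram.containsKey("interval")) {
            return dateHistogram;
        }
        Map<String, Object> rewritten = new LinkedHashMap<>(dateHistogram);
        Object legacy = rewritten.remove("interval");
        if (legacy instanceof Number) {
            // Assumption: a bare number is a duration in milliseconds.
            rewritten.put("fixed_interval", ((Number) legacy).longValue() + "ms");
        } else if (CALENDAR_UNITS.contains(String.valueOf(legacy))) {
            rewritten.put("calendar_interval", legacy);
        } else {
            rewritten.put("fixed_interval", legacy);
        }
        return rewritten;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("field", "timestamp");
        config.put("interval", "90m");
        System.out.println(rewrite(config));
    }
}
```

The sketch returns a new map rather than mutating its input, which fits a rewrite that is applied when a config is read and is not necessarily written back.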
The extra complexity in ML is deciding where to do the migration for datafeed configs that wouldn't otherwise be rewritten. If we do it in the datafeed config parser then that ensures that newly submitted configs and datafeed configs that are updated will get translated. Then we would just need to find an appropriate place to do it for existing datafeed configs - maybe on master node election providing the oldest node in the cluster is version 7.2 or above?
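The "oldest node is on 7.2 or above" condition can be sketched as a simple gate that any migration of stored configs would check first. This is an illustration only: `MigrationGateSketch`, `NodeVersion` and `canRewrite` are hypothetical names, and real code would read node versions from the cluster state rather than from a plain list.

```java
import java.util.Arrays;
import java.util.List;

final class MigrationGateSketch {

    /** Hypothetical stand-in for a node version; only major.minor matter here. */
    static final class NodeVersion implements Comparable<NodeVersion> {
        final int major;
        final int minor;

        NodeVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override
        public int compareTo(NodeVersion other) {
            return major != other.major
                ? Integer.compare(major, other.major)
                : Integer.compare(minor, other.minor);
        }
    }

    // First version that understands fixed_interval / calendar_interval, per the issue text.
    static final NodeVersion FIRST_SUPPORTING = new NodeVersion(7, 2);

    /** True only if every known node can parse the new syntax. */
    static boolean canRewrite(List<NodeVersion> nodes) {
        return !nodes.isEmpty()
            && nodes.stream().allMatch(v -> v.compareTo(FIRST_SUPPORTING) >= 0);
    }

    public static void main(String[] args) {
        List<NodeVersion> mixed = Arrays.asList(new NodeVersion(7, 6), new NodeVersion(7, 1));
        System.out.println(canRewrite(mixed));
    }
}
```

Checking the oldest node rather than the local node is what keeps the automatic rewrite from storing syntax that an older node in a mixed version cluster cannot parse.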
There are also some currently muted tests that should be reenabled when the work is done:
elasticsearch/x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedConfigTests.java
Line 545 in be849b2
elasticsearch/x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedUpdateTests.java
Line 60 in be849b2