-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Changes default destination index field mapping and adds scripted_metric agg #40750
[ML] Changes default destination index field mapping and adds scripted_metric agg #40750
Conversation
…script_metric agg
Pinging @elastic/ml-core |
run elasticsearch-ci/bwc |
Nice addition.
I now get what you mean: this is if deduction fails, sounds good to me. The field mapping is defined in |
@@ -29,7 +29,8 @@ private Aggregations() {} | |||
VALUE_COUNT("value_count", "long"), | |||
MAX("max", null), | |||
MIN("min", null), | |||
SUM("sum", null); | |||
SUM("sum", null), | |||
SCRIPTED_METRIC("scripted_metric", null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1st argument is the name known by aggregations, the 2nd the output type, which is either the concrete type or null
if the source type should be used. So null
is a magic variable and actually it should not be null for scripted_magic
, but that's due to the lack of another magic.
null
is a bad magic anyway and the types are well defined and limited. So I suggest to change this to:
enum AggregationType {
AVG("avg", "double"),
CARDINALITY("cardinality", "long"),
VALUE_COUNT("value_count", "long"),
MAX("max", "source"),
MIN("min", "source"),
SUM("sum", "source"),
SCRIPTED_METRIC("scripted_metric", "dynamic");
There should never be a type source
and dynamic
, alternatively we could use _source
, _dynamic
to make the special meaning more visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method resolveTargetMapping
needs to be changed, it should return the concrete source type or dynamic
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice, for sure! Let me crank that out today. It would be better than implicitly having different behavior for null
between max
and scripted_metric
.
@@ -75,6 +76,8 @@ public static void deduceMappings(final Client client, | |||
ValuesSourceAggregationBuilder<?, ?> valueSourceAggregation = (ValuesSourceAggregationBuilder<?, ?>) agg; | |||
aggregationSourceFieldNames.put(valueSourceAggregation.getName(), valueSourceAggregation.field()); | |||
aggregationTypes.put(valueSourceAggregation.getName(), valueSourceAggregation.getType()); | |||
} else if(agg instanceof ScriptedMetricAggregationBuilder) { | |||
// do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to put the fieldname/mapping (dynamic
) but only aggregationTypes and not aggregationSourceFieldNames
(I am not sure what happens further down, might require additional checks)
@@ -134,8 +137,7 @@ public static void getDestinationFieldMappings(final Client client, | |||
if (destinationMapping != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs an additional else if
to check for dynamic
, would be good to do a logger.info
for this case
@hendrikmuhs sorry for the miscommunication. Anything that is the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…d_metric agg (elastic#40750) * [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg * Making dynamic|source mapping explicit
* elastic/master: (25 commits) Rollup ignores time_zone on date histogram (elastic#40844) HLRC: fix uri encode bug when url path starts with '/' (elastic#34436) Mutes GatewayIndexStateIT.testRecoverBrokenIndexMetadata Docs: Pin two IDs in the rest client (elastic#40785) Adds version 6.7.2 [DOCS] Remind users to include @ symbol when applying license file (elastic#40688) HLRC: Convert xpack methods to client side objects (elastic#40705) Allow ILM to stop if indices have nonexistent policies (elastic#40820) Add an OpenID Connect authentication realm (elastic#40674) [DOCS] Rewrite query def (elastic#40746) [ML] Changes default destination index field mapping and adds scripted_metric agg (elastic#40750) Docs: Drop inline callouts from painless book (elastic#40805) [ML] refactoring start task a bit, removing unused code (elastic#40798) [DOCS] Document index.load_fixed_bitset_filters_eagerly (elastic#40780) Make remote cluster resolution stricter (elastic#40419) [ML] Add created_by info to usage stats (elastic#40518) SQL: [Docs] Small fixes for CURRENT_TIMESTAMP docs (elastic#40792) convert modules to use testclusters (elastic#40804) Replace javax activation with jakarta activation (elastic#40247) Document restrictions on fuzzy matching when using synonyms (elastic#40783) ...
…d_metric agg (elastic#40750) * [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg * Making dynamic|source mapping explicit
This PR adjusts the default behavior for aggregations whose destination mapping cannot be accurately determined either through agg type or source field mapping. Previously, setting the field mapping to
double
was the default behavior. This will not work for more complex scenarios. So, instead of enforcing an explicit mapping for indeterminable fields, I allow the mapping to dynamically determine the appropriate type.The reason for adding
scripted_metric
in this PR as well, is so we have an aggregation that will exercise the new default behavior code path.I did NOT attempt to determine the explicit mapping for
scripted_metric
and simply let it always be dynamic. The reasoning for this is that we would simply be doing what the ES mapping system already does for us. It would add complexity with little benefit.Also, this simply adds the support for the aggregation. This PR does NOT include any new abstractions, interfaces, fields, etc. in the API. If we want to obfuscate the complexities in the
scripted_metric
aggregation, that will have to be future work.