-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: ordering terms aggregation on top metrics null values #85774
fix: ordering terms aggregation on top metrics null values #85774
Conversation
If top metrics buckets are never collected (i.e. a filter never passing data to a top metrics aggregator) ordering results in an index out of bounds exception. THis is because the data array expected to include data is actually empty. Here we use the special Double.NaN value to report missing data to sort on for the top metrics metric field. This results in the comparator ordering data using another comparator (the default '_key' comprator).
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Hi @salvatore-campagna, I've created a changelog YAML for you. |
- match: { aggregations.name.buckets.4.field_exists.top_metrics.top.0.sort.0: "2021-11-09T17:29:00.000Z" } | ||
- match: { aggregations.name.buckets.4.field_exists.top_metrics.top.0.metrics.version: 1 } | ||
|
||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last two tests are here just to highlight that the behaviour of the top_metric aggregation changes with the value of rewrite_to_filter_by_filter
. This is not correct, we would like to have consistent behaviour. I will submit an issue to track this and point the issue to these two tests (and their counterpart where the setting is true). I am not going to fix this issue in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create the issue after merging this PR so that I can reference the two tests in the ticket.
@elasticmachine update branch |
Pinging @elastic/clients-team (Team:Clients) |
terms: | ||
field: name | ||
order: | ||
"field_exists>top_metrics[unknown_metric]": asc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change this to actually use the unknown_metric
in the top_metrics aggregations rather than the filter.
terms: | ||
field: name | ||
order: | ||
"field_exists>top_metrics[unknown_metric]": asc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: (104 commits) fix: ordering terms aggregation on top metrics null values (elastic#85774) Fix up whitespace error introduced in elastic#85948 More docs re. removing cluster.initial_master_nodes (elastic#85948) [Test] Remove API key methods from HLRC (elastic#85802) Remove references to bootstrap.system_call_filter (elastic#85964) Move docker cgroup override to SystemJvmOptions (elastic#85960) Add connection accounting tests (elastic#85966) Remove MacOS from platform support testing matrix Remove custom KnnVectorFieldExistsQuery (elastic#85945) Relax data path deprecations from critical to warn (elastic#85952) Remove hppc from some "common" classes (elastic#85957) Move docker env var settings handling out of bash (elastic#85913) Remove hppc from task manager (elastic#85889) [ML] rename trained model allocations to assignments (elastic#85503) Remove hppc from multi*shard request and responses (elastic#85888) Consolidating logging initialization in cli launcher (elastic#85920) Convert license tools to use unified cli entrypoint (elastic#85919) Add noop detection to node shutdown actions (elastic#85914) Adjust SQL expended test output TSDB: Add timestamp provider to AggregationExecutionContext (elastic#85850) ... # Conflicts: # server/src/main/java/org/elasticsearch/search/aggregations/AggregationExecutionContext.java
If top_metrics buckets are never collected (i.e. a filter never
passing data to a top_metrics nested aggregator) sorting results
in an index out of bounds exception. The data array, filled at
collection time, is actually empty and we try to access it to do
the sorting using ordinals as indices. To be more precise, we do the
comparison when filling the priority queue used to build aggregation
results. The first insertion is successful since the priority queue is empty
and there is no comparison to perform. Anyway, as soon as we insert
the second element in the priority queue, the comparator is invoked
and access to the empty array takes place.
Here we use the special Double.NaN value to report missing data
for the top_metrics metric field. This happens for empty or null
fields. The comparator we use is a 'NaN-aware' comparator.
As a consequence, data is sorted using another comparator,
the default '_key' comparator added when creating the compound
comparator.
This is a fix for #85127.