Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ordering terms aggregation on top metrics null values #85774

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Apr 11, 2022

If top_metrics buckets are never collected (i.e. a filter never
passing data to a top_metrics nested aggregator) sorting results
in an index out of bounds exception. The data array, filled at
collection time, is actually empty and we try to access it to do
the sorting using ordinals as indices. To be more precise, we do the
comparison when filling the priority queue used to build aggregation
results. The first insertion is successful since the priority queue is empty
and there is no comparison to perform. Anyway, as soon as we insert
the second element in the priority queue, the comparator is invoked
and access to the empty array takes place.

Here we use the special Double.NaN value to report missing data
for the top_metrics metric field. This happens for empty or null
fields. The comparator we use is a 'NaN-aware' comparator.
As a consequence, data is sorted using another comparator,
the default '_key' comparator added when creating the compound
comparator.

This is a fix for #85127.

If top metrics buckets are never collected (i.e. a filter never
passing data to a top metrics aggregator) ordering results in an
index out of bounds exception. THis is because the data array
expected to include data is actually empty.

Here we use the special Double.NaN value to report missing data
to sort on for the top metrics metric field. This results in the
comparator ordering data using another comparator (the default
'_key' comprator).
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 11, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@salvatore-campagna salvatore-campagna added >bug and removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Apr 11, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@salvatore-campagna salvatore-campagna added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 11, 2022
- match: { aggregations.name.buckets.4.field_exists.top_metrics.top.0.sort.0: "2021-11-09T17:29:00.000Z" }
- match: { aggregations.name.buckets.4.field_exists.top_metrics.top.0.metrics.version: 1 }

---
Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two tests are here just to highlight that the behaviour of the top_metric aggregation changes with the value of rewrite_to_filter_by_filter. This is not correct, we would like to have consistent behaviour. I will submit an issue to track this and point the issue to these two tests (and their counterpart where the setting is true). I am not going to fix this issue in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create the issue after merging this PR so that I can reference the two tests in the ticket.

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Apr 11, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

terms:
field: name
order:
"field_exists>top_metrics[unknown_metric]": asc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change this to actually use the unknown_metric in the top_metrics aggregations rather than the filter.

terms:
field: name
order:
"field_exists>top_metrics[unknown_metric]": asc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@salvatore-campagna salvatore-campagna merged commit 3b78a4c into elastic:master Apr 19, 2022
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Apr 21, 2022
* master: (104 commits)
  fix: ordering terms aggregation on top metrics null values (elastic#85774)
  Fix up whitespace error introduced in elastic#85948
  More docs re. removing cluster.initial_master_nodes (elastic#85948)
  [Test] Remove API key methods from HLRC (elastic#85802)
  Remove references to bootstrap.system_call_filter (elastic#85964)
  Move docker cgroup override to SystemJvmOptions (elastic#85960)
  Add connection accounting tests (elastic#85966)
  Remove MacOS from platform support testing matrix
  Remove custom KnnVectorFieldExistsQuery (elastic#85945)
  Relax data path deprecations from critical to warn (elastic#85952)
  Remove hppc from some "common" classes (elastic#85957)
  Move docker env var settings handling out of bash (elastic#85913)
  Remove hppc from task manager (elastic#85889)
  [ML] rename trained model allocations to assignments (elastic#85503)
  Remove hppc from multi*shard request and responses (elastic#85888)
  Consolidating logging initialization in cli launcher (elastic#85920)
  Convert license tools to use unified cli entrypoint (elastic#85919)
  Add noop detection to node shutdown actions (elastic#85914)
  Adjust SQL expended test output
  TSDB: Add timestamp provider to AggregationExecutionContext (elastic#85850)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/aggregations/AggregationExecutionContext.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Clients Meta label for clients team v8.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants