Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of unmapped ValuesSources #53238

Closed
not-napoleon opened this issue Mar 6, 2020 · 3 comments
Closed

Improve handling of unmapped ValuesSources #53238

not-napoleon opened this issue Mar 6, 2020 · 3 comments
Labels
:Analytics/Aggregations Aggregations >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt

Comments

@not-napoleon
Copy link
Member

Relates to #42949

ValuesSourceConfig has a concept of "unmapped", which the user has specified a field name, but we are unable to resolve that to a mapped field (Note that this is different from the user not specifying a field at all, in which case a script is required or the query is invalid). This might happen, for example, if a user were running an aggregation over multiple indexes, and one didn't have the field. If the user supplied a missing value, VSConfig returns a ValuesSource which returns that missing value and the aggregation proceeds as normal. Otherwise, the aggregation is given an opportunity to optimize for the unmapped case via the createUnmapped method. See TermsAggregatorFactory for an example of where this is used.

There are a couple of problems to fix here. First and foremost, ValuesSourceConfig communicates the unmapped state back to the aggregation builder by returning a null values source. This ends up with a lot of null checking down stream, and the associated cognitive load and potential for missing a spot and triggering an NPE. Instead, VSConfig could expose an isUnmapped predicate, and always return a defined values source (We have static empty implementations that would be ideal here). The type for this values source would be determined by either the user's valueType hint or the default values source type, if the user didn't send a hint. This is the same way we decide the type for an unmapped field with a missing value now.

Additionally, aggregations that don't specialize for the unmapped case (e.g. Histogram) currently just hard code an aggregator to run. More logically, these should still respect the ValuesSourceType as resolved in ValuesSourceConfig. In many cases, createUnmapped could just call doCreateInternal with a ValuesSourceType and an empty (but not null) ValuesSource

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@jmbessa
Copy link

jmbessa commented Jun 10, 2021

Hi! I would like to know if this issue still open. If yes, can I work on it? This would be my first issue

@wchaparro
Copy link
Member

closing as dupe of #58139

@wchaparro wchaparro closed this as not planned Won't fix, can't repro, duplicate, stale Jun 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt
Projects
None yet
Development

No branches or pull requests

5 participants