-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
An alternative approach to supporting union-types on stats grouping field #110600
Merged
craigtaverner
merged 2 commits into
elastic:main
from
craigtaverner:union_types_alt_ordinals_agg
Jul 9, 2024
Merged
An alternative approach to supporting union-types on stats grouping field #110600
craigtaverner
merged 2 commits into
elastic:main
from
craigtaverner:union_types_alt_ordinals_agg
Jul 9, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Where the grouping field is erased by later commands, like a subsequent stats. Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals.
craigtaverner
added
>non-issue
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
:Analytics/ES|QL
AKA ESQL
labels
Jul 8, 2024
Pinging @elastic/es-analytical-engine (Team:Analytics) |
craigtaverner
changed the title
An alternative approach to sopporting union-types on stats grouping field
An alternative approach to supporting union-types on stats grouping field
Jul 8, 2024
nik9000
approved these changes
Jul 8, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems better. Maybe someone from the QL land can confirm.
alex-spies
approved these changes
Jul 8, 2024
craigtaverner
added a commit
to craigtaverner/elasticsearch
that referenced
this pull request
Jul 15, 2024
…ield (elastic#110600) * Added union-types field extration to ordinals aggregation * Revert previous approach to getting union-types working in aggregations Where the grouping field is erased by later commands, like a subsequent stats. Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals.
craigtaverner
added a commit
that referenced
this pull request
Jul 16, 2024
* Fix bug in union-types with type-casting in grouping key of STATS (#110476) * Allow auto-generated type-cast fields in CsvTests This allows, for example, a csv-spec test result header like `client_ip::ip:ip`, which is generated with a command like `STATS count=count(*) BY client_ip::ip` It is also a small cleanup of the header parsing code, since it was using Strings.split() in an odd way. * Fix bug in union-types with type-casting in grouping key of STATS * Update docs/changelog/110476.yaml * Added casting_operator required capability Using the new `::` syntax requires disabling support for older versions in multi-cluster tests. * Added more tests for inline stats over long/datetime * Trying to fix the STATS...STATS bug This makes two changes: * Keeps the Alias in the aggs.aggregates from the grouping key, so that ReplaceStatsNestedExpressionWithEval still works * Adds explicit support for union-types conversion at grouping key loading in the ordinalGroupingOperatorFactory Neither fix the particular edge case, but do seem correct * Added EsqlCapability for this change So that mixed cluster tests don't fail these new queries. * Fix InsertFieldExtract for union types Union types require a FieldExtractExec to be performed first thing at the bottom of local physical plans. In queries like ``` from testidx* | eval x = to_string(client_ip) | stats c = count(*) by x | keep c ``` The `stats` has the grouping `x` but the aggregates get pruned to just `c`. In cases like this, we did not insert a FieldExtractExec, which this fixes. * Revert query that previously failed With Alex's fix, this query now passes. * Revert integration of union-types to ordinals aggregator This is because we have not found a test case that actually demonstrates this is necessary. * More tests that would fail without the latest fix * Correct code style * Fix failing case when aggregating on union-type with invalid grouping key * Capabilities restrictions on the new YML tests * Update docs/changelog/110476.yaml --------- Co-authored-by: Alexander Spies <[email protected]> * An alternative approach to supporting union-types on stats grouping field (#110600) * Added union-types field extration to ordinals aggregation * Revert previous approach to getting union-types working in aggregations Where the grouping field is erased by later commands, like a subsequent stats. Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals. * Fix union-types when aggregating on inline conversion function (#110652) A query like: ``` FROM sample_data, sample_data_str | STATS count=count(*) BY client_ip = TO_IP(client_ip) | SORT count DESC, client_ip ASC | KEEP count, client_ip ``` Failed due to unresolved aggregates from the union-type in the grouping key * Fix for union-types for multiple columns with the same name (#110793) * Make union types use unique attribute names * Cleanup leftover * Added failing test and final fix to EsRelation * Implement FieldAttribute.fieldName() * Fix tests * Refactor * Do not ignore union typed field's parent * Fix important typo D'oh * Mute unrelated (part of) test * Move capability to better location * Fix analyzer tests * multi-node tests with an earlier version of union-types (before this change) fail * Add capability to remaining failing tests * Remove variable * Add more complex test * Consolidate union type cleanup rules * Add 3 more required_capability's to make CI happy * Update caps for union type subfield yaml tests * Update docs/changelog/110793.yaml * Refined changelog text * Mute BWC for 8.15.0 for failing YAML tests * union_types_remove_fields for all 160_union_types tests The tests fail spordically, so safer to mute the entire suite. --------- Co-authored-by: Alexander Spies <[email protected]> --------- Co-authored-by: Alexander Spies <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Analytics/ES|QL
AKA ESQL
>non-issue
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
v8.16.0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The PR at #110476 fixed errors with cases where we have STATS grouping on a field with union-types, but it did so by disabling the ordinals aggregation code path (by blocking the removal of the missing field from FieldExtraction, which had the knockon effect of disabling ordinals aggregation). The new approach re-enables that code path, adds the union-types BlockLoader to the ordinals aggregation so that it is able to convert types for the grouping key, but also makes sure this Blockloader returns
false
to thesupportsOrdinals
method, since that will not be true for union-types. Without thisfalse
, two fields that are not identical before conversion will not be understood to be identical after conversion. We want, for exampleIP("192.168.0.1")
to be identical toSTRING("192.168.0.1")
if the grouping key includes the conversion function (eg.STATS count(*) BY client_ip::ip
).