Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An alternative approach to supporting union-types on stats grouping field #110600

Merged

Conversation

craigtaverner
Copy link
Contributor

The PR at #110476 fixed errors with cases where we have STATS grouping on a field with union-types, but it did so by disabling the ordinals aggregation code path (by blocking the removal of the missing field from FieldExtraction, which had the knockon effect of disabling ordinals aggregation). The new approach re-enables that code path, adds the union-types BlockLoader to the ordinals aggregation so that it is able to convert types for the grouping key, but also makes sure this Blockloader returns false to the supportsOrdinals method, since that will not be true for union-types. Without this false, two fields that are not identical before conversion will not be understood to be identical after conversion. We want, for example IP("192.168.0.1") to be identical to STRING("192.168.0.1") if the grouping key includes the conversion function (eg. STATS count(*) BY client_ip::ip).

Where the grouping field is erased by later commands, like a subsequent stats.
Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals.
@craigtaverner craigtaverner added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Jul 8, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@craigtaverner craigtaverner changed the title An alternative approach to sopporting union-types on stats grouping field An alternative approach to supporting union-types on stats grouping field Jul 8, 2024
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems better. Maybe someone from the QL land can confirm.

@craigtaverner craigtaverner merged commit 362e049 into elastic:main Jul 9, 2024
15 checks passed
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jul 15, 2024
…ield (elastic#110600)

* Added union-types field extration to ordinals aggregation

* Revert previous approach to getting union-types working in aggregations

Where the grouping field is erased by later commands, like a subsequent stats.
Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals.
craigtaverner added a commit that referenced this pull request Jul 16, 2024
* Fix bug in union-types with type-casting in grouping key of STATS (#110476)

* Allow auto-generated type-cast fields in CsvTests

This allows, for example, a csv-spec test result header like `client_ip::ip:ip`, which is generated with a command like `STATS count=count(*) BY client_ip::ip`

It is also a small cleanup of the header parsing code, since it was using Strings.split() in an odd way.

* Fix bug in union-types with type-casting in grouping key of STATS

* Update docs/changelog/110476.yaml

* Added casting_operator required capability

Using the new `::` syntax requires disabling support for older versions in multi-cluster tests.

* Added more tests for inline stats over long/datetime

* Trying to fix the STATS...STATS bug

This makes two changes:

* Keeps the Alias in the aggs.aggregates from the grouping key, so that ReplaceStatsNestedExpressionWithEval still works
* Adds explicit support for union-types conversion at grouping key loading in the ordinalGroupingOperatorFactory

Neither fix the particular edge case, but do seem correct

* Added EsqlCapability for this change

So that mixed cluster tests don't fail these new queries.

* Fix InsertFieldExtract for union types

Union types require a FieldExtractExec to be performed first thing at
the bottom of local physical plans.

In queries like
```
  from testidx*
  | eval x = to_string(client_ip)
  | stats c = count(*) by x
  | keep c
```
The `stats` has the grouping `x` but the aggregates get pruned to just
`c`. In cases like this, we did not insert a FieldExtractExec, which
this fixes.

* Revert query that previously failed

With Alex's fix, this query now passes.

* Revert integration of union-types to ordinals aggregator

This is because we have not found a test case that actually demonstrates this is necessary.

* More tests that would fail without the latest fix

* Correct code style

* Fix failing case when aggregating on union-type with invalid grouping key

* Capabilities restrictions on the new YML tests

* Update docs/changelog/110476.yaml

---------

Co-authored-by: Alexander Spies <[email protected]>

* An alternative approach to supporting union-types on stats grouping field (#110600)

* Added union-types field extration to ordinals aggregation

* Revert previous approach to getting union-types working in aggregations

Where the grouping field is erased by later commands, like a subsequent stats.
Instead we include union-type supports in the ordinals aggregation and mark the block loader as not supporting ordinals.

* Fix union-types when aggregating on inline conversion function (#110652)

A query like:

```
FROM sample_data, sample_data_str
| STATS count=count(*) BY client_ip = TO_IP(client_ip)
| SORT count DESC, client_ip ASC
| KEEP count, client_ip
```

Failed due to unresolved aggregates from the union-type in the grouping key

* Fix for union-types for multiple columns with the same name (#110793)

* Make union types use unique attribute names

* Cleanup leftover

* Added failing test and final fix to EsRelation

* Implement FieldAttribute.fieldName()

* Fix tests

* Refactor

* Do not ignore union typed field's parent

* Fix important typo

D'oh

* Mute unrelated (part of) test

* Move capability to better location

* Fix analyzer tests

* multi-node tests with an earlier version of union-types (before this change) fail

* Add capability to remaining failing tests

* Remove variable

* Add more complex test

* Consolidate union type cleanup rules

* Add 3 more required_capability's to make CI happy

* Update caps for union type subfield yaml tests

* Update docs/changelog/110793.yaml

* Refined changelog text

* Mute BWC for 8.15.0 for failing YAML tests

* union_types_remove_fields for all 160_union_types tests

The tests fail spordically, so safer to mute the entire suite.

---------

Co-authored-by: Alexander Spies <[email protected]>

---------

Co-authored-by: Alexander Spies <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants