ES|QL: fix stats by constant expression with alias #117551

luigidellaquila · 2024-11-26T11:09:35Z

Fix planning of queries with STATS and grouping aliases

FROM test
| EVAL y = "a"
| STATS COUNT() BY x=y
| SORT x

x is now excluded from the references list because it is a value generated by the query rather than an index field, so index resolution happens correctly.
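
To illustrate why the alias matters, here is a minimal sketch in plain Java. It is not the Elasticsearch implementation: `AggregateSketch`, `naiveReferences`, and `references` are hypothetical names, and the model assumes that the output list of the STATS node refers back to the grouping alias `x` by name, which is how a scan over all expressions could mistake `x` for an index field.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AggregateSketch {

    // Hypothetical model of "STATS COUNT() BY x = y".
    // GROUPINGS maps each grouping alias to the names its expression reads.
    public static final Map<String, Set<String>> GROUPINGS = Map.of("x", Set.of("y"));
    // Names mentioned in the output list (assumption: it points back at the alias "x").
    public static final List<String> AGGREGATE_REFS = List.of("x");

    /** Collects every name mentioned anywhere in the node, grouping aliases included. */
    public static Set<String> naiveReferences() {
        Set<String> refs = new LinkedHashSet<>(AGGREGATE_REFS);
        GROUPINGS.values().forEach(refs::addAll);
        return refs;
    }

    /** Same collection, minus the names that the groupings themselves define. */
    public static Set<String> references() {
        Set<String> refs = naiveReferences();
        refs.removeAll(GROUPINGS.keySet());
        return refs;
    }

    public static void main(String[] args) {
        System.out.println("naive:  " + naiveReferences());
        System.out.println("proper: " + references());
    }
}
```

In this model the naive scan reports both `x` and `y`, while the second method reports only `y`, which the preceding EVAL then accounts for.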

Fixes: #114714

elasticsearchmachine · 2024-11-26T11:10:38Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-11-26T11:10:38Z

Hi @luigidellaquila, I've created a changelog YAML for you.

luigidellaquila · 2024-11-26T11:15:08Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

@@ -511,7 +511,7 @@ static Set<String> fieldNames(LogicalPlan parsed, Set<String> enrichPolicyMatchF
            // remove any already discovered UnresolvedAttributes that are in fact aliases defined later down in the tree
            // for example "from test | eval x = salary | stats max = max(x) by gender"
            // remove the UnresolvedAttribute "x", since that is an Alias defined in "eval"
-            AttributeSet planRefs = Expressions.references(p.expressions());
+            AttributeSet planRefs = p.references();


Plans have logic to correctly calculate their references, no need to reinvent the wheel here.

In general, IMHO this method is trying to do too much.
The logic should be simple: each plan should declare needed inputs and produced outputs; this method should just start from the top of the plan, visit it down and for each plan node:

- remove the outputs
- add the required inputs.

All the instanceof checks above should be encapsulated in the individual logical plan classes.
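
A rough sketch of that traversal, in plain Java and under stated assumptions: `Node`, `fieldNames`, and the two example methods are hypothetical, names are matched as plain strings, and each node's inputs and outputs are written out by hand rather than derived from a real `LogicalPlan`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FieldNamesSketch {

    /** Hypothetical plan node: the names it defines (outputs) and the names it reads (inputs). */
    public record Node(String command, Set<String> outputs, Set<String> inputs) {}

    /**
     * Walks the plan from the last command down to the source.
     * For each node, first remove the names it defines, then add the names it needs.
     */
    public static Set<String> fieldNames(List<Node> planTopDown) {
        Set<String> needed = new LinkedHashSet<>();
        for (Node node : planTopDown) {
            needed.removeAll(node.outputs());
            needed.addAll(node.inputs());
        }
        return needed;
    }

    /** from test | eval x = salary | stats max = max(x) by gender */
    public static Set<String> salaryExample() {
        return fieldNames(List.of(
            new Node("stats", Set.of("max"), Set.of("x", "gender")),
            new Node("eval", Set.of("x"), Set.of("salary"))
        ));
    }

    /** FROM test | EVAL y = "a" | STATS COUNT() BY x = y | SORT x */
    public static Set<String> constantGroupingExample() {
        return fieldNames(List.of(
            new Node("sort", Set.of(), Set.of("x")),
            new Node("stats", Set.of("COUNT()", "x"), Set.of("y")),
            new Node("eval", Set.of("y"), Set.of())
        ));
    }

    public static void main(String[] args) {
        System.out.println(salaryExample());
        System.out.println(constantGroupingExample());
    }
}
```

With these hand-written nodes, the first query needs `gender` and `salary` from the index, and the query from this PR needs no index fields at all, because `x` and `y` are both defined inside the query.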

alex-spies

Nice fix. It turns out this is the same problem of accidentally accounting for groupings in the Aggregate's references, just in a different place, because fieldNames determines a plan's output manually.

alex-spies · 2024-11-26T12:29:08Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

@@ -511,7 +511,7 @@ static Set<String> fieldNames(LogicalPlan parsed, Set<String> enrichPolicyMatchF
            // remove any already discovered UnresolvedAttributes that are in fact aliases defined later down in the tree
            // for example "from test | eval x = salary | stats max = max(x) by gender"
            // remove the UnresolvedAttribute "x", since that is an Alias defined in "eval"
-            AttributeSet planRefs = Expressions.references(p.expressions());
+            AttributeSet planRefs = p.references();


Ideally, yes. I was about to suggest a refactor of fieldNames to make things simpler and more local, similarly to what you described.

However, since references are not yet resolved at this point, we cannot use the existing query plan methods as we would wish. In particular, the plan's output and outputSet methods do not work here because they require resolved references and need to know which fields are actually present in relations. This, together with the fact that we need to match references by name rather than by id, prevents us from just using the simple approach from ProjectAwayColumns.

However, using the references method should be okay at this point because that's something that can be determined locally for a plan node.
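
To make the by-name point concrete, here is a small sketch in plain Java. `Attr` and `removeAliasesByName` are hypothetical, and the ids are arbitrary numbers standing in for whatever identity an unresolved attribute carries before analysis.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class NameMatchingSketch {

    /** Hypothetical unresolved attribute: the id carries no meaning before analysis. */
    public record Attr(String name, int id) {}

    /** Drops every reference whose name equals the name of a defined alias, ignoring ids. */
    public static Set<Attr> removeAliasesByName(Set<Attr> refs, List<Attr> aliases) {
        Set<String> aliasNames = aliases.stream().map(Attr::name).collect(Collectors.toSet());
        return refs.stream()
            .filter(ref -> !aliasNames.contains(ref.name()))
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** The reference "x" (id 1) and the alias "x" (id 7) share a name but not an id. */
    public static Set<String> exampleNames() {
        Set<Attr> refs = new LinkedHashSet<>(List.of(new Attr("x", 1), new Attr("gender", 2)));
        List<Attr> aliases = List.of(new Attr("x", 7));
        return removeAliasesByName(refs, aliases).stream()
            .map(Attr::name)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(exampleNames());
    }
}
```

An equality-based `removeAll` would keep `x` here because the ids differ; matching on the name alone leaves only `gender`.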

astefan

Thanks for having a second look at this issue @luigidellaquila.
The big advantage of fieldNames is that it can be unit tested. Please add some tests (be creative with queries that can break the logic in fieldNames) in IndexResolverFieldNamesTests.

astefan

Awesome. Thank you for adding the unit tests, @luigidellaquila
LGTM

luigidellaquila · 2024-11-26T15:56:01Z

Thanks for the reviews @astefan @alex-spies !

elasticsearchmachine · 2024-11-26T16:47:56Z

💔 Backport failed

Status	Branch	Result
❌	8.17	Commit could not be cherrypicked due to conflicts
❌	8.16	Commit could not be cherrypicked due to conflicts
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117551

luigidellaquila · 2024-11-26T16:49:13Z

Backporting manually

ES|QL: fix stats by constant expresson with alias

c96c701

luigidellaquila added the >bug, auto-backport, :Analytics/ES|QL, v9.0.0, v8.17.0, v8.16.2, and v8.18.0 labels Nov 26, 2024

luigidellaquila requested review from ivancea and alex-spies November 26, 2024 11:09

elasticsearchmachine added the Team:Analytics label Nov 26, 2024

Update docs/changelog/117551.yaml

7d536f7

luigidellaquila commented Nov 26, 2024

Merge branch 'main' into esql/fix_114714_take_2

c2c886e

alex-spies approved these changes Nov 26, 2024

astefan requested changes Nov 26, 2024

Unit tests

001555f

astefan approved these changes Nov 26, 2024

luigidellaquila enabled auto-merge (squash) November 26, 2024 15:57

luigidellaquila merged commit b22d185 into elastic:main Nov 26, 2024
16 checks passed

elasticsearchmachine added the backport pending label Nov 26, 2024

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (elastic#117551)

1394107

luigidellaquila mentioned this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (#117551) #117612

Merged

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (elastic#117551)

bc2939e

luigidellaquila mentioned this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (#117551) #117613

Merged

elasticsearchmachine pushed a commit that referenced this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (#117551) (#117612)

83153e8

elasticsearchmachine pushed a commit that referenced this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (#117551) (#117613)

f8811a1

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Nov 27, 2024

ES|QL: fix stats by constant expresson with alias (elastic#117551)

35da7e7

alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024

ES|QL: fix stats by constant expresson with alias (elastic#117551)

6d03d69
