-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: fix stats by constant expresson with alias #117551
ES|QL: fix stats by constant expresson with alias #117551
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @luigidellaquila, I've created a changelog YAML for you. |
@@ -511,7 +511,7 @@ static Set<String> fieldNames(LogicalPlan parsed, Set<String> enrichPolicyMatchF | |||
// remove any already discovered UnresolvedAttributes that are in fact aliases defined later down in the tree | |||
// for example "from test | eval x = salary | stats max = max(x) by gender" | |||
// remove the UnresolvedAttribute "x", since that is an Alias defined in "eval" | |||
AttributeSet planRefs = Expressions.references(p.expressions()); | |||
AttributeSet planRefs = p.references(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plans have logic to correctly calculate their references, no need to reinvent the wheel here.
In general, IMHO this method is trying to do too much.
The logic should be simple: each plan should declare needed inputs and produced outputs; this method should just start from the top of the plan, visit it down and for each plan node
- remove the outputs
- add the required inputs.
All the instanceof
above should be encapsulated in the single logical plan classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, yes. I was about to suggest a refactor of fieldNames
to make things simpler and more local, similarly to what you described.
However, since we're at a point here where references are not yet resolved, we cannot use the existing query plan methods as we would wish. Esp. the plan's output
and outputSet
methods do not work here because these require resolved references, and also need to know about what fields are even present in relations. This, and the fact that we need to match references by name rather than by id. This prevents us from just using the simple approach from ProjectAwayColumns.
However, using the references
method should be okay at this point because that's something that can be determined locally for a plan node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix. It turns out, it's the same problem with accidentally accounting for groupings in the Aggregate
's references, but in a different place because we use a manual way to determine a plan's output in fieldNames
.
@@ -511,7 +511,7 @@ static Set<String> fieldNames(LogicalPlan parsed, Set<String> enrichPolicyMatchF | |||
// remove any already discovered UnresolvedAttributes that are in fact aliases defined later down in the tree | |||
// for example "from test | eval x = salary | stats max = max(x) by gender" | |||
// remove the UnresolvedAttribute "x", since that is an Alias defined in "eval" | |||
AttributeSet planRefs = Expressions.references(p.expressions()); | |||
AttributeSet planRefs = p.references(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, yes. I was about to suggest a refactor of fieldNames
to make things simpler and more local, similarly to what you described.
However, since we're at a point here where references are not yet resolved, we cannot use the existing query plan methods as we would wish. Esp. the plan's output
and outputSet
methods do not work here because these require resolved references, and also need to know about what fields are even present in relations. This, and the fact that we need to match references by name rather than by id. This prevents us from just using the simple approach from ProjectAwayColumns.
However, using the references
method should be okay at this point because that's something that can be determined locally for a plan node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a second look at this issue @luigidellaquila.
The big advantage of fieldNames
is that it can be unit tested. Please, add some tests (be creative with queries that can break the logic in fieldNames
) in IndexResolverFieldNamesTests
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. Thank you for adding the unit tests, @luigidellaquila
LGTM
Thanks for the reviews @astefan @alex-spies ! |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Backporting manually |
Fix planning of queries with STATS and grouping aliases
x
is now excluded from the references list (it's a generated value), so that the index resolution happens correctly.Fixes: #114714