-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Validate unique plan attribute names #110488
ESQL: Validate unique plan attribute names #110488
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @alex-spies, I've created a changelog YAML for you. |
Blocked by #110490 |
+ refs | ||
) | ||
); | ||
throw new IllegalStateException("Reference [" + ua.qualifiedName() + "] is ambiguous; " + "matches any of " + refs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ambiguities were possible with row
earlier; the message suggesting disambiguation does not make sense - that's not possible. This was carried over from ql
and made sense for SQL. If we have ambiguities in ESQL, that's a bug IMO.
@@ -69,13 +69,6 @@ a:integer | b:integer | c:null | z:integer | |||
1 | 2 | null | null | |||
; | |||
|
|||
evalRowWithNull2 | |||
row a = 1, null, b = 2, c = null, null | eval z = a+b; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one had ambiguous attribute names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider this a bug fix or a breaking change? (IMHO it's a bug, but I see it's questionable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, this is a bug, since there's no way to refer to the attributes named null
in following commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding row
yes it seems there is an inconsistency between it and the rest of the commands. But first, let's make sure we cover all the commands from the point of view of attribute names uniqueness and if all stands then we can change row
:
rename a as foo, b as foo
keeps the lastfoo
- look into
enrich
,grok
,dissect
,mv_expand
regarding names uniqueness - see if any wildcarded names (where they are allowed -
keep
,drop
maybe some other places) don't trip the unique names verification - combine this change with whatever needs to change in union types so that the whole story makes sense project-wide
- look into fields and sub-fields where the name have dots -
rootfield.subfield.subsubfield
- we usually don't test these and that's a pity
Hi @alex-spies, I've updated the changelog YAML for you. |
…n-attribute-names
Set<String> outputAttributeNames = new HashSet<>(); | ||
Set<NameId> outputAttributeIds = new HashSet<>(); | ||
for (Attribute outputAttr : p.output()) { | ||
if (outputAttributeNames.add(outputAttr.name()) == false || outputAttributeIds.add(outputAttr.id()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.name()
-> .qualifiedName()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I changed my mind on this:
It is the actual name that needs to be unique; otherwise, bugs could slip in because we somehow end up using qualifiers on accident; qualifiers are not respected by our optimization rules, e.g. mergeOutputAttributes
; this PR demonstrates that qualifiers are entirely unused, and the validation for the current state should reflect current assumptions.
If we end up using qualifiers after all (I think that's really for the future and we should really remove them until then), we can easily update the validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try also to push mv_expand
to its edges.
} | ||
}); | ||
|
||
return plan.transformUp(LogicalPlan.class, p -> p.resolved() || p.childrenResolved() == false ? p : doRule(p)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand why p should be resolved, while p.children should be unresolved. Should it not be p -> p.resolved() == false || p.childrenResolved() == false ? p : doRule(p)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was carried over from AnalyzerRule
and BaseAnalyzerRule
while collapsing their execution logic into this simple Rule
.
Currently it says "I'll just return the current plan without changes (i.e. skip) if it's either already resolved or has yet-to-be resolved children". I think this is correct.
FROM employees | ||
| SORT emp_no ASC | ||
| KEEP emp_no, first_name, last_name | ||
| RENAME emp_no AS last_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting edge case.
Maybe I am reading the docs update wrong (thank you for updating the docs as well):
If it conflicts with an existing column name,
the existing column is replaced by the renamed column. If multiple columns
are renamed to the same name, all but the rightmost column with the same new
name are dropped.
but there is a contradiction between what we say in docs and what actually happens. According to docs, in the results below, last_name
should contain info from emp_no
but it should be the second column from left to right, not the first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch! The column is not replaced, it's dropped. I'll update the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Thanks for your reviews, @leemthompo , @astefan , @craigtaverner and @luigidellaquila ! I updated |
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
* Enforce an invariant in our dependency checker so that logical plans never have duplicate output attribute names or ids. * Fix ROW to not produce columns with duplicate names. * Fix ResolveUnionTypes to not create multiple synthetic field attributes for the same union type. * Add tests for commands using the same column name more than once. * Update docs w.r.t. how commands behave if they are used with duplicate column names. (cherry picked from commit da53921) # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java
* Enforce an invariant in our dependency checker so that logical plans never have duplicate output attribute names or ids. * Fix ROW to not produce columns with duplicate names. * Fix ResolveUnionTypes to not create multiple synthetic field attributes for the same union type. * Add tests for commands using the same column name more than once. * Update docs w.r.t. how commands behave if they are used with duplicate column names.
* ESQL: Validate unique plan attribute names (#110488) * Enforce an invariant in our dependency checker so that logical plans never have duplicate output attribute names or ids. * Fix ROW to not produce columns with duplicate names. * Fix ResolveUnionTypes to not create multiple synthetic field attributes for the same union type. * Add tests for commands using the same column name more than once. * Update docs w.r.t. how commands behave if they are used with duplicate column names. (cherry picked from commit da53921) # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java * Remove unrelated csv tests These slipped in via merge conflicts.
Calling Rename.output() previously returned wrong results. Since #110488, instead it throws an IllegalStateException. That leads to test failures in the EsqlNodeSubclassTests because e.g. MvExpandExec and FieldExtractExec eagerly calls .output() on its child when it's being constructed, and the child can be a fragment containing a Rename.
Calling Rename.output() previously returned wrong results. Since elastic#110488, instead it throws an IllegalStateException. That leads to test failures in the EsqlNodeSubclassTests because e.g. MvExpandExec and FieldExtractExec eagerly calls .output() on its child when it's being constructed, and the child can be a fragment containing a Rename. (cherry picked from commit 7df1b06) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java
Calling Rename.output() previously returned wrong results. Since #110488, instead it throws an IllegalStateException. That leads to test failures in the EsqlNodeSubclassTests because e.g. MvExpandExec and FieldExtractExec eagerly calls .output() on its child when it's being constructed, and the child can be a fragment containing a Rename. (cherry picked from commit 7df1b06) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java
Calling Rename.output() previously returned wrong results. Since elastic#110488, instead it throws an IllegalStateException. That leads to test failures in the EsqlNodeSubclassTests because e.g. MvExpandExec and FieldExtractExec eagerly calls .output() on its child when it's being constructed, and the child can be a fragment containing a Rename.
Calling Rename.output() previously returned wrong results. Since elastic#110488, instead it throws an IllegalStateException. That leads to test failures in the EsqlNodeSubclassTests because e.g. MvExpandExec and FieldExtractExec eagerly calls .output() on its child when it's being constructed, and the child can be a fragment containing a Rename.
Fix #110541
There's an unwritten invariant in ES|QL that's worth spelling out: a command's/logical plan's output attributes need to have unique names. In contrast to SQL, there's no way to disambiguate in cases like
because we do not have qualifiers, like
some_table.a
. Instead, ES|QL performs variable shadowing, like inEVAL
, where the rightmost assignment wins:EVAL
's output will have the attributesy
andx
- the rightmost one.Let's enforce this invariant in our dependency checker; there's only one deviation from it, currently, which is
ROW
- I think this is a bug which this PR also fixes.This is only enforced for logical plans; the dependency checker for physical plans is yet to be enabled as part of #105436.
Depends on #110793