ESQL: Validate unique plan attribute names #110488

alex-spies · 2024-07-04T16:38:17Z

There's an unwritten invariant in ES|QL that's worth spelling out: the output attributes of a command (i.e., of a logical plan) must have unique names. Unlike in SQL, there's no way to disambiguate in cases like

ROW a = 1, a = 2 | eval b = 2*a

because we do not have qualifiers like some_table.a. Instead, ES|QL performs variable shadowing; in EVAL, for example, the rightmost assignment wins:

| EVAL x = 2*some_field, y = -some_other_field, x = x + y

EVAL's output will have the attributes y and x, where x is the rightmost assignment (x + y).
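The "rightmost assignment wins" rule can be sketched in a few lines. This is a minimal illustration, not ES|QL's implementation. The `EvalShadowing` class, the `Column` record and the `shadow` method are hypothetical. The sketch models only EVAL's own assignments and ignores the columns inherited from upstream. It also does not evaluate expressions: `expr` is just a label.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not ES|QL code): models EVAL's rule that the
// rightmost assignment to a name shadows all earlier ones.
final class EvalShadowing {

    // A named output column; `expr` is only a label for the assigned expression.
    record Column(String name, String expr) {}

    static List<Column> shadow(List<Column> assignments) {
        // Remember the index of the last (rightmost) assignment for each name.
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < assignments.size(); i++) {
            lastIndex.put(assignments.get(i).name(), i);
        }
        // Keep an assignment only if it is the rightmost one for its name,
        // preserving the left-to-right order of the survivors.
        List<Column> output = new ArrayList<>();
        for (int i = 0; i < assignments.size(); i++) {
            if (lastIndex.get(assignments.get(i).name()) == i) {
                output.add(assignments.get(i));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // | EVAL x = 2*some_field, y = -some_other_field, x = x + y
        List<Column> eval = List.of(
            new Column("x", "2*some_field"),
            new Column("y", "-some_other_field"),
            new Column("x", "x + y"));
        System.out.println(shadow(eval));
    }
}
```

Because each name survives at most once, the output satisfies the uniqueness invariant by construction.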

Let's enforce this invariant in our dependency checker. Currently, the only command that deviates from it is ROW; I think this is a bug, and this PR fixes it as well.

This is only enforced for logical plans; the dependency checker for physical plans is yet to be enabled as part of #105436.
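The invariant boils down to two set-membership tests per output attribute: one on the name and one on the id. The HashSet-based loop in this PR's OptimizerRules diff works the same way. Below is a minimal sketch under stated assumptions. The `UniqueOutputCheck` class and the `Attribute` record are hypothetical stand-ins. The `long` id plays the role of ES|QL's `NameId`. Returning an `Optional` message replaces whatever failure reporting the real checker uses.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the invariant: no two output attributes of a plan
// may share a name or an id. Attribute is a stand-in, not the ES|QL class.
final class UniqueOutputCheck {

    record Attribute(String name, long id) {}

    // Returns a description of the first duplicate, or empty if the output is valid.
    static Optional<String> firstDuplicate(List<Attribute> output) {
        Set<String> names = new HashSet<>();
        Set<Long> ids = new HashSet<>();
        for (Attribute attr : output) {
            // Set.add returns false when the element was already present.
            if (names.add(attr.name()) == false) {
                return Optional.of("duplicate name [" + attr.name() + "]");
            }
            if (ids.add(attr.id()) == false) {
                return Optional.of("duplicate id [" + attr.id() + "]");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Output of ROW a = 1, a = 2 before the fix: two attributes named "a".
        List<Attribute> row = List.of(new Attribute("a", 1), new Attribute("a", 2));
        System.out.println(firstDuplicate(row).orElse("ok"));
    }
}
```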

Depends on #110793

elasticsearchmachine · 2024-07-04T16:38:41Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-07-04T16:38:41Z

Hi @alex-spies, I've created a changelog YAML for you.

alex-spies · 2024-07-04T17:23:46Z

Blocked by #110490

alex-spies · 2024-07-05T07:39:57Z

...plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/analyzer/AnalyzerRules.java

-                    + refs
-            )
-        );
+        throw new IllegalStateException("Reference [" + ua.qualifiedName() + "] is ambiguous; " + "matches any of " + refs);


Ambiguities were previously possible with ROW. The message suggesting disambiguation doesn't make sense, because disambiguation is not possible in ES|QL. The message was carried over from QL, where it made sense for SQL. If we have ambiguities in ES|QL, that's a bug IMO.
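The reasoning can be sketched as name-based reference resolution in which more than one match is treated as an internal error rather than a user error. This is a minimal sketch, not the actual AnalyzerRules code. The `ReferenceResolution` class, the `Attribute` record, the `resolve` method and the "Unknown column" message are hypothetical. Only the "is ambiguous; matches any of" wording mirrors the diff.

```java
import java.util.List;

// Hypothetical sketch: resolving an unqualified reference against a plan's
// output. With unique output names, several matches can only mean a bug, so
// the sketch throws IllegalStateException instead of suggesting a qualifier.
final class ReferenceResolution {

    record Attribute(String name, long id) {}

    static Attribute resolve(String name, List<Attribute> output) {
        List<Attribute> refs = output.stream().filter(a -> a.name().equals(name)).toList();
        if (refs.isEmpty()) {
            // A genuinely unknown column is a user error.
            throw new IllegalArgumentException("Unknown column [" + name + "]");
        }
        if (refs.size() > 1) {
            // Ambiguity violates the unique-names invariant: an internal error.
            throw new IllegalStateException("Reference [" + name + "] is ambiguous; matches any of " + refs);
        }
        return refs.get(0);
    }

    public static void main(String[] args) {
        // Output of ROW a = 1, a = 2 before this PR: two attributes named "a".
        List<Attribute> output = List.of(new Attribute("a", 1), new Attribute("a", 2));
        try {
            resolve("a", output);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```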

alex-spies · 2024-07-05T07:40:34Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/row.csv-spec

@@ -69,13 +69,6 @@ a:integer | b:integer | c:null | z:integer
 1 | 2 | null | null
 ;

-evalRowWithNull2
-row a = 1, null, b = 2, c = null, null | eval z = a+b;


This one had ambiguous attribute names.

Should we consider this a bug fix or a breaking change? (IMHO it's a bug, but I see that it's debatable.)

I think yes, this is a bug, since there's no way to refer to the attributes named null in subsequent commands.

astefan

Regarding ROW: yes, it seems there is an inconsistency between it and the rest of the commands. But first, let's make sure we cover all the commands with respect to attribute name uniqueness; if everything holds, then we can change ROW:

- RENAME a AS foo, b AS foo keeps the last foo
- look into ENRICH, GROK, DISSECT and MV_EXPAND regarding name uniqueness
- check that wildcarded names (where they are allowed: KEEP, DROP, maybe some other places) don't trip the unique-names verification
- combine this change with whatever needs to change in union types so that the whole story makes sense project-wide
- look into fields and sub-fields whose names contain dots (rootfield.subfield.subsubfield); we usually don't test these, and that's a pity

elasticsearchmachine · 2024-07-05T16:19:05Z

Hi @alex-spies, I've updated the changelog YAML for you.

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

astefan · 2024-07-08T15:31:37Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java

+            Set<String> outputAttributeNames = new HashSet<>();
+            Set<NameId> outputAttributeIds = new HashSet<>();
+            for (Attribute outputAttr : p.output()) {
+                if (outputAttributeNames.add(outputAttr.name()) == false || outputAttributeIds.add(outputAttr.id()) == false) {


.name() -> .qualifiedName()?

Actually, I changed my mind on this:

It is the actual name that needs to be unique. Otherwise, bugs could slip in if we somehow end up using qualifiers by accident. Qualifiers are not respected by our optimization rules (e.g. mergeOutputAttributes). This PR demonstrates that qualifiers are entirely unused, and the validation should reflect our current assumptions.

If we do end up using qualifiers after all (I think that's for the future, and we should remove them until then), we can easily update the validation.
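The difference between the two candidate keys is easy to see on a concrete output. This is a minimal sketch, assuming a hypothetical `Attribute` record whose `qualifiedName()` returns `qualifier + "." + name`, modeled on the some_table.a example. It is not the ES|QL class. Two columns that differ only by qualifier pass a check keyed on the qualified name but fail a check keyed on the plain name. That is why the stricter key protects rules that ignore qualifiers.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch contrasting uniqueness by qualifiedName() vs. by name().
// Attribute is a stand-in record; `qualifier` may be null.
final class NameVsQualifiedName {

    record Attribute(String qualifier, String name) {
        String qualifiedName() {
            return qualifier == null ? name : qualifier + "." + name;
        }
    }

    // True if no two attributes share the same key.
    static boolean allUnique(List<Attribute> output, Function<Attribute, String> key) {
        Set<String> seen = new HashSet<>();
        for (Attribute attr : output) {
            if (seen.add(key.apply(attr)) == false) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two columns both called "a", distinguished only by a qualifier
        // that downstream optimization rules would ignore.
        List<Attribute> output = List.of(new Attribute("t1", "a"), new Attribute("t2", "a"));
        System.out.println(allUnique(output, Attribute::qualifiedName)); // true: slips through
        System.out.println(allUnique(output, Attribute::name));          // false: caught
    }
}
```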

astefan

Also try to push MV_EXPAND to its edges.

craigtaverner · 2024-07-16T13:47:26Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

+                }
+            });
+
+            return plan.transformUp(LogicalPlan.class, p -> p.resolved() || p.childrenResolved() == false ? p : doRule(p));


Not sure I understand why p should be resolved while its children are unresolved. Should it not be p -> p.resolved() == false || p.childrenResolved() == false ? p : doRule(p)?

This was carried over from AnalyzerRule and BaseAnalyzerRule when collapsing their execution logic into this simple Rule.

Currently it says: "return the current plan unchanged (i.e., skip it) if it is either already resolved or still has unresolved children". I think this is correct.
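The skip condition makes sense in a bottom-up traversal. Children are rewritten before their parent, so a node is offered to the rule only once everything below it has had a chance to resolve. This is a minimal sketch with a hypothetical `Node` record standing in for `LogicalPlan`; its `transformUp` is a simplified stand-in, not the real tree API. The lambda in `resolveAll` mirrors the predicate from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the rule's skip condition: transform bottom-up and
// apply the rule only to nodes that are unresolved but whose children are
// all resolved. Node is a stand-in, not ES|QL's LogicalPlan.
final class ResolveBottomUp {

    record Node(String label, boolean resolved, List<Node> children) {
        boolean childrenResolved() {
            return children.stream().allMatch(Node::resolved);
        }

        // Rewrite children first, then offer the rebuilt node to the rule.
        Node transformUp(UnaryOperator<Node> rule) {
            List<Node> newChildren = new ArrayList<>();
            for (Node child : children) {
                newChildren.add(child.transformUp(rule));
            }
            return rule.apply(new Node(label, resolved, newChildren));
        }
    }

    static Node resolveAll(Node plan, UnaryOperator<Node> doRule) {
        // Mirrors: p.resolved() || p.childrenResolved() == false ? p : doRule(p)
        return plan.transformUp(p -> p.resolved() || p.childrenResolved() == false ? p : doRule.apply(p));
    }

    public static void main(String[] args) {
        Node leaf = new Node("source", true, List.of());
        Node eval = new Node("eval", false, List.of(leaf));
        Node limit = new Node("limit", false, List.of(eval));
        // A toy rule that simply marks the node it is given as resolved.
        Node result = resolveAll(limit, p -> new Node(p.label(), true, p.children()));
        System.out.println(result.resolved() + " " + result.children().get(0).resolved()); // prints "true true"
    }
}
```

Both nodes resolve in a single pass because the child is handled first. If the rule failed to resolve `eval`, `limit` would be skipped rather than resolved on top of an unresolved child.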

astefan · 2024-07-16T14:04:03Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/rename.csv-spec

+FROM employees
+| SORT emp_no ASC
+| KEEP emp_no, first_name, last_name
+| RENAME emp_no AS last_name


That's an interesting edge case.
Maybe I am reading the docs update wrong (thank you for updating the docs as well):

If it conflicts with an existing column name,
the existing column is replaced by the renamed column. If multiple columns
are renamed to the same name, all but the rightmost column with the same new
name are dropped.

but there is a contradiction between what the docs say and what actually happens. According to the docs, in the results below, last_name should contain the values from emp_no, but it should be the second column from the left, not the first one.

Ah, good catch! The column is not replaced; it's dropped. I'll update the doc.
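The corrected RENAME behavior can be summarized as three rules:

- A renamed column keeps its position.
- If several renames target the same name, only the rightmost one survives.
- An existing column whose name is a rename target is dropped.

This is a sketch of the semantics as described in this thread, not the actual Rename/Analyzer code. The `RenameSketch` class, the `Rename` record and `apply` are hypothetical. The position of the surviving column in the multi-rename case is an assumption. The employees example reproduces the behavior shown in the test above: last_name first, then first_name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of RENAME's handling of duplicate names (column names only).
final class RenameSketch {

    // One "RENAME from AS to" clause.
    record Rename(String from, String to) {}

    static List<String> apply(List<String> columns, List<Rename> renames) {
        Map<String, String> targetBySource = new HashMap<>();
        Map<String, String> sourceByTarget = new HashMap<>();
        for (Rename r : renames) {
            targetBySource.put(r.from(), r.to());
            // Later clauses overwrite earlier ones, so the rightmost rename to a name wins.
            sourceByTarget.put(r.to(), r.from());
        }
        List<String> output = new ArrayList<>();
        for (String column : columns) {
            String target = targetBySource.get(column);
            if (target != null) {
                // Renamed column: keep it at its original position, but only if
                // it is the rightmost rename to its new name.
                if (column.equals(sourceByTarget.get(target))) {
                    output.add(target);
                }
            } else if (sourceByTarget.containsKey(column) == false) {
                // Column untouched by RENAME and not clashing with any new name.
                output.add(column);
            }
            // Remaining case: an existing column whose name is a rename target is dropped.
        }
        return output;
    }

    public static void main(String[] args) {
        // FROM employees | KEEP emp_no, first_name, last_name | RENAME emp_no AS last_name
        List<String> keep = List.of("emp_no", "first_name", "last_name");
        System.out.println(apply(keep, List.of(new Rename("emp_no", "last_name")))); // prints [last_name, first_name]
    }
}
```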

astefan

LGTM

luigidellaquila

LGTM, thanks!

alex-spies · 2024-07-17T08:33:22Z

Thanks for your reviews, @leemthompo, @astefan, @craigtaverner and @luigidellaquila!

I updated keep.asciidoc one more time (I noticed the info I added would be duplicated), then it's time for a merge!

elasticsearchmachine · 2024-07-17T09:40:21Z

💔 Backport failed

Status	Branch	Result
❌	8.15	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 110488

alex-spies · 2024-07-17T10:28:05Z

💚 All backports created successfully

Status	Branch	Result
✅	8.15

Questions ?

Please refer to the Backport tool documentation

* Enforce an invariant in our dependency checker so that logical plans never have duplicate output attribute names or ids.
* Fix ROW to not produce columns with duplicate names.
* Fix ResolveUnionTypes to not create multiple synthetic field attributes for the same union type.
* Add tests for commands using the same column name more than once.
* Update docs w.r.t. how commands behave if they are used with duplicate column names.

(cherry picked from commit da53921)

# Conflicts:
# x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java

* ESQL: Validate unique plan attribute names (#110488)

  * Enforce an invariant in our dependency checker so that logical plans never have duplicate output attribute names or ids.
  * Fix ROW to not produce columns with duplicate names.
  * Fix ResolveUnionTypes to not create multiple synthetic field attributes for the same union type.
  * Add tests for commands using the same column name more than once.
  * Update docs w.r.t. how commands behave if they are used with duplicate column names.

  (cherry picked from commit da53921)

  # Conflicts:
  # x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec
  # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
  # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java
  # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java

* Remove unrelated csv tests

  These slipped in via merge conflicts.

Calling Rename.output() previously returned wrong results. Since #110488, it throws an IllegalStateException instead. That leads to test failures in the EsqlNodeSubclassTests because, e.g., MvExpandExec and FieldExtractExec eagerly call .output() on their child when they are constructed, and the child can be a fragment containing a Rename.

Calling Rename.output() previously returned wrong results. Since elastic#110488, it throws an IllegalStateException instead. That leads to test failures in the EsqlNodeSubclassTests because, e.g., MvExpandExec and FieldExtractExec eagerly call .output() on their child when they are constructed, and the child can be a fragment containing a Rename.

(cherry picked from commit 7df1b06)

# Conflicts:
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Rename.java

alex-spies added 2 commits July 4, 2024 18:31

Unique output attribute names after optimization

134a3e8

Enforce unique row attribute names in verifier

9d9c70f

alex-spies added >bug :Analytics/ES|QL AKA ESQL v8.15.0 labels Jul 4, 2024

alex-spies requested review from craigtaverner, luigidellaquila and astefan July 4, 2024 16:38

elasticsearchmachine added v8.16.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jul 4, 2024

Update docs/changelog/110488.yaml

0d4e1df

alex-spies mentioned this pull request Jul 4, 2024

ESQL: union types use duplicate attribute names #110490

Closed

alex-spies commented Jul 5, 2024

astefan reviewed Jul 5, 2024

alex-spies added 4 commits July 5, 2024 17:18

Add tests for grok, dissect, enrich

6f36c29

Add tests for keep

cd48514

Make row consistent with other plans

3a5dab7

Update docs/changelog/110488.yaml

f71ef42

Add test for drop, rename and stats

42be4eb

alex-spies marked this pull request as draft July 5, 2024 16:47

Merge remote-tracking branch 'upstream/main' into validate-unique-plan-attribute-names

2a0d630

astefan reviewed Jul 8, 2024

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

astefan reviewed Jul 8, 2024

Add test dataset with deeper field hierarchy

11da00c

alex-spies added 2 commits July 16, 2024 13:42

Update RENAME docs and tests

94737ff

Avoid duplicate field attribs from union type res

ea9b9a9

craigtaverner reviewed Jul 16, 2024

astefan reviewed Jul 16, 2024

alex-spies added 4 commits July 16, 2024 16:54

Fix leftovers

f5d9568

Make tests deterministic

a387165

Fix rename shadowing docs

233d68d

Apply Liam's doc remarks

d0723b0

astefan approved these changes Jul 17, 2024

luigidellaquila approved these changes Jul 17, 2024

alex-spies added 2 commits July 17, 2024 10:31

Don't describe KEEP precedence twice

fb17126

Merge remote-tracking branch 'upstream/main' into validate-unique-plan-attribute-names

bc354f4

alex-spies merged commit da53921 into elastic:main Jul 17, 2024
16 checks passed

elasticsearchmachine added the backport pending label Jul 17, 2024

alex-spies mentioned this pull request Jul 17, 2024

[8.15] ESQL: Validate unique plan attribute names (#110488) #110966

Merged

This was referenced Jul 17, 2024

[CI] EsqlNodeSubclassTests testTransform {class org.elasticsearch.xpack.esql.plan.physical.HashJoinExec} failing #110967

Closed

ESQL: Correctly compute Rename's output #110968

Merged

alex-spies deleted the validate-unique-plan-attribute-names branch July 17, 2024 10:51

lkts mentioned this pull request Aug 13, 2024

Fix references to logsdb index mode in release highlights lkts/elasticsearch#1

Closed
