-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ambiguous reference when aliasing in combination with ORDER BY
#8425
Conversation
# Conflicts: # datafusion/physical-plan/src/joins/hash_join_utils.rs
# Conflicts: # datafusion/physical-plan/src/joins/hash_join_utils.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Asura7969 . This is a really nice fix 🕵️ 💯
The code in this PR looks good to me, though I do wonder abut the relation
field (see inline comment)
Also, I tested the original query locally and this PR does indeed fix the problem 🎉
DataFusion CLI v33.0.0
❯ CREATE TABLE t (a INTEGER);
0 rows in set. Query took 0.018 seconds.
❯ SELECT t.a AS a FROM t ORDER BY t.a;
0 rows in set. Query took 0.011 seconds.
❯
Can you please add some tests to ensure we don't break it accidentally in the future (the only tests that were changed are related to subqueries and I worry someone will accidentally break remove the fix code and not realize we introduced a regression)?
These should probably be end to end sqllogic tests:
https://github.com/apache/arrow-datafusion/tree/main/datafusion/sqllogictest
Perhaps added to https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/select.slt
@@ -384,7 +384,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
&[&[plan.schema()]], | |||
&plan.using_columns()?, | |||
)?; | |||
let expr = col.alias(self.normalizer.normalize(alias)); | |||
let name = self.normalizer.normalize(alias); | |||
let expr = match &col { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment here about what this is doing? I think it is avoiding adding an alias if the column name is the same.
Do we have to worry about relation names here too (like what if column.relation
is non null?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but I didn't expect the scenario where column.relation
is inconsistent🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of a query like
EXPLAIN select a as a FROM table1 CROSS JOIN table2 order by a
Where both table1 and table2 have a column named a
. I think we should add a test for this case too -- I'll try and do so later today
I would expect the test to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you expected
DataFusion CLI v33.0.0
❯ CREATE TABLE table1 (a int) as values(1);
0 rows in set. Query took 0.037 seconds.
❯ CREATE TABLE table2 (a int) as values(1);
0 rows in set. Query took 0.012 seconds.
❯ EXPLAIN select a as a FROM table1 CROSS JOIN table2 order by a;
Schema error: Ambiguous reference to unqualified field a
let expected = "Sort: CAST(first_name AS first_name AS Int32) ASC NULLS LAST\ | ||
\n Projection: first_name AS first_name\ | ||
\n Projection: person.first_name AS first_name\ | ||
let expected = "Sort: CAST(person.first_name AS Int32) ASC NULLS LAST\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this certainly looks better
fyi @Jesse-Bakker |
\n TableScan: person"; | ||
quick_test(sql, expected); | ||
} | ||
|
||
#[test] | ||
fn test_avoid_add_alias() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure logical consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Asura7969 -- there is just one last thing I worry about with multiple relation names that I think we can resolve with some more tests.
@@ -384,7 +384,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
&[&[plan.schema()]], | |||
&plan.using_columns()?, | |||
)?; | |||
let expr = col.alias(self.normalizer.normalize(alias)); | |||
let name = self.normalizer.normalize(alias); | |||
let expr = match &col { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of a query like
EXPLAIN select a as a FROM table1 CROSS JOIN table2 order by a
Where both table1 and table2 have a column named a
. I think we should add a test for this case too -- I'll try and do so later today
I would expect the test to fail
|
||
# ambiguous column references in on join | ||
query error DataFusion error: Schema error: Ambiguous reference to unqualified field a | ||
EXPLAIN select a as a FROM table1 t1 CROSS JOIN table1 t2 order by a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sticking with this @Asura7969 -- looks good to me
…pache#8425) * Minor: Improve the document format of JoinHashMap * ambiguous reference * ignore aliases * fix * fix * add test * add test * add test * add test
Which issue does this PR close?
Closes #8391.
Rationale for this change
What changes are included in this PR?
When the field alias is consistent with the original name, cancel the alias
Are these changes tested?
Overwrite existing tests
Are there any user-facing changes?