-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Remove dangling table references in unparser
#13405
Conversation
@@ -585,7 +665,7 @@ fn test_aggregation_without_projection() -> Result<()> { | |||
|
|||
assert_eq!( | |||
actual, | |||
r#"SELECT sum(users.age), users."name" FROM (SELECT users."name", users.age FROM users) GROUP BY users."name""# | |||
r#"SELECT sum(age), "name" FROM (SELECT users."name", users.age FROM users) GROUP BY "name""# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
unparser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ => None, | ||
} | ||
} | ||
pub fn get_alias(&self) -> Option<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably avoid a bunch of copies if you made this return a reference to a &str
rather than a String
-- if the caller needed the string they can always copy it.
pub fn get_alias(&self) -> Option<String> { | |
pub fn get_alias(&self) -> Option<&str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Ident
doesn't implement anything that would return a &str
, so it needs a String
intermediary. I'm also not sure what copies you're referring too, I don't make any copies of the values from collect_valid_idents
? The return from get_alias
also isn't cloned, and is taken ownership of by collect_valid_idents
.
@@ -158,10 +158,12 @@ impl Unparser<'_> { | |||
} | |||
|
|||
let mut twj = select_builder.pop_from().unwrap(); | |||
twj.relation(relation_builder); | |||
twj.relation(relation_builder.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this needs to have a clone now 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because twj.relation()
takes ownership of the relation_builder
causing it to move, so we can't borrow it again later.
The only thing I use the relation builder for is to retrieve the list of all the identifiers, so I could probably do that before the twj.relation()
then just pass those like:
let valid_idents = select_builder.collect_valid_idents();
twj.relation();
which shouldn't require a clone.
|
||
/// Takes an input list of identifiers and a list of identifiers that are available from relations or joins. | ||
/// Removes any table identifiers that are not present in the list of available identifiers, retains original column names. | ||
pub fn remove_dangling_identifiers(idents: &mut Vec<Ident>, available_idents: &[String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this code super deeply, but this seems to me like it is treating the symptom (incorrect qualifiers) rather than the root cause.
Specifically, did you look into fixing the code so that it didn't create incorrect indentifiers in the first place, rather than trying to modify the created AST after the fact to remove incorrect indentifers ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had taken a look into doing this at the unparser LogicalPlan
level but I wasn't making very good progress. It could be my lack of understanding with LogicalPlan
, but I think the symptom originates from the parser rather than the unparser.
If you'd be open to merging this still as an AST modifier, perhaps we could gate it behind a dialect option or feature flag as a non-default?
Maybe @sgrebnov or @phillipleblanc can offer some advice about how to proceed with this PR -- I feel like it adds significantly complexity and I am not sure it doesn't also introduce some hard to understand subtle bugs with indentifiers |
I plan to take a shot at preventing the identifiers from getting introduced in the first place and/or adding subquery alias nodes as appropriate. I spent a little time on it this weekend and was able to get the bare aggregation + table scan working. @peasee let's close this PR for now. |
Which issue does this PR close?
Closes #13027
Rationale for this change
This change ensures that table references which don't exist at their current level are removed. For example,
SELECT ta.j1_id FROM (SELECT j1_id FROM j1 ta)
is invalid because the subquery is un-aliased sota
is not a valid reference at the top-level projection.This is usually caused by derived subqueries, both un-aliased and aliased.
What changes are included in this PR?
select_to_sql_expr
, collects the available table identifiers at that level from the projection and table joins.ta.j1_id
->j1_id
).Are these changes tested?
Yes. A collection of new plan-to-SQL roundtrip tests to validate the changes.
Are there any user-facing changes?
No