fix: Remove dangling table references in unparser
#13405
Changes from all commits
a8c7be3
72317ac
596bf21
f6c3e2a
9ed4b2d
b8e2d90
5fc79a6
011742a
7cca6ca
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -158,10 +158,12 @@ impl Unparser<'_> { | |
} | ||
|
||
let mut twj = select_builder.pop_from().unwrap(); | ||
-twj.relation(relation_builder); | ||
+twj.relation(relation_builder.clone()); | ||
Review comment: I don't understand why this needs to have a clone now 🤔
Reply: Because […] The only thing I use the relation builder for is to retrieve the list of all the identifiers, so I could probably do that before the […] which shouldn't require a clone.
||
select_builder.push_from(twj); | ||
|
||
-Ok(SetExpr::Select(Box::new(select_builder.build()?))) | ||
+Ok(SetExpr::Select(Box::new( | ||
+select_builder.build(query, &relation_builder)?, | ||
+))) | ||
} | ||
|
||
/// Reconstructs a SELECT SQL statement from a logical plan by unprojecting column expressions | ||
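The clone discussed in the review thread above could plausibly be avoided by reading the identifier list out of the relation builder before it is moved. The following is a minimal sketch of that ordering only. `RelationBuilder`, `TableWithJoinsBuilder`, and `available_idents` here are hypothetical stand-ins written for illustration, not the actual unparser types or methods.

```rust
// Hypothetical stand-ins for the unparser's builder types (assumption: the
// real setter takes its argument by value, which is what forces the clone).
#[derive(Debug, Default)]
struct RelationBuilder {
    table: String,
    alias: Option<String>,
}

impl RelationBuilder {
    /// Names that a column qualifier may legitimately refer to.
    /// (Illustrative helper; not part of the PR.)
    fn available_idents(&self) -> Vec<String> {
        let mut names = vec![self.table.clone()];
        if let Some(alias) = &self.alias {
            names.push(alias.clone());
        }
        names
    }
}

#[derive(Debug, Default)]
struct TableWithJoinsBuilder {
    relation: Option<RelationBuilder>,
}

impl TableWithJoinsBuilder {
    /// Consumes the relation builder, mirroring a by-value setter.
    fn relation(&mut self, value: RelationBuilder) -> &mut Self {
        self.relation = Some(value);
        self
    }
}

fn main() {
    let relation_builder = RelationBuilder {
        table: "orders".to_string(),
        alias: Some("t".to_string()),
    };

    // Collect everything needed from the builder while it is still owned here.
    let available = relation_builder.available_idents();

    // The builder can now be moved; no clone is required.
    let mut twj = TableWithJoinsBuilder::default();
    twj.relation(relation_builder);

    println!("{available:?}");
}
```

With this ordering, a later `build` step would receive the collected list rather than a reference to the builder itself, which changes the signature shown in the diff; whether that is acceptable is a design question for the PR.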
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,7 +24,7 @@ use datafusion_common::{ | |
}; | ||
use datafusion_expr::{expr::Alias, tree_node::transform_sort_vec}; | ||
use datafusion_expr::{Expr, LogicalPlan, Projection, Sort, SortExpr}; | ||
-use sqlparser::ast::Ident; | ||
+use sqlparser::ast::{self, display_separated, Ident}; | ||
|
||
/// Normalize the schema of a union plan to remove qualifiers from the schema fields and sort expressions. | ||
/// | ||
|
@@ -363,3 +363,138 @@ impl TreeNodeRewriter for TableAliasRewriter<'_> { | |
} | ||
} | ||
} | ||
|
||
/// Takes an input list of identifiers and a list of identifiers that are available from relations or joins. | ||
/// Removes any table identifiers that are not present in the list of available identifiers, retains original column names. | ||
pub fn remove_dangling_identifiers(idents: &mut Vec<Ident>, available_idents: &[String]) { | ||
Review comment: I don't understand this code super deeply, but this seems to me like it is treating the symptom (incorrect qualifiers) rather than the root cause. Specifically, did you look into fixing the code so that it didn't create incorrect identifiers in the first place, rather than modifying the created AST after the fact to remove them?
Reply: Yeah, I had taken a look into doing this at the unparser […] If you'd be open to still merging this as an AST modifier, perhaps we could gate it behind a dialect option or feature flag as a non-default?
||
if idents.len() > 1 { | ||
let ident_source = display_separated( | ||
&idents | ||
.clone() | ||
.into_iter() | ||
.take(idents.len() - 1) | ||
.collect::<Vec<Ident>>(), | ||
".", | ||
) | ||
.to_string(); | ||
// If the identifier is not present in the list of all identifiers, it refers to a table that does not exist | ||
if !available_idents.contains(&ident_source) { | ||
let Some(last) = idents.last() else { | ||
unreachable!("CompoundIdentifier must have a last element"); | ||
}; | ||
// Reset the identifiers to only the last element, which is the column name | ||
*idents = vec![last.clone()]; | ||
} | ||
} | ||
} | ||
|
||
/// Handle removing dangling identifiers from an expression | ||
/// This function can call itself recursively to handle nested expressions | ||
/// Like binary ops or functions which contain nested expressions/arguments | ||
pub fn remove_dangling_expr( | ||
expr: ast::Expr, | ||
available_idents: &Vec<String>, | ||
) -> ast::Expr { | ||
match expr { | ||
ast::Expr::BinaryOp { left, op, right } => { | ||
let left = remove_dangling_expr(*left, available_idents); | ||
let right = remove_dangling_expr(*right, available_idents); | ||
ast::Expr::BinaryOp { | ||
left: Box::new(left), | ||
op, | ||
right: Box::new(right), | ||
} | ||
} | ||
ast::Expr::Nested(expr) => { | ||
let expr = remove_dangling_expr(*expr, available_idents); | ||
ast::Expr::Nested(Box::new(expr)) | ||
} | ||
ast::Expr::CompoundIdentifier(idents) => { | ||
let mut idents = idents.clone(); | ||
remove_dangling_identifiers(&mut idents, available_idents); | ||
|
||
if idents.is_empty() { | ||
unreachable!("Identifier must have at least one element"); | ||
} else if idents.len() == 1 { | ||
ast::Expr::Identifier(idents[0].clone()) | ||
} else { | ||
ast::Expr::CompoundIdentifier(idents) | ||
} | ||
} | ||
ast::Expr::Function(ast::Function { | ||
args, | ||
name, | ||
parameters, | ||
filter, | ||
null_treatment, | ||
over, | ||
within_group, | ||
}) => { | ||
let args = if let ast::FunctionArguments::List(mut args) = args { | ||
args.args.iter_mut().for_each(|arg| match arg { | ||
ast::FunctionArg::Named { | ||
arg: ast::FunctionArgExpr::Expr(expr), | ||
.. | ||
} | ||
| ast::FunctionArg::Unnamed(ast::FunctionArgExpr::Expr(expr)) => { | ||
*expr = remove_dangling_expr(expr.clone(), available_idents); | ||
} | ||
_ => {} | ||
}); | ||
|
||
ast::FunctionArguments::List(args) | ||
} else { | ||
args | ||
}; | ||
|
||
ast::Expr::Function(ast::Function { | ||
args, | ||
name, | ||
parameters, | ||
filter, | ||
null_treatment, | ||
over, | ||
within_group, | ||
}) | ||
} | ||
_ => expr, | ||
} | ||
} | ||
|
||
#[cfg(test)] | ||
mod test { | ||
use super::*; | ||
|
||
#[test] | ||
fn test_remove_dangling_identifiers() { | ||
let tests = vec![ | ||
(vec![], vec![Ident::new("column1".to_string())]), | ||
( | ||
vec!["table1.table2".to_string()], | ||
vec![ | ||
Ident::new("table1".to_string()), | ||
Ident::new("table2".to_string()), | ||
Ident::new("column1".to_string()), | ||
], | ||
), | ||
( | ||
vec!["table1".to_string()], | ||
vec![Ident::new("column1".to_string())], | ||
), | ||
]; | ||
|
||
for test in tests { | ||
let test_in = test.0; | ||
let test_out = test.1; | ||
|
||
let mut idents = vec![ | ||
Ident::new("table1".to_string()), | ||
Ident::new("table2".to_string()), | ||
Ident::new("column1".to_string()), | ||
]; | ||
|
||
remove_dangling_identifiers(&mut idents, &test_in); | ||
assert_eq!(idents, test_out); | ||
} | ||
} | ||
} |
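For reference, the qualifier check in `remove_dangling_identifiers` can be written without cloning the whole identifier list. The sketch below is one possible variant, not the PR's code. It uses a local `Ident` stand-in so that it compiles without the `sqlparser` crate, and it joins raw identifier values rather than going through `display_separated`, so it ignores quote styles. Both points are assumptions made for illustration.

```rust
// Stand-in for `sqlparser::ast::Ident`, reduced to the one field this sketch
// needs (assumption: quote style is ignored here).
#[derive(Debug, Clone, PartialEq)]
struct Ident {
    value: String,
}

impl Ident {
    fn new(value: &str) -> Self {
        Ident { value: value.to_string() }
    }
}

/// Drops the table qualifier from `idents` when that qualifier is not listed
/// in `available_idents`; the trailing column name is always kept.
fn remove_dangling_identifiers(idents: &mut Vec<Ident>, available_idents: &[String]) {
    // `split_last` returns (column, qualifier parts), or None for an empty list.
    let Some((_, qualifier)) = idents.split_last() else {
        return;
    };
    if qualifier.is_empty() {
        // A bare column name has nothing to strip.
        return;
    }

    // Build "table" or "schema.table" from borrowed parts.
    let source = qualifier
        .iter()
        .map(|ident| ident.value.as_str())
        .collect::<Vec<_>>()
        .join(".");

    if !available_idents.contains(&source) {
        // Remove every element except the last one (the column name).
        let keep_from = idents.len() - 1;
        idents.drain(..keep_from);
    }
}

fn main() {
    let mut idents = vec![
        Ident::new("table1"),
        Ident::new("table2"),
        Ident::new("column1"),
    ];
    // "table1.table2" is not available, so only the column name remains.
    remove_dangling_identifiers(&mut idents, &["table1".to_string()]);
    println!("{idents:?}");
}
```

The call in `main` mirrors the third case of the PR's unit test: the only available name is `table1`, the qualifier `table1.table2` does not match it, and the identifier collapses to `column1`.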
Review comment: You can probably avoid a bunch of copies if you made this return a reference to a `&str` rather than a `String`; if the caller needed the string they can always copy it.
Reply: It looks like `Ident` doesn't implement anything that would return a `&str`, so it needs a `String` intermediary. I'm also not sure what copies you're referring to; I don't make any copies of the values from `collect_valid_idents`? The return from `get_alias` also isn't cloned, and is taken ownership of by `collect_valid_idents`.
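One way to reconcile the two positions in this thread is to borrow when a name has a single part and allocate only when parts have to be joined. The sketch below illustrates that with `std::borrow::Cow`. It assumes that reading an identifier's `value: String` field directly is acceptable (which skips quote-style rendering), and it uses a local `Ident` stand-in. `qualifier_name` is a hypothetical helper, not a function from the PR.

```rust
use std::borrow::Cow;

// Stand-in for `sqlparser::ast::Ident` with only the field used here.
struct Ident {
    value: String,
}

/// Returns the dotted name for `parts`.
/// A single part is borrowed as `&str`; several parts must be joined,
/// which necessarily produces a new `String`.
fn qualifier_name(parts: &[Ident]) -> Option<Cow<'_, str>> {
    match parts {
        [] => None,
        [single] => Some(Cow::Borrowed(single.value.as_str())),
        many => Some(Cow::Owned(
            many.iter()
                .map(|part| part.value.as_str())
                .collect::<Vec<_>>()
                .join("."),
        )),
    }
}

fn main() {
    let single = [Ident { value: "orders".to_string() }];
    let nested = [
        Ident { value: "public".to_string() },
        Ident { value: "orders".to_string() },
    ];

    // Both results can be used as `&str` regardless of how they were produced.
    for parts in [&single[..], &nested[..]] {
        if let Some(name) = qualifier_name(parts) {
            println!("{name}");
        }
    }
}
```

Whether this is worth the extra type depends on how often qualifiers have more than one part; if most are compound, a plain `String` return is simpler and costs about the same.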