Enhance the formatting for Column #11724
Changes from 5 commits
7bbd770
b191f3c
9fbc372
f96fcc5
94d5527
f6d6a6f
@@ -2752,11 +2752,10 @@ fn calc_func_dependencies_for_project(
             .iter()
             .filter_map(|expr| {
                 let expr_name = match expr {
-                    Expr::Alias(alias) => {
-                        format!("{}", alias.expr)
-                    }
-                    _ => format!("{}", expr),
-                };
+                    Expr::Alias(alias) => alias.expr.display_name(),
+                    _ => expr.display_name(),
+                }
+                .ok()?;
Comment on lines +2755 to +2758
I'm not sure whether this makes sense, but we need a name for the function dependencies, so I chose one that isn't affected by Display. By the way, I'm confused about why Expr has so many different names and similar display methods. Maybe we should organize them or name them more clearly.
Reply: #11782 filed
                 input_fields.iter().position(|item| *item == expr_name)
             })
             .collect::<Vec<_>>();
@@ -2906,7 +2905,7 @@ mod tests {
     use super::*;
     use crate::builder::LogicalTableSource;
     use crate::logical_plan::table_scan;
-    use crate::{col, exists, in_subquery, lit, placeholder, GroupingSet};
+    use crate::{col, exists, ident, in_subquery, lit, placeholder, GroupingSet};

     use datafusion_common::tree_node::{TransformedResult, TreeNodeVisitor};
     use datafusion_common::{not_impl_err, Constraint, ScalarValue};
@@ -3512,4 +3511,44 @@ digraph {
         let actual = format!("{}", plan.display_indent());
         assert_eq!(expected.to_string(), actual)
     }
+
+    #[test]
+    fn test_display_unqualifed_ident() {
+        let schema = Schema::new(vec![
+            Field::new("max(id)", DataType::Int32, false),
+            Field::new("state", DataType::Utf8, false),
+        ]);
+
+        let plan = table_scan(Some("t"), &schema, None)
+            .unwrap()
+            .filter(col("state").eq(lit("CO")))
+            .unwrap()
+            .project(vec![col("max(id)")])
+            .unwrap()
+            .build()
+            .unwrap();
+
+        let expected =
+            "Projection: t.max(id)\n  Filter: t.state = Utf8(\"CO\")\n    TableScan: t";
+        let actual = format!("{}", plan.display_indent());
+        assert_eq!(expected.to_string(), actual);
+
+        let schema = Schema::new(vec![
+            Field::new("id", DataType::Int32, false),
+            Field::new("t.id", DataType::Int32, false),
+        ]);
+
+        let plan = table_scan(Some("t"), &schema, None)
+            .unwrap()
+            .build()
+            .unwrap();
+        let projection = LogicalPlan::Projection(
+            Projection::try_new(vec![col("t.id"), ident("t.id")], Arc::new(plan))
+                .unwrap(),
+        );
+
+        let expected = "Projection: t.id, \"t.id\"\n  TableScan: t";
+        let actual = format!("{}", projection.display_indent());
+        assert_eq!(expected.to_string(), actual);
+    }
 }
@jayzhan211 Before I fix the other tests, I want to check whether this behavior makes sense. (It affects a lot of tests 😢.)
Currently, we quote an identifier only if it contains a dot. However, a column name such as sum(t1.c1) will also be quoted, even though it is a function call. I don't think it's worth adding more checks to exclude this kind of case. What do you think?
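To make the rule under discussion concrete, here is a minimal sketch, assuming the check is a plain substring test on the column name. The helper `quote_if_dotted` is hypothetical and is not part of DataFusion; it only illustrates why a name that merely looks like a function call ends up quoted too.

```rust
/// Hypothetical helper (not DataFusion's API): wrap a column name in
/// double quotes only when the name contains a dot.
fn quote_if_dotted(name: &str) -> String {
    if name.contains('.') {
        // Names such as `t.id` and `sum(t1.c1)` both take this branch,
        // because the check looks only at the raw string.
        format!("\"{}\"", name)
    } else {
        // Names without a dot, such as `state` or `max(id)`, are unchanged.
        name.to_string()
    }
}
```

Under this assumption, `quote_if_dotted("sum(t1.c1)")` yields a quoted name, because the helper cannot tell a function-call-shaped column name from an identifier that really contains a dot.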
I think it is not ideal if sum(t1.c1) is quoted 🤔. I'd like the change to be as small as possible, so I would prefer that functions and other Expr variants stay the same and only identifiers containing a dot are quoted.
We could also hold off and wait for more input from others, given that this change is not trivial.
Maybe instead of modifying Column, we should modify display_name for Expr, so that if we find a column inside a ScalarFunction, we can skip the double quotes (with something like a boolean flag?).
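The boolean-flag idea could look roughly like the sketch below. `ToyExpr` is a toy stand-in with two variants, not DataFusion's `Expr`, and the `quote_dotted` parameter is an assumed name for the suggested flag. It only shows the flag being switched off when recursing into function arguments.

```rust
// Toy expression type for illustration only; DataFusion's real `Expr`
// has many more variants and a different `display_name` signature.
enum ToyExpr {
    Column(String),
    ScalarFunction { name: String, args: Vec<ToyExpr> },
}

impl ToyExpr {
    /// `quote_dotted` is the hypothetical flag: when true, a column whose
    /// name contains a dot is wrapped in double quotes.
    fn display_name(&self, quote_dotted: bool) -> String {
        match self {
            ToyExpr::Column(name) if quote_dotted && name.contains('.') => {
                format!("\"{}\"", name)
            }
            ToyExpr::Column(name) => name.clone(),
            ToyExpr::ScalarFunction { name, args } => {
                // Columns nested inside a function call are rendered
                // with quoting disabled.
                let rendered: Vec<String> =
                    args.iter().map(|arg| arg.display_name(false)).collect();
                format!("{}({})", name, rendered.join(", "))
            }
        }
    }
}
```

This sketch only covers expressions that are still structured. It does not help when the whole string sum(t1.c1) has already been stored as a single column name.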
I'm not sure if it's that simple 🤔. In my experience, the column might look like this:
I think it's hard to find a consistent pattern for it because we use many Column::from_name calls to create projections. For example, in datafusion/datafusion/sql/src/relation/join.rs (line 118 at f4e519f), the column name could be complex and unruly.