-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve recursive unnest
options API
#12836
Conversation
unnest_with_options(input, unnestings, UnnestOptions::default()) | ||
} | ||
|
||
pub fn get_unnested_list_datatype_recursive( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this method, recursion specification can be declared using UnnestOptions
/// will only be done once (depth = 1). In case recursion is needed on a multi-dimensional | ||
/// list type, use [`ColumnUnnestList`] | ||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd)] | ||
pub enum ColumnUnnestType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct is now a part of the builder, no need to expose to api
unnest
Merging up from main to get CI to pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @duongcongtoai -- I think this is a significant improvement in code cleaniless
I think it would be nice to add some comments, as I have described below, but I also don't think it is strictly necessary and could be done after merge
datafusion/common/src/unnest.rs
Outdated
@@ -64,13 +66,24 @@ | |||
pub struct UnnestOptions { | |||
/// Should nulls in the input be preserved? Defaults to true | |||
pub preserve_nulls: bool, | |||
/// Without explicit recursions, all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please document what the Vec
means in this context? Is the option for each of the columns in the schema? If so can you say it is expected to have the same number of elements?
If possible it would also be great to update the documentation on this struct to explain what is going on with recursive options
} | ||
|
||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd)] | ||
pub struct RecursionUnnestOption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add documentation to this structure as it is public?
/// if one column is a list type, it can be recursively and simultaneously | ||
/// unnested into the desired recursion levels | ||
/// e.g select unnest(list_col,depth=1), unnest(list_col,depth=2) | ||
pub fn unnest_columns_recursive_with_options( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
UnnestOptions::default().with_recursions(vec![ | ||
RecursionUnnestOption { | ||
input_column: "stringss".into(), | ||
output_column: "stringss_depth_1".into(), | ||
depth: 1, | ||
}, | ||
RecursionUnnestOption { | ||
input_column: "stringss".into(), | ||
output_column: "stringss_depth_2".into(), | ||
depth: 2, | ||
}, | ||
]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another potential way for this API to look would be to take individual options rather than a single Vec
I don't feel strongly about this
UnnestOptions::default().with_recursions(vec![ | |
RecursionUnnestOption { | |
input_column: "stringss".into(), | |
output_column: "stringss_depth_1".into(), | |
depth: 1, | |
}, | |
RecursionUnnestOption { | |
input_column: "stringss".into(), | |
output_column: "stringss_depth_2".into(), | |
depth: 2, | |
}, | |
]), | |
UnnestOptions::new() | |
.with_recursion( | |
RecursionUnnestOption { | |
input_column: "stringss".into(), | |
output_column: "stringss_depth_1".into(), | |
depth: 1, | |
}) | |
.with_recusion( | |
RecursionUnnestOption { | |
input_column: "stringss".into(), | |
output_column: "stringss_depth_2".into(), | |
depth: 2, | |
}, | |
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, i refactored so that the recursion is optional at column level, and since None can be replaced by an empty vector, i simplified the type of this field from Option to Vec
"| | | e |", | ||
"+---------------------------------+---------------------------------+---------------------------------+", | ||
]; | ||
assert_batches_eq!(expected, &[ret]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
datafusion/sql/src/select.rs
Outdated
@@ -301,7 +301,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
// The transformation happen bottom up, one at a time for each iteration | |||
// Only exhaust the loop if no more unnest transformation is found | |||
for i in 0.. { | |||
let mut unnest_columns = vec![]; | |||
let mut unnest_columns = HashMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to use an IndexMap here (to retain the same order as was specified in the query) -- a HashMap can mix up the order
datafusion/sql/src/select.rs
Outdated
match maybe_list_unnest { | ||
None => {} | ||
Some(list_unnest) => { | ||
list_recursions.extend(list_unnest.into_iter().map( | ||
|unnest_list| RecursionUnnestOption { | ||
input_column: col.clone(), | ||
output_column: unnest_list.output_column, | ||
depth: unnest_list.depth, | ||
}, | ||
)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could simplify this code (reduce recursion level) with if let
like
match maybe_list_unnest { | |
None => {} | |
Some(list_unnest) => { | |
list_recursions.extend(list_unnest.into_iter().map( | |
|unnest_list| RecursionUnnestOption { | |
input_column: col.clone(), | |
output_column: unnest_list.output_column, | |
depth: unnest_list.depth, | |
}, | |
)); | |
} | |
} | |
if let Some(list_unnest) = maybe_list_unnest { | |
list_recursions.extend(list_unnest.into_iter().map( | |
|unnest_list| RecursionUnnestOption { | |
input_column: col.clone(), | |
output_column: unnest_list.output_column, | |
depth: unnest_list.depth, | |
}, | |
)); | |
} | |
} |
Also I wonder if it would be possible / easier for rewrite_recursive_unnests_bottom_up
to simply return a HashSet<Column, RecursionUnnestOption>
directly rather than having to convert the results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, i can refactor this function so it returns 3 result:
- inner_projection_exprs
- unnest_placeholder_columns
- outer_projection_exprs
unnest
unnest
options API
Looks great -- thanks again @duongcongtoai |
Which issue does this PR close?
Cleanup #11577
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?