-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update arrow to 51, datafusion to 37 #2240
feat: update arrow to 51, datafusion to 37 #2240
Conversation
schema: schema.clone(), | ||
properties: PlanProperties::new( | ||
EquivalenceProperties::new(schema), | ||
Partitioning::RoundRobinBatch(1), | ||
datafusion::physical_plan::ExecutionMode::Bounded, | ||
), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ExecPlan
interface changed. output_partitioning
and output_ordering
have gone away and properties()
now replaces both of those (as well as a new field, "execution mode".
pub trait ExprExt { | ||
// Helper function to replace Expr::field in DF 37 since DF | ||
// confuses itself with the GetStructField returned by Expr::field | ||
fn field_newstyle(&self, name: &str) -> Expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the most complicated change. The comment explains it ok. I expect this may eventually go away (e.g. when apache/datafusion#10181 is resolved)
// TODO: consider making this dispatch more generic, i.e. fun.output_type -> coerce | ||
// instead of hardcoding coerce method for each function | ||
Expr::ScalarFunction(ScalarFunction { | ||
func_def: ScalarFunctionDefinition::BuiltIn(BuiltinScalarFunction::RegexpMatch), | ||
func_def: ScalarFunctionDefinition::UDF(udf), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no more builtin functions. So we need to look for regexp_match
as a UDF instead.
@@ -25,16 +25,18 @@ impl PhysicalOptimizerRule for CoalesceTake { | |||
plan: Arc<dyn ExecutionPlan>, | |||
_config: &ConfigOptions, | |||
) -> DFResult<Arc<dyn ExecutionPlan>> { | |||
plan.transform_down(&|plan| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DF made minor changes to the transform_down
function but no real change here (e.g. Yes
-> yes
and we return transform_down(...).data()
@@ -183,25 +203,23 @@ impl ContextProvider for LanceContextProvider { | |||
))) | |||
} | |||
|
|||
fn get_aggregate_meta(&self, name: &str) -> Option<Arc<AggregateUDF>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlToRel
now relies on the ContextProvider
for functions (e.g. regexp_match). These come from a SessionState
instance and so we needed to add one of those to the LanceContextProvider
(the default is fine)
@@ -69,6 +75,10 @@ impl ExecutionPlan for TestingExec { | |||
} | |||
|
|||
fn statistics(&self) -> datafusion::error::Result<datafusion::physical_plan::Statistics> { | |||
todo!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: DF now actually calls statistics
and so we can't get away with todo!()
here anymore.
Cargo.toml
Outdated
"array_expressions", | ||
"regex_expressions", | ||
] } | ||
datafusion-common = "37.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datafusion is on 37.1 https://docs.rs/datafusion/latest/datafusion/
Should we bump to datafusion 37.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea, maybe just 37
for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, probably best to do 37.1 just to avoid issues. It doesn't change that often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have a reason to require >=37.1
, I feel like it would be nice to keep at 37.0
. Then users can use any 37.X
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean 37
? Or does 37.0
allow 37.1
?
Scanning through apache/datafusion#9904 I don't see any defects that would impact us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending CI
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2240 +/- ##
==========================================
+ Coverage 81.01% 81.07% +0.06%
==========================================
Files 186 186
Lines 53583 53722 +139
Branches 53583 53722 +139
==========================================
+ Hits 43408 43554 +146
+ Misses 7707 7690 -17
- Partials 2468 2478 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -34,6 +38,45 @@ fn resolve_value(expr: &Expr, data_type: &DataType) -> Result<Expr> { | |||
} | |||
} | |||
|
|||
/// A simple helper function that interprets an Expr as a string scalar | |||
/// or returns None if it is not. | |||
pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: &String
in a signature is pretty much never desirable.
pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&String> { | |
pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to &str
.
Ok(expr.clone()) | ||
} | ||
})) | ||
} else if let Some(left_type) = dbg!(resolve_column_type(left.as_ref(), schema)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left over dbg!
?
} else if let Some(left_type) = dbg!(resolve_column_type(left.as_ref(), schema)) { | |
} else if let Some(left_type) = resolve_column_type(left.as_ref(), schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Removed.
No description provided.