Minor: Remove unnecessary clone in datafusion_proto #7921

Merged: 1 commit, Oct 24, 2023
8 changes: 4 additions & 4 deletions datafusion/proto/src/logical_plan/mod.rs
@@ -363,7 +363,7 @@ impl AsLogicalPlan for LogicalPlanNode {
                .collect::<Result<Vec<_>, _>>()?;

            let options = ListingOptions::new(file_format)
-               .with_file_extension(scan.file_extension.clone())
+               .with_file_extension(&scan.file_extension)
                .with_table_partition_cols(
                    scan.table_partition_cols
                        .iter()
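The change above works because builder methods that accept `impl Into<String>` take both owned strings and references, so any needed copy happens inside the method rather than at the call site. A minimal sketch with a hypothetical builder (not DataFusion's actual `ListingOptions`, whose signature may differ):

```rust
// Hypothetical builder illustrating why `&scan.file_extension` needs no
// explicit `.clone()`: `&String` implements `Into<String>` (via
// `From<&String>`), so the conversion happens inside the method.
struct ListingOptions {
    file_extension: String,
}

impl ListingOptions {
    fn with_file_extension(mut self, ext: impl Into<String>) -> Self {
        self.file_extension = ext.into();
        self
    }
}

fn main() {
    let ext = String::from(".parquet");
    let opts = ListingOptions { file_extension: String::new() }
        .with_file_extension(&ext); // borrowed; `ext` is still usable after
    assert_eq!(opts.file_extension, ".parquet");
    assert_eq!(ext, ".parquet");
}
```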
@@ -458,7 +458,7 @@ impl AsLogicalPlan for LogicalPlanNode {
let input: LogicalPlan =
into_logical_plan!(repartition.input, ctx, extension_codec)?;
use protobuf::repartition_node::PartitionMethod;
-           let pb_partition_method = repartition.partition_method.clone().ok_or_else(|| {
+           let pb_partition_method = repartition.partition_method.as_ref().ok_or_else(|| {
DataFusionError::Internal(String::from(
"Protobuf deserialization error, RepartitionNode was missing required field 'partition_method'",
))
Comment on lines -461 to 464 (Contributor Author):

The error can be returned early here, so the clone can be avoided.
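The pattern in this hunk — borrowing an `Option` field with `as_ref()` instead of cloning it before `ok_or_else` — can be sketched in a standalone form (the `Node` struct and error type here are illustrative stand-ins, not DataFusion's generated protobuf types):

```rust
#[derive(Debug)]
struct Node {
    // Stands in for the optional protobuf field `partition_method`.
    partition_method: Option<String>,
}

fn partition_count(node: &Node) -> Result<usize, String> {
    // `as_ref()` turns `&Option<String>` into `Option<&String>`, so a
    // missing field becomes an early error without cloning the value.
    let method = node
        .partition_method
        .as_ref()
        .ok_or_else(|| "missing required field 'partition_method'".to_string())?;
    Ok(method.len()) // any borrowed use of the field
}

fn main() {
    let node = Node { partition_method: Some("hash".to_string()) };
    assert_eq!(partition_count(&node), Ok(4));
    assert!(partition_count(&Node { partition_method: None }).is_err());
}
```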

@@ -473,10 +473,10 @@
.iter()
.map(|expr| from_proto::parse_expr(expr, ctx))
.collect::<Result<Vec<_>, _>>()?,
-                   partition_count as usize,
+                   *partition_count as usize,
                ),
                PartitionMethod::RoundRobin(partition_count) => {
-                   Partitioning::RoundRobinBatch(partition_count as usize)
+                   Partitioning::RoundRobinBatch(*partition_count as usize)
                }
}
};
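The added `*` derefs follow from the `as_ref()` change above: matching through a reference binds fields by reference, so a `Copy` integer needs an explicit deref before the cast. A minimal sketch of the same pattern (illustrative enum, not the generated protobuf type):

```rust
// Matching on `&PartitionMethod` binds `count` as `&u64` (match
// ergonomics), so `*count` copies the integer out before the cast.
enum PartitionMethod {
    RoundRobin(u64),
}

fn partition_count(method: &PartitionMethod) -> usize {
    match method {
        PartitionMethod::RoundRobin(count) => *count as usize,
    }
}

fn main() {
    let method = PartitionMethod::RoundRobin(8);
    assert_eq!(partition_count(&method), 8);
}
```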

19 changes: 7 additions & 12 deletions datafusion/proto/src/physical_plan/mod.rs
@@ -394,17 +394,12 @@ impl AsExecutionPlan for PhysicalPlanNode {
vec![]
};

-           let input_schema = hash_agg
-               .input_schema
-               .as_ref()
-               .ok_or_else(|| {
-                   DataFusionError::Internal(
-                       "input_schema in AggregateNode is missing.".to_owned(),
-                   )
-               })?
-               .clone();
-           let physical_schema: SchemaRef =
-               SchemaRef::new((&input_schema).try_into()?);
+           let input_schema = hash_agg.input_schema.as_ref().ok_or_else(|| {
+               DataFusionError::Internal(
+                   "input_schema in AggregateNode is missing.".to_owned(),
+               )
+           })?;
+           let physical_schema: SchemaRef = SchemaRef::new(input_schema.try_into()?);
Comment on lines -397 to +402 (Contributor Author):

The SchemaRef can be created directly from a reference to input_schema.
Contributor:

It is confusing, but SchemaRef is an alias for Arc<Schema>, which is fast and cheap to clone.

Contributor Author (@ongchi, Oct 26, 2023):

The input_schema here is a &protobuf::Schema, while SchemaRef is an alias for Arc<arrow_schema::Schema>.

https://github.com/apache/arrow-datafusion/blob/0911f1523ec7088bae88684ecb9bca94aa553693/datafusion/proto/src/logical_plan/from_proto.rs#L372-L379

https://github.com/apache/arrow-datafusion/blob/0911f1523ec7088bae88684ecb9bca94aa553693/datafusion/proto/src/logical_plan/from_proto.rs#L609-L620

This really confused me at first glance: try_into() consumes a &protobuf::Schema, and since the fields are recreated during the conversion anyway, I think the clone of input_schema can be eliminated.

I don't understand why these conversion trait methods convert into U from a &T rather than a T; this introduces implicit clones inside the conversion methods.

https://github.com/apache/arrow-datafusion/blob/0911f1523ec7088bae88684ecb9bca94aa553693/datafusion/proto/src/physical_plan/mod.rs#L490

I also just noticed that this line could be replaced by physical_schema.
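The point about converting from &T can be illustrated with a toy TryFrom impl: when the conversion is implemented on a reference, any owned data in the output must be cloned inside the conversion, even if the caller could have given up ownership (toy types below, not the real protobuf or Arrow schemas):

```rust
use std::convert::TryFrom;

// Toy stand-ins for `protobuf::Schema` and `arrow_schema::Schema`.
struct ProtoSchema { fields: Vec<String> }
struct ArrowSchema { fields: Vec<String> }

impl TryFrom<&ProtoSchema> for ArrowSchema {
    type Error = String;
    fn try_from(proto: &ProtoSchema) -> Result<Self, Self::Error> {
        // Converting from a reference: field data must be cloned here,
        // because the impl never owns the input.
        Ok(ArrowSchema { fields: proto.fields.clone() })
    }
}

fn main() {
    let proto = ProtoSchema { fields: vec!["id".into(), "name".into()] };
    let arrow = ArrowSchema::try_from(&proto).unwrap();
    assert_eq!(arrow.fields, proto.fields); // caller still owns `proto`
}
```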

Contributor:

I agree it is confusing. Any help to make it better would be most appreciated.
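As a side note on the Arc point in this thread: cloning an Arc only copies a pointer and bumps a reference count; the pointed-to value is never deep-copied. A minimal sketch, with a plain Arc<String> standing in for Arc<Schema>:

```rust
use std::sync::Arc;

// `SchemaRef` in Arrow is just `Arc<Schema>`: cloning it is a pointer
// copy plus a reference-count increment, not a copy of the schema.
fn clone_is_shallow(schema: &Arc<String>) -> Arc<String> {
    Arc::clone(schema)
}

fn main() {
    let schema = Arc::new(String::from("id: Int64, name: Utf8"));
    let schema_clone = clone_is_shallow(&schema);

    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&schema, &schema_clone));
    assert_eq!(Arc::strong_count(&schema), 2);
}
```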


let physical_filter_expr = hash_agg
.filter_expr
@@ -489,7 +484,7 @@ impl AsExecutionPlan for PhysicalPlanNode {
physical_filter_expr,
physical_order_by_expr,
input,
-                   Arc::new((&input_schema).try_into()?),
+                   Arc::new(input_schema.try_into()?),
)?))
}
PhysicalPlanType::HashJoin(hashjoin) => {