refactor: merge to use logical plans #1720
Conversation
Can be merged after #1639
@Blajda is this ready to be merged now, or are you still working on creating some benchmarks? I see that the issue and PR you linked have been merged/resolved now.
@ion-elgreco It depends on a new version of DataFusion being released and then having #1775 updated and merged.
@Blajda oh nice! Really curious to see how fast the Rust merge is versus the Scala merge 👀
Force-pushed from 9ff3057 to 8b33ecb.
Below are the benchmark results. These were executed with the standard merge benchmarks compiled with release flags, using TPC-DS data at scale factor 1. CPU: AMD Ryzen 7 5800X 8-Core Processor. Writes were done to an HDD.
@Blajda I assume after_duration is after logical plans. Looks like quite a speed bump, and also more consistent across the different queries, so something else seems to be limiting there? Maybe your HDD? 😛 What's the scale btw, milliseconds?
Yeah, the units are milliseconds.
[benchmark charts: merge before vs. after, logical plan writes to HDD] Currently the size of the write is not the limiting factor.
This is great work. I feel like I'm learning a lot about DataFusion just by reviewing your PRs :)
// TODO: DataFusion's HashJoin has some memory issues. Running with all cores
// results in an OOM. Can be removed when upstream improvements are made.
let config = SessionConfig::new().with_target_partitions(1);
@Blajda Why don't you configure a Greedy memory pool in the RuntimeConfig? Right now the SessionContext seems to be configured without a memory limit, so it's no surprise that it can OOM, right? You can pass that in with SessionContext::new_with_config_rt.
I'll give this a try. Issues on the DF tracker suggest that even with a memory pool configured, some allocations are made outside of the pool. As a start, I think limiting memory to 80% of the user's system would be reasonable.
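For reference, a minimal sketch of what that configuration could look like, assuming DataFusion's GreedyMemoryPool and runtime APIs at the time of this PR; detecting total system memory is left out, and the 80% figure is just the heuristic mentioned above:

```rust
use std::sync::Arc;

use datafusion::execution::memory_pool::GreedyMemoryPool;
use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};
use datafusion::prelude::{SessionConfig, SessionContext};

fn context_with_memory_limit(limit_bytes: usize) -> datafusion::error::Result<SessionContext> {
    // Greedy pool: operators allocate from a fixed budget and fail/spill once
    // it is exhausted, instead of growing without bound.
    let runtime = RuntimeEnv::new(
        RuntimeConfig::new().with_memory_pool(Arc::new(GreedyMemoryPool::new(limit_bytes))),
    )?;
    Ok(SessionContext::new_with_config_rt(
        SessionConfig::new(),
        Arc::new(runtime),
    ))
}

// e.g. the 80% heuristic from the discussion, with `total_bytes` detected
// elsewhere (detection is out of scope for this sketch):
// let ctx = context_with_memory_limit(total_bytes * 8 / 10)?;
```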
Enabled the mem pool and removed the CPU limit. It still OOMs on low-memory systems. I think this will require a follow-up on how best to tune this.
Looking great, and I'm really excited to see us moving towards more proper planning!
As this is quite complex to reason about, should we maybe add some tests to validate the generated plans via their string representation, much like DataFusion does? A sketch of what such a test could look like follows below.
That would also help me understand how the metrics observers impact pushdown. I guess since we optimize the input plans beforehand we preserve file skipping etc., but it would help to see it :).
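For illustration, a plan-snapshot test along those lines could look roughly like this; build_merge_plan is a hypothetical helper standing in for whatever constructs the merge LogicalPlan, and the expected node names are placeholders, not this PR's actual plan:

```rust
use datafusion_expr::LogicalPlan;

// Sketch of a plan-snapshot test, in the style of DataFusion's own tests.
// `build_merge_plan` is hypothetical; the expected lines are placeholders.
#[tokio::test]
async fn merge_plan_string_representation() {
    let plan: LogicalPlan = build_merge_plan().await.unwrap();
    let expected = "\
Projection: ...
  Join: ...
    TableScan: target
    TableScan: source";
    // `display_indent` renders the plan as an indented tree, one node per line.
    assert_eq!(expected, format!("{}", plan.display_indent()));
}
```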
struct MergePlanner {}

#[async_trait]
impl QueryPlanner for MergePlanner {
    async fn create_physical_plan(
        &self,
        logical_plan: &LogicalPlan,
        session_state: &SessionState,
    ) -> DataFusionResult<Arc<dyn ExecutionPlan>> {
        let planner = Arc::new(DefaultPhysicalPlanner::with_extension_planners(vec![
            Arc::new(MergeMetricExtensionPlanner {}),
        ]));
        planner
            .create_physical_plan(logical_plan, session_state)
            .await
    }
}
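For context, a sketch of how a custom QueryPlanner like this is typically wired into a session, assuming DataFusion's SessionState API as of this era (not code from this PR):

```rust
// Register the custom planner on the session state so that extension nodes
// (here, merge metrics) are planned by MergeMetricExtensionPlanner.
let state = SessionState::new_with_config_rt(SessionConfig::new(), runtime_env)
    .with_query_planner(Arc::new(MergePlanner {}));
let ctx = SessionContext::new_with_state(state);
```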
Eventually we should consolidate this into a DeltaPlanner; some early experiments are in the deltalake-sql crate, but that is not helpful right now 😆 - maybe add a comment?
Merge doesn't currently support file skipping. A new operator that tracks which files have modifications needs to be built, and another operator that determines which pushdowns are enabled depending on which operators are used. (I.e. upserts can support pushdown, but anything with a delete would require additional work.)
I think this would be possible, but it would require additional refactoring. Currently the string repr is just spaghetti: a bunch of case statements and parquet read nodes.
Yes, sounds like a follow-up 😀 We could use the same heuristics as for conflict resolution, where we assume all read files are rewritten regardless of whether they have matches. But again, something to dive into later.
Description
This refactors the merge operation to use DataFusion's DataFrame and LogicalPlan APIs.
The nested loop join (NLJ) is eliminated, and the query planner can pick the optimal join operator. This also enables the operation to use multiple threads and should result in a significant speed-up.
Merge is still limited to a single thread in some areas. When collecting benchmarks, I encountered multiple OOM issues with DataFusion's hash join implementation. There are multiple tickets open upstream regarding this. For now, I've limited the number of partitions to just 1 to prevent this.
Predicates passed as SQL are also easier to use now. Previously, manual casting was required to ensure data types were aligned; now the logical plan performs type coercion when optimizing the plan.
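As an illustration of the simpler SQL predicates, a hypothetical invocation of the merge builder might look like the following; this is a sketch of the delta-rs MergeBuilder API, not code from this PR:

```rust
use deltalake::datafusion::prelude::col;
use deltalake::DeltaOps;

// Sketch only (inside an async fn). With logical planning, the SQL predicate
// string is type-coerced when the plan is optimized, so no manual casts are
// needed to align data types.
let (table, metrics) = DeltaOps(table)
    .merge(source_df, "target.id = source.id")
    .with_source_alias("source")
    .with_target_alias("target")
    .when_matched_update(|update| update.update("value", col("source.value")))?
    .when_not_matched_insert(|insert| {
        insert
            .set("id", col("source.id"))
            .set("value", col("source.value"))
    })?
    .await?;
```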
Related Issues
- enhances delta-io#850
- closes delta-io#1790
- closes delta-io#1753