[Bug Fix]: Deem hash repartition unnecessary when input and output has 1 partition #10095

mustafasrepo · 2024-04-16T07:01:07Z

Which issue does this PR close?

Closes #9928.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

mustafasrepo · 2024-04-16T07:03:39Z

cc @echai58, this fix should solve the problem in the issue. Feel free to review, if you have time.
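
For readers skimming the thread, the idea in the PR title can be sketched roughly as below. This is an illustration only: `hash_repartition_is_unnecessary` is a hypothetical helper, not DataFusion's actual API, and the reasoning in the comments is an assumption based on the title and the "Add input partition number check" commit message.

```rust
/// Hypothetical helper (not DataFusion's API): decides whether putting a hash
/// repartition on top of an input would be a no-op.
fn hash_repartition_is_unnecessary(input_partitions: usize, target_partitions: usize) -> bool {
    // Hashing 1 partition into 1 partition changes nothing, so it can be skipped.
    // Assumption: checking only the target is not enough, because an input with
    // more partitions than the target (e.g. 2 -> 1) still has to be repartitioned.
    input_partitions == 1 && target_partitions == 1
}
```

Under this sketch a 1 -> 1 hash repartition is skipped, while a 2 -> 1 or 1 -> 4 repartition is kept.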

mustafasrepo · 2024-04-16T07:05:01Z

@korowa thanks again for the analysis which helped a lot in understanding the issue and replicating it in datafusion.

ozankabak

LGTM, thank you

echai58

Thanks for the fix!

alamb · 2024-04-16T13:33:20Z

Thanks @mustafasrepo and @ozankabak

korowa · 2024-04-16T17:14:33Z

@mustafasrepo good catch on UNION -- I previously suggested that this kind of plan (with partitions > config.target_partitions) is "illegal" in DF.

mustafasrepo · 2024-04-17T05:55:28Z

> @mustafasrepo good catch on UNION -- I previously suggested that this kind of plan (with partitions > config.target_partitions) is "illegal" in DF.

I agree, this behaviour is a bit counterintuitive. However, with the current implementation of UnionExec it is hard to guarantee that partitions <= config.target_partitions always holds. If this behaviour is prone to errors, maybe we can insert a RepartitionExec on top of UnionExecs if their output partition number > config.target_partitions. This way, we can guarantee this violation wouldn't propagate to other operators.
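
The suggestion above can be sketched on a toy plan tree. Everything here is hypothetical: `Plan` is a simplified stand-in and not DataFusion's `ExecutionPlan`, and `cap_union_partitions` is an illustrative rule, not an existing optimizer pass.

```rust
/// Toy stand-in for a physical plan node; NOT DataFusion's `ExecutionPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A leaf that produces `partitions` output partitions.
    Scan { partitions: usize },
    /// Exposes the partitions of all inputs side by side.
    Union(Vec<Plan>),
    /// Redistributes its input into exactly `partitions` partitions.
    Repartition { input: Box<Plan>, partitions: usize },
}

impl Plan {
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::Union(inputs) => inputs.iter().map(Plan::output_partitions).sum(),
            Plan::Repartition { partitions, .. } => *partitions,
        }
    }
}

/// Hypothetical rule: wrap every union whose output partition count exceeds
/// `target_partitions` in a repartition, so the excess does not propagate.
fn cap_union_partitions(plan: Plan, target_partitions: usize) -> Plan {
    match plan {
        Plan::Union(inputs) => {
            // Rewrite the children first, then check the union itself.
            let inputs: Vec<Plan> = inputs
                .into_iter()
                .map(|p| cap_union_partitions(p, target_partitions))
                .collect();
            let union = Plan::Union(inputs);
            if union.output_partitions() > target_partitions {
                Plan::Repartition { input: Box::new(union), partitions: target_partitions }
            } else {
                union
            }
        }
        Plan::Repartition { input, partitions } => Plan::Repartition {
            input: Box::new(cap_union_partitions(*input, target_partitions)),
            partitions,
        },
        scan @ Plan::Scan { .. } => scan,
    }
}
```

In this model, a union of two 4-partition scans reports 8 output partitions; with a target of 4, the rule wraps it in a repartition that reports 4.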

alamb · 2024-04-17T17:09:52Z

datafusion/sqllogictest/test_files/joins.slt

+11)------------ProjectionExec: expr=[1 as c, 3 as d]
+12)--------------PlaceholderRowExec
+
+query IIII


I think this query is nondeterministic and sometimes fails on CI, as it doesn't have an ORDER BY and isn't annotated with rowsort. Here is a PR to fix that: #10120

korowa · 2024-04-23T18:13:20Z

> Maybe we can insert a RepartitionExec on top of UnionExecs if their output partition number > config.target_partitions. This way, we can guarantee this violation wouldn't propagate to other operators.

Probably a better solution would be planning the execution of union inputs according to the total available partitions -- e.g.

    select l_linenumber as f
    from lineitem
    union all
    select l_orderkey as f
    from lineitem

with target_partitions = 4, could plan 2 threads for each ParquetExec (ideally we could also use byte/row statistics and plan according to them -- not only 2-2, but probably 1-3 if there is significant data skew across inputs/files).

Currently, with target_partitions = 4, it's planned as 4 threads per ParquetExec, and 8 output partitions for UNION.

And on top of it, when target_partitions is less than the number of UNION inputs (e.g. UNION has 10 inputs, target_partitions = 4, and we need at least 1 thread for each input), there could be a RepartitionExec.
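
A minimal sketch of the allocation idea described above, assuming per-input row estimates are available. `split_partitions` is a hypothetical helper, not part of DataFusion, and the greedy strategy is just one possible way to weight the split by statistics.

```rust
/// Hypothetical planner helper (not part of DataFusion): split a partition
/// budget across union inputs, weighted by estimated row counts.
fn split_partitions(estimated_rows: &[u64], target_partitions: usize) -> Vec<usize> {
    // Every input needs at least one partition to be executed at all.
    let mut alloc = vec![1usize; estimated_rows.len()];
    // Budget left after the mandatory one-per-input allocation (0 if the
    // number of inputs already meets or exceeds the target).
    let mut remaining = target_partitions.saturating_sub(alloc.len());
    while remaining > 0 {
        // Give the next partition to the input with the most rows per partition.
        let busiest = (0..alloc.len()).max_by(|&a, &b| {
            let load_a = estimated_rows[a] as f64 / alloc[a] as f64;
            let load_b = estimated_rows[b] as f64 / alloc[b] as f64;
            load_a.total_cmp(&load_b)
        });
        match busiest {
            Some(i) => alloc[i] += 1,
            None => break, // no inputs at all
        }
        remaining -= 1;
    }
    alloc
}
```

With a target of 4, two equally sized inputs get a 2-2 split and inputs with 100 and 300 estimated rows get 1-3. With 10 inputs and a target of 4, each input still gets one partition, so the total is 10 and something like a RepartitionExec on top would still be needed.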

mustafasrepo · 2024-04-24T06:52:28Z

> > Maybe we can insert a RepartitionExec on top of UnionExecs if their output partition number > config.target_partitions. This way, we can guarantee this violation wouldn't propagate to other operators.
>
> Probably a better solution would be planning the execution of union inputs according to the total available partitions -- e.g.
>
>     select l_linenumber as f
>     from lineitem
>     union all
>     select l_orderkey as f
>     from lineitem
>
> with target_partitions = 4, could plan 2 threads for each ParquetExec (ideally we could also use byte/row statistics and plan according to them -- not only 2-2, but probably 1-3 if there is significant data skew across inputs/files).
>
> Currently, with target_partitions = 4, it's planned as 4 threads per ParquetExec, and 8 output partitions for UNION.
>
> And on top of it, when target_partitions is less than the number of UNION inputs (e.g. UNION has 10 inputs, target_partitions = 4, and we need at least 1 thread for each input), there could be a RepartitionExec.

That might work. However, I guess this approach cannot solve all cases. Consider the following query:

    select * from table
    union all
    select * from table
    union all
    select * from table

When target_partitions = 2, there is no way to change the input partitioning so that the union produces 2 partitions: we would get at least 3 partitions (assuming each input produces a single partition). Hence, if I am not mistaken, this approach may not solve all use cases.
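
The arithmetic behind this counterexample can be written down as a small sketch. Both functions are hypothetical toy helpers, not DataFusion code, and they assume a union exposes every partition of every input.

```rust
/// Toy model (not DataFusion code): a union exposes every partition of every
/// input, so its output partition count is the sum over its inputs.
fn union_output_partitions(input_partitions: &[usize]) -> usize {
    input_partitions.iter().sum()
}

/// Hypothetical check: each input needs at least one partition, so a union of
/// `num_inputs` inputs can never produce fewer than `num_inputs` partitions.
fn union_must_exceed_target(num_inputs: usize, target_partitions: usize) -> bool {
    num_inputs > target_partitions
}
```

For the three-way union above, single-partition inputs give 3 output partitions, and `union_must_exceed_target(3, 2)` is true, so no choice of input partitioning alone can bring the union down to 2.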

echai58 · 2024-04-26T12:59:38Z

Hi guys, I'm not familiar with DataFusion's release process - is there an estimate of when this will be released in a new DataFusion version?

alamb · 2024-04-26T17:59:01Z

> Hi guys, I'm not familiar with DataFusion's release process - is there an estimate of when this will be released in a new DataFusion version?

This should be included in 38.0.0 -- I just filed #10255 to track that release if you want to watch it @echai58

Add input partition number check

d145880

github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels Apr 16, 2024

Minor changes

e614993

ozankabak approved these changes Apr 16, 2024

echai58 approved these changes Apr 16, 2024

echai58 mentioned this pull request Apr 16, 2024

internal.DeltaError: Generic DeltaTable error: Internal error: Invalid HashJoinExec partition count mismatch 1!=2 delta-io/delta-rs#2188

Closed

alamb merged commit b914409 into apache:main Apr 16, 2024
24 checks passed

Omega359 pushed a commit to Omega359/arrow-datafusion that referenced this pull request Apr 16, 2024

[Bug Fix]: Deem hash repartition unnecessary when input and output ha…

80a9442

…s 1 partition (apache#10095) * Add input partition number check * Minor changes

alamb mentioned this pull request Apr 17, 2024

Fix intermittent CI test failure in joins.slt #10120

Merged

alamb reviewed Apr 17, 2024

alamb mentioned this pull request Apr 27, 2024

DataFusion 38.0.0 Release #10217

Closed

echai58 mentioned this pull request May 14, 2024

chore: upgrade to Datafusion 38 delta-io/delta-rs#2499

Closed
