Option to prevent automatic rechunking? #7711
Comments
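Thanks for raising an issue @TomNicholas. By default, blockwise will attempt to align the chunks of input arrays (lines 173 to 174 of dask/array/blockwise.py in 0b2053f).
This is one place where rechunking is happening. To avoid this, you can specify align_arrays=False in your call to blockwise.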
It's also worth noting that the intent of the warning here is "The
operation that you're doing is causing a lot of chunks, you may want to be
aware" rather than "I've decided to rechunk your dataset for you".
Given the phrasing of your comment my guess is that you think Dask is doing
something extra / magical here. It isn't. Operations like tensordot
create a lot of chunks. If the output or intermediate arrays have way more
chunks than the input arrays then we let you know so that you can adjust
expectations.
…On Wed, May 26, 2021 at 2:46 PM James Bourbeau ***@***.***> wrote:
Thanks for raising an issue @TomNicholas <https://github.com/TomNicholas>.
By default, blockwise will attempt to align the chunks of input arrays
https://github.com/dask/dask/blob/0b2053f958a2c222184c78218a87d93f6a82b5d9/dask/array/blockwise.py#L173-L174
This is where the rechunking is happening. To avoid this, you can specify
align_arrays=False in your call to blockwise
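For illustration, here is a minimal sketch of passing align_arrays=False to blockwise. The arrays x and y and the choice of np.add are made up for the example; they already share a chunk pattern, so skipping alignment is safe here.

```python
import numpy as np
import dask.array as da

# Hypothetical inputs: same shape, same chunk pattern.
x = da.ones(8, chunks=4)
y = da.ones(8, chunks=4)

# align_arrays=False tells blockwise not to unify the input chunks first.
# The caller is then responsible for making sure the blocks line up.
z = da.blockwise(
    np.add, "i",   # function and output index
    x, "i",        # first input and its index
    y, "i",        # second input and its index
    dtype=x.dtype,
    align_arrays=False,
)

print(z.chunks)
print(z.compute())
```

With matching chunks the result should be the same as with the default align_arrays=True; the flag only matters when the inputs are chunked differently.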
Thanks both.
@mrocklin you're right - I completely misinterpreted the warning message to mean "[Dask is] Increasing number of chunks", rather than "[This operation will] Increase number of chunks". As a user it's not obvious to me that dask wouldn't magically do something like that, so I personally think the latter message would be a bit clearer, but I'll close this issue now.
Also, to clarify: when the input arrays have different chunk patterns, to do a chunk-by-chunk operation you have to do some sort of re-chunking to make the chunks line up with each other. Otherwise, you'll be trying to operate between NumPy arrays of different shapes. Consider two 1D dask arrays, A and B, of the same shape but with different chunk patterns.
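The original snippet for this example did not survive in the thread, so the following is only a sketch of what such a pair of arrays could look like. The specific chunk sizes are an assumption chosen for illustration.

```python
import dask.array as da

# Two 1D arrays of the same shape (10,), chunked differently.
# The chunk sizes are hypothetical, picked so the blocks do not line up.
A = da.ones(10, chunks=((4, 6),))
B = da.ones(10, chunks=((6, 4),))

print(A.shape == B.shape)  # same overall shape
print(A.chunks)            # chunk lengths along the single axis of A
print(B.chunks)            # chunk lengths along the single axis of B
```

Block 0 of A has 4 elements while block 0 of B has 6, so the two cannot be combined elementwise as they stand.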
When you do a blockwise operation between A and B, that means you want to use block 0 of A with block 0 of B, block 1 of A with block 1 of B, etc. So you need those corresponding blocks to match in shape. After unification, corresponding chunks of the arrays will always be the same length, and are therefore interoperable.
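As a rough sketch of what unification does, the example below calls unify_chunks from dask.array.core directly. That this is the helper blockwise uses for alignment is my reading of the thread rather than something stated in it, and the arrays A and B are again hypothetical.

```python
import dask.array as da
from dask.array.core import unify_chunks

# Hypothetical inputs: same shape, chunk boundaries that do not line up.
A = da.ones(10, chunks=((4, 6),))
B = da.ones(10, chunks=((6, 4),))

# Arguments alternate array, index, in the same style as blockwise.
# Returns a mapping of index -> unified chunks, plus the rechunked arrays.
chunkss, (A2, B2) = unify_chunks(A, "i", B, "i")

print(chunkss)     # unified chunking for index "i"
print(A2.chunks)   # A after rechunking
print(B2.chunks)   # B after rechunking

# Corresponding blocks now have matching shapes, so an elementwise
# blockwise operation is well defined.
print((A2 + B2).compute())
```

The unified pattern splits at every boundary that appears in either input, which is why the number of chunks can grow compared with the inputs.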
So—if you're going to set |
Thank you for that clarification @gjoseph92, that's extremely helpful. It does make me wonder how my test input has even got unaligned chunks, but that's something to be discussed in xgcm/xhistogram#57 rather than here I guess.
@gjoseph92 your comment is fantastic! I'm not sure where this should go immediately, but I think we should find a space in the docs to capture those clarifying thoughts long term.
In xhistogram #57 I'm trying to test a blockwise-based algorithm for various chunk shapes, and finding that in my test suite dask will change my tests by automatically rechunking and issuing a PerformanceWarning.
I would prefer for dask not to override me like this - in a test suite I'm much more concerned that the tests are run exactly the way I specify than I am about performance.
Is there a global option to prevent this? I've looked at my dask.config.config dictionary, but I'm not sure if any of the options in the configuration reference will affect this.
It's hard for me to know whether my tests are failing due to this or not. Some of my tests are failing, and when dask is automatically changing the test as it runs I don't really know how to debug them. blockwise is dispatching to code we wrote, so it's plausible that the automatic rechunking is causing my test failures by switching from a chunking pattern which passes to a chunking pattern which fails.
The only issue I've seen that seems related is #4763.