Add dask array support #44
Comments
The coarsen function might be relevant here
http://dask.pydata.org/en/latest/array-api.html#dask.array.coarsen
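A minimal sketch of what calling `coarsen` might look like for image downsampling. The array size, chunk size, and factor of 4 are illustrative assumptions, not values from this issue.

```python
import numpy as np
import dask.array as da

# Small stand-in image; shape and chunk sizes are illustrative assumptions.
image = np.arange(24 * 24, dtype=np.float64).reshape(24, 24)
lazy = da.from_array(image, chunks=(12, 12))

# Average non-overlapping 4x4 blocks along both axes.
# The dict maps axis index -> coarsening factor.
small = da.coarsen(np.mean, lazy, {0: 4, 1: 4})

# Nothing is computed until .compute() is called.
result = small.compute()
print(result.shape)
```

Here the shape and the chunks are both multiples of the factor. For shapes that are not, `coarsen` accepts `trim_excess=True` to drop the trailing elements instead of raising an error.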
…On Fri, May 25, 2018, 4:45 PM Matt McCormick ***@***.***> wrote:
Stages:
1. Basic support for dask.array
2. Stream dask.array
3. Use dask.delayed or dask.futures to help downsample images.
CC: @mrocklin @dani-lbnl
xref #43 scisprints/2018_05_sklearn_skimage_dask#12
|
@mrocklin good suggestion regarding `coarsen`.
Some experiments are summarized here: https://gist.github.com/thewtex/079c59ea71a1da59bea5e14cb6b73d1f
For large image support, it looks like we should not convert NumPy arrays, ITK Images, etc. to Dask for downsampling because of the extra memory required, the time required for chunking, and the performance of downsampling. However, I will look into adding first-class Dask array support by using the Dask arrays directly for large images and
CC: @jakirkham |
In general I agree that the main advantage here is probably for larger-than-memory data. However, there are likely several things you can do to accelerate things on the Dask side here. For example you might pass
@hmaarrfk might have some other suggestions.
I'm not sure if I'm reading the notebook correctly, but it seems like you have three approaches that perform as follows. Is this right?
If this is correct then I'm curious, what does the ITK specialized code do differently than Numpy? |
It seems to me you have a 6.7 GB dataset you are working on. I feel your pain. ITK is going at about 4 GB/s. You are shrinking by
I've found numpy's code to be pretty slow and inefficient for large images. Slowdowns typically occur from a few places:
The reason this is important is scikit-image calls
Since you are downsampling by 10, and some of the dimensions are 2048, it will pad to get those extra 2 points, and thus copy the array twice, once for each dimension it needs to pad.
Finally, the downscale local means function is using, in my mind, a very inefficient algorithm. Eventually it calls this function
See this PR: scikit-image/scikit-image#3522
dask provides a
Summarizing the places where I see slowdowns:
PS: you can probably speed up your large array creation by following some of the not-quite-hacks suggested here: numpy/numpy#11919, though these shouldn't be as necessary in numpy 1.16. Dask seems to use a similar strategy to numpy: reshape the array (free), then call mean once over the new shape.
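The reshape-then-mean strategy and the padding arithmetic mentioned above can be sketched in plain NumPy as follows. The helper names `pad_needed` and `block_mean_2d` are hypothetical, and trimming the leftover rows and columns (rather than padding) is an assumption made here to avoid the extra copies; this is not the scikit-image or Dask implementation.

```python
import numpy as np


def pad_needed(length: int, factor: int) -> int:
    """Samples to append so that `length` becomes a multiple of `factor`."""
    return (-length) % factor


def block_mean_2d(image: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2D array.

    Trailing rows/columns that do not fill a whole block are trimmed
    (an assumption made for this sketch, instead of padding).
    """
    rows = image.shape[0] - image.shape[0] % factor
    cols = image.shape[1] - image.shape[1] % factor
    trimmed = image[:rows, :cols]  # basic slicing returns a view
    # Split each axis into (number of blocks, block size).
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    # A single mean call over the two block-size axes.
    return blocks.mean(axis=(1, 3))


# A 2048-long axis downsampled by 10 is 2 samples short of a multiple of 10.
print(pad_needed(2048, 10))

demo = np.arange(36, dtype=np.float64).reshape(6, 6)
print(block_mean_2d(demo, 3))
```

Trimming loses up to `factor - 1` samples per axis at the border, whereas padding keeps them at the cost of a copy and of border blocks averaged together with padded values.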
Finally, if you really want to hack, provide a dtype to |
How is dask shrinking 256x256 by 10x10? Is it trimming each chunk, or just the whole array? |
Yes, the prospect of eventually handling larger-than-memory data with a distributed system is exciting ❗
Ah, that is good to know! Yes, adding
Right, this is a compile-time, optimized-to-the-task implementation. More details are here, PDF. |
Yes, this was a rough preview to understand if we can get interactive performance when dynamically downsampling for visualization (first implementation in #98). |
Good points, @hmaarrfk .
👍 The padding also generates "artifacts" on the border for this use case. |
For this use case, we have to accept all data as it comes. All data is welcome :-).
Nicely done!
Good ideas, but these were unfortunately constraints from the use case. |