Add dask array support #44

thewtex · 2018-05-25T23:45:49Z

Stages:

Basic support for dask.array
Stream dask.array
Use dask.delayed or dask.futures to help downsample images.

CC: @mrocklin @dani-lbnl
xref #43 scisprints/2018_05_sklearn_skimage_dask#12

The text was updated successfully, but these errors were encountered:

mrocklin · 2018-05-26T00:24:06Z

The coarsen function might be relevant here http://dask.pydata.org/en/latest/array-api.html#dask.array.coarsen

On Fri, May 25, 2018, 4:45 PM Matt McCormick ***@***.***> wrote: Stages: 1. Basic support for dask.array 2. Stream dask.array 3. Use dask.delayed or dask.futures to help downsample images. CC: @mrocklin <https://github.com/mrocklin> @dani-lbnl <https://github.com/dani-lbnl> xref #43 <#43> scisprints/2018_05_sklearn_skimage_dask#12 <scisprints/2018_05_sklearn_skimage_dask#12> — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#44>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AASszI3FbtJ66NHWn1cuorIXbBs5G3Hlks5t2JeugaJpZM4UOsT5> .

thewtex · 2018-10-29T04:57:31Z

@mrocklin good suggestion regarding coarsen! 👍

Some experiments are summarized here:

https://gist.github.com/thewtex/079c59ea71a1da59bea5e14cb6b73d1f

For large image support it looks like we should not convert NumPy arrays, ITK Images, etc. to Dask for downsampling because of the extra memory required, the time required for chunking, and the performance of downsampling. However, I will look into adding first class Dask array support by using the Dask arrays directly for large images and coarsen. This will remove the extra memory and time required to allocate the memory.

CC: @jakirkham

mrocklin · 2018-10-29T11:12:16Z

In general I agree that the main advantage here is probably for larger-than-memory data. However, there are likely several things you can do to accelerating things on the Dask side here. For example you might pass name=False to from_array to avoid that cost (it's hashing the data to find a deterministic key that you don't care about).

@hmaarrfk might have some other suggestions.

I'm not sure if I'm reading the notebook correctly, but it seems like you have three approaches that perform as follows. Is this right?

Numpy 40s
Dask array 3s
Some ITK specialized code 1.5s

If this is correct then I'm curious, what does the ITK specialized code do differently than Numpy?

hmaarrfk · 2018-10-29T12:46:39Z

It seems to me you have a 6.7 GB dataset you are working on. I feel your pain. ITK is going at about 4GB/s. You are shrinking by 10, 10, 10 is that correct?

I've found numpy's code to be pretty slow and inefficient for large images. Slowdowns typically occur from a few places:

Generous calls to np.ascontiguousarray which do wonders for small data (copying it is almost free) but which are very expensive for larger nd arrays.
np.pad and np.block were optimized for small arrays where repeated concatenations is actually must faster than building up the metadata required for single copy padding/blocking. Please chime in here MAINT: Rewrite numpy.pad without concatenate numpy/numpy#11358 to express support as some believe that this is a "micro optimization". As a note, an np.block algorithm for large arrays just got in, so the performance of that one should be much better in 1.16 for arrays larger than 512x512.

The reason this is important is scikit-image calls np.pad to pad the array before downscale_local_means. @lagru seems to have wrtiten _fast_pad for scikit-image which implements the algorithm above, but only for pad widths of 1. I would support any PR that extends that functionality for width>1, but I would much prefer numpy actually update their algorithm with @lagru's work.

Since you are downsampling by 10, and some of the dimensions are 2048, it will pad to get that extra 2 points, and thus, copy the array twice, once for each dimension it needs to pad.

Finally, the downscale local means function is using, in my mind, a very inefficient algorithm.

Eventually it will calls this function
https://github.com/scikit-image/scikit-image/blob/972ce191ba6361356de17274bcf7ef2bb4d989e8/skimage/measure/block.py#L77
and for your 3D image, create a 6D image where the strides are arranged in such a way to have the 0:3 dims be equal to your final dimension, and 3:6 be what you want to average over. But, it will make 3 calls to np.mean instead of 1, and thus, allocate 3 large arrays in the process. I'm not even sure if np.mean supports non-contiguous arrays. You better hope it doesn't copy. It really might, so you have to check.

See this PR scikit-image/scikit-image#3522
this fails the median test, which was almost surely wrong. Help writing a benchmark would be appreciated!

dask provides a trim_excess parameter that seems like an interesting way to deal with this excess that can probably be integrated in scikit-image.

Summarizing the places where I see slowdowns:

you are downsampling 2048, 2048, 1600 by 10, 10, 10 which causes the array to get padded. The input array should be contiguous and its shape should be divisible by the shrink factors.
Try my dev branch of skimage.
If you are on linux, try numpy dev branch to make use of kernel optimizations for large memory allocs.

PS, that you can probably speedup your large array creating by following some of the not so much hacks suggested here: numpy/numpy#11919 though these shouldn't be so necessary in numpy 1.16.

Dask seems to a similar strategy to numpy: reshape the array (free), then call mean once over the new shape
Your dask code could be optimized in a few places:

I got hung up on the name parameter too. See discussion: from_array performance when no name is provided dask/dask#3946
use persist. It is unlikely that you care about creating a final contiguous matrix (unless you do).
Remove the call to astype. This is pretty contentious in scikit-image. I've tried to see if using smaller dtypes would help, it just depends on how the compiler can optimize them. It isn't obvious that Cython is the right tool to ensure that fast operations occur on uint8.

Finally, if you really want to hack, provide a dtype to coarsen and to local_means and feed in float32.

hmaarrfk · 2018-10-29T12:47:33Z

How is dask shriking 256x256 by 10x10? Is it trimming each chunk? or just the whole?

thewtex · 2018-10-30T04:39:03Z

In general I agree that the main advantage here is probably for larger-than-memory data.

Yes, the prospect of eventually handling larger-than-memory with a distributed system is exciting ❗

However, there are likely several things you can do to accelerating things on the Dask side here. For example you might pass name=False to from_array to avoid that cost (it's hashing the data to find a deterministic key that you don't care about).

Ah, that is good to know! Yes, adding name=False makes the distribution in a Dask Array much, much faster (10 sec to 4.2 ms).

If this is correct then I'm curious, what does the ITK specialized code do differently than Numpy?

Right, this is a compile-time, optimized-to-the-task implementation. More details are here, PDF.

thewtex · 2018-10-30T04:41:13Z

It seems to me you have a 6.7 GB dataset you are working on. I feel your pain. ITK is going at about 4GB/s. You are shrinking by 10, 10, 10 is that correct?

Yes, this was a rough previous to understand if we can get interactive performance when dynamically downsampling for visualization (first implementation in #98).

thewtex · 2018-10-30T04:52:54Z

The reason this is important is scikit-image calls np.pad to pad the array before downscale_local_means. @lagru seems to have wrtiten _fast_pad for scikit-image which implements the algorithm above, but only for pad widths of 1. I would support any PR that extends that functionality for width>1, but I would much prefer numpy actually update their algorithm with @lagru's work.

Since you are downsampling by 10, and some of the dimensions are 2048, it will pad to get that extra 2 points, and thus, copy the array twice, once for each dimension it needs to pad.

Good points, @hmaarrfk .

dask provides a trim_excess parameter that seems like an interesting way to deal with this excess that can probably be integrated in scikit-image.

👍

The padding also generates "artifacts" on the border for this this use case.

thewtex · 2018-10-30T05:00:35Z

you are downsampling 2048, 2048, 1600 by 10, 10, 10 which causes the array to get padded. The input array should be contiguous and its shape should be divisible by the shrink factors.

For this use case, we have to accept all data as it comes. All data is welcome :-).

PS, that you can probably speedup your large array creating by following some of the not so much hacks suggested here: numpy/numpy#11919 though these shouldn't be so necessary in numpy 1.16.

Nicely done!

2\. use `persist`. It is unlikely that you care about creating a final contiguous matrix (unless you do).

3\. Remove the call to `astype`. This is pretty contentious in scikit-image. I've tried to see if using smaller dtypes would help, it just depends on how the compiler can optimize them. It isn't obvious that Cython is the right tool to ensure that fast operations occur on uint8.

Finally, if you really want to hack, provide a dtype to coarsen and to local_means and feed in float32.

Good ideas, but these were unfortunately constraints from the use case.

Thanks for the reviews @hmaarrfk @mrocklin !

thewtex mentioned this issue May 25, 2018

Interactive Image Viewer in the Jupyter Notebook scisprints/2018_05_sklearn_skimage_dask#12

Open

thewtex mentioned this issue May 29, 2018

ENH: Add Dask Array basic support #46

Merged

thewtex added the enhancement New feature or request label Oct 29, 2018

thewtex mentioned this issue Nov 19, 2018

DaskArray example yields corrupt data on initial execution #101

Closed

thewtex mentioned this issue Jul 31, 2019

Blogpost using Dask Array with ITK dask/dask-blog#27

Closed

thewtex added this to the itkwidgets 0.33 Alpha 1 milestone Aug 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add dask array support #44

Add dask array support #44

thewtex commented May 25, 2018 •

edited

Loading

mrocklin commented May 26, 2018 via email

thewtex commented Oct 29, 2018

mrocklin commented Oct 29, 2018

hmaarrfk commented Oct 29, 2018 •

edited

Loading

hmaarrfk commented Oct 29, 2018

thewtex commented Oct 30, 2018

thewtex commented Oct 30, 2018

thewtex commented Oct 30, 2018

thewtex commented Oct 30, 2018

Add dask array support #44

Add dask array support #44

Comments

thewtex commented May 25, 2018 • edited Loading

mrocklin commented May 26, 2018 via email

thewtex commented Oct 29, 2018

mrocklin commented Oct 29, 2018

hmaarrfk commented Oct 29, 2018 • edited Loading

hmaarrfk commented Oct 29, 2018

thewtex commented Oct 30, 2018

thewtex commented Oct 30, 2018

thewtex commented Oct 30, 2018

thewtex commented Oct 30, 2018

thewtex commented May 25, 2018 •

edited

Loading

hmaarrfk commented Oct 29, 2018 •

edited

Loading