Add support for dask and zarr arrays #805

ejeschke · 2019-10-15T23:56:30Z

This adds support for dask and zarr arrays to BaseImage-derived objects (e.g. AstroImage). For example:

>>> aimg = AstroImage()
>>> aimg.load_data(dask_arr)

These images can then be loaded directly into a Ginga viewer.
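The call above hands a dask array to `load_data()` unchanged. What makes lazy arrays worth supporting is that a viewer only needs to compute the region it actually displays. The sketch below illustrates that idea and is not Ginga's implementation. `get_cutout()` is a hypothetical helper that slices first and materializes afterwards. It calls `.compute()` when the object provides one, as dask arrays do, and otherwise falls back to `np.asarray()`. `LazyArray` is a toy stand-in for dask so that the example runs with numpy alone.

```python
import numpy as np


def get_cutout(data, x1, y1, x2, y2):
    """Return data[y1:y2, x1:x2] as a concrete numpy array.

    Hypothetical helper: the slice is taken first, so a lazily evaluated
    array only has to produce the requested region.
    """
    cutout = data[y1:y2, x1:x2]
    # dask arrays expose .compute(); numpy arrays (and, typically, zarr
    # slices) are already concrete, so np.asarray is enough for them.
    compute = getattr(cutout, "compute", None)
    if callable(compute):
        cutout = compute()
    return np.asarray(cutout)


class LazyArray:
    """Toy dask-like array: slicing is recorded and deferred until compute().

    Supports a single level of slicing, which is all this sketch needs.
    """

    def __init__(self, base, key=None):
        self._base = base
        self._key = key

    def __getitem__(self, key):
        # Record the selection instead of evaluating it.
        return LazyArray(self._base, key)

    def compute(self):
        return self._base if self._key is None else self._base[self._key]


data = np.arange(20).reshape(4, 5)
print(get_cutout(data, 1, 1, 3, 3))
print(get_cutout(LazyArray(data), 1, 1, 3, 3))
```

Both calls print the same 2x2 region. The only difference is when the values are actually produced.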

Three new pytest files are added, one each for numpy, dask and zarr.

Developer documentation has been updated.

ejeschke · 2019-10-16T00:00:03Z

@pllim, no hurry.

pllim · 2019-10-16T03:34:02Z

Not sure if I'm qualified to review this one as I have never used dask nor zarr. Maybe someone from SunPy like @nabobalis is a better person to review?

ejeschke · 2019-10-16T03:44:59Z

@pllim, how about just a basic code review?

pllim · 2019-10-31T14:47:08Z

@ejeschke , sure, I'll have a look after you resolve the conflict.

nabobalis · 2019-10-31T14:54:10Z

Sorry! I clearly glossed over the email notification I got from this 15 days ago.

I've not used dask myself (outside of using xarray with it). I think either @Cadair or @wtbarnes have used it in a more direct form. I am happy to provide a code review regardless.

ejeschke · 2019-10-31T20:47:14Z

@pllim, @nabobalis, thank you! Yes, pleased to have any code review.

You can check out this gist to see the tests I did.

nabobalis

It looks good to me. That is some tricky reshaping to be done for zarr.

Cadair · 2019-11-01T16:02:52Z

I would be happy to review this, but won't be able to get to it until next week. If you could post a review request for me, it will end up on my list. :)

pllim · 2019-11-01T16:21:47Z

@Cadair , I can't add you as a reviewer, but I added you as an assignee.

pllim

I feel like the three test modules could be merged into one to avoid code duplication via subclassing and changing a few things here and there at setup.
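The sketch below shows one possible shape for the suggested merge. It uses plain classes so that it runs without pytest. A base class holds the shared checks, and each backend overrides only how its test data is built. All class and method names are hypothetical. In a real pytest module, the dask and zarr subclasses would also need to skip when those packages are not installed, e.g. via `pytest.importorskip`. A zarr subclass would follow the same pattern as the dask one.

```python
import numpy as np


class BaseArrayTests:
    """Shared checks; subclasses only override how test data is built.

    The name does not start with "Test", so pytest would not collect
    this base class on its own.
    """

    def _get_data(self, shape, data_np=None):
        raise NotImplementedError

    def _reference(self, shape):
        # Deterministic reference array shared by every backend.
        return np.arange(np.prod(shape)).reshape(shape)

    def test_shape(self):
        shape = (4, 6)
        data = self._get_data(shape)
        assert tuple(data.shape) == shape

    def test_slice(self):
        shape = (4, 6)
        data_np = self._reference(shape)
        data = self._get_data(shape, data_np=data_np)
        # np.asarray turns numpy, dask or zarr slices into a numpy array.
        assert np.array_equal(np.asarray(data[1:3, 2:5]), data_np[1:3, 2:5])


class TestNumpy(BaseArrayTests):
    def _get_data(self, shape, data_np=None):
        if data_np is None:
            data_np = self._reference(shape)
        return data_np


class TestDask(BaseArrayTests):
    def _get_data(self, shape, data_np=None):
        import dask.array as da  # imported only when this backend runs
        if data_np is None:
            data_np = self._reference(shape)
        return da.from_array(data_np, chunks=2)
```

With this layout, adding a backend means writing one `_get_data()` override rather than duplicating a whole module.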

I wonder if a rebase will fix the RTD build for this PR.

ginga/tests/test_dask.py

ginga/trcalc.py

pllim · 2019-11-01T18:00:19Z

ginga/trcalc.py

+            arr = arr.reshape(shape)
+            return arr
+
+        else:


Can you guarantee that d_obj would definitely be a Dask object at this point? Or should this be an elif instead?

Good question. The current code sort of assumes that by process of elimination. Probably it should be an elif, with an exception raised if nothing before it matched. The problem is that I want to detect these cases without having to import zarr or dask, because this becomes the basic slicing function. So if a good duck-typing test could be done, that might be a possibility...

At this point I think the code for this is ok, we can refactor it later if a better test for dask arrays can be found.
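The following sketches the `elif` variant discussed above. It assumes that dask and zarr objects can be identified acceptably by the top-level package that defines their type. That check does not require importing either library. `array_kind()` is a hypothetical name and is not a function in `ginga/trcalc.py`.

```python
import numpy as np


def array_kind(obj):
    """Classify an array-like without importing dask or zarr.

    Hypothetical helper: it inspects the top-level package name of the
    object's type, so the optional libraries never have to be imported.
    """
    if isinstance(obj, np.ndarray):
        return "numpy"
    package = type(obj).__module__.split(".")[0]
    if package == "zarr":
        return "zarr"
    elif package == "dask":
        return "dask"
    # Fail loudly instead of assuming "dask by process of elimination".
    raise TypeError(f"unsupported array type: {type(obj)!r}")


# Stand-in classes whose __module__ mimics the real libraries.
FakeDask = type("Array", (), {"__module__": "dask.array.core"})
FakeZarr = type("Array", (), {"__module__": "zarr.core"})
print(array_kind(np.zeros(3)), array_kind(FakeDask()), array_kind(FakeZarr()))
```

Because the `isinstance` check runs first, numpy subclasses such as masked arrays are still reported as `"numpy"`. The trade-off is that a third-party array type that wraps dask in its own package would be rejected rather than guessed at.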

pllim · 2019-11-01T20:06:16Z

Re: RTD failure -- we'll revisit if it persists after your next round of edits.

ejeschke · 2019-11-02T02:07:39Z

@Cadair, do you have some examples of large images (too large for RAM) that you open up using dask arrays?

ejeschke · 2019-11-02T02:08:22Z

@pllim, I believe I addressed all your points. Please have another look.

ejeschke · 2019-11-02T02:11:27Z

I feel like the three test modules could be merged into one to avoid code duplication via subclassing and changing a few things here and there at setup.

Maybe for a future PR? I want to add some more tests to the new test_trcalc and since those would overlap with these somewhat they could all be done together.

pllim · 2019-11-02T15:40:56Z

ginga/tests/test_dask.py

+
+    def _2ddata(self, shape, data_np=None):
+        if data_np is None:
+            data_np = np.asarray([min(i, j)


I think this can be optimized, especially if you are using sizable shape values:

In [30]: shape = (1000, 500)

In [31]: %timeit np.min(np.indices(shape), axis=0)
2.52 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [32]: %timeit np.asarray([min(i, j) for i in range(shape[0]) for j in range(shape[1])]).reshape(shape)
92.7 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This comment also applies to all similar occurrences throughout.
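The vectorized form suggested above builds the same array as the list comprehension. The quick check below uses hypothetical wrapper names. `np.minimum.outer` appears as a second vectorized option, which also avoids building the full `np.indices` stack.

```python
import numpy as np


def min_index_grid(shape):
    """Return an array whose element [i, j] is min(i, j)."""
    # np.indices gives a (2, rows, cols) stack of row and column indices;
    # taking the minimum over axis 0 yields min(i, j) everywhere.
    return np.min(np.indices(shape), axis=0)


def min_index_grid_outer(shape):
    """Same result via a ufunc outer product of the two index ranges."""
    return np.minimum.outer(np.arange(shape[0]), np.arange(shape[1]))


def min_index_grid_slow(shape):
    """The original list-comprehension version, for comparison."""
    return np.asarray([min(i, j)
                       for i in range(shape[0])
                       for j in range(shape[1])]).reshape(shape)


print(min_index_grid((3, 3)))
```

All three functions return identical values. The vectorized versions are simply faster for sizable shapes, as the timings above show.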

@ejeschke , do you not wish to address this one? Either way is fine by me, but I just want to make sure it was not overlooked.

Whups, ok...pushed another commit. Have a look.

ginga/tests/test_dask.py

pllim · 2019-11-02T15:44:24Z

ginga/tests/test_numpy.py

+
+    def _get_data(self, shape, data_np=None):
+        if data_np is None:
+            data_np = np.random.randint(0, 10000, shape)


Why not also remove random here?
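Replacing the random values in `_get_data` makes the test data reproducible across runs. The sketch below shows two deterministic options, both with hypothetical helper names. The first is a plain `arange`-based ramp. The second is a generator with a fixed seed, for cases where pseudo-random values are still wanted.

```python
import numpy as np


def get_ramp_data(shape):
    """Deterministic data: 0, 1, 2, ... laid out in the requested shape."""
    return np.arange(np.prod(shape)).reshape(shape)


def get_seeded_data(shape, seed=42):
    """Pseudo-random but reproducible data in the same [0, 10000) range."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 10000, size=shape)


print(get_ramp_data((2, 3)))
```

The ramp has an advantage for slicing tests: every element is unique, so an off-by-one slice shows up immediately as a mismatch.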

good catch.

ginga/tests/test_numpy.py

ejeschke · 2021-06-16T21:55:13Z

Rebased, still passing all tests with latest conda installs of zarr and dask.

pllim · 2021-06-16T21:57:17Z

I haven't been using these two packages, so as long as tests pass and you are happy with it, FFTM (feel free to merge).


Co-Authored-By: P. L. Lim <[email protected]>

ejeschke requested a review from pllim October 15, 2019 23:59

ejeschke self-assigned this Oct 15, 2019

ejeschke added enhancement needs testing labels Oct 15, 2019

ejeschke force-pushed the dask-support branch 4 times, most recently from 29366f5 to b0ce57e on October 31, 2019 20:06

ejeschke force-pushed the dask-support branch from b0ce57e to 8f6d51b on November 1, 2019 00:49

nabobalis approved these changes Nov 1, 2019


pllim assigned Cadair Nov 1, 2019

pllim reviewed Nov 1, 2019


ejeschke mentioned this pull request Nov 1, 2019

RTD is broken #807 (closed)

ejeschke force-pushed the dask-support branch from feab24a to ea31b70 on November 2, 2019 01:54

ejeschke added this to the 3.1 milestone Nov 2, 2019

pllim reviewed Nov 2, 2019

ginga/tests/test_dask.py (resolved)

pllim reviewed Nov 2, 2019


ejeschke modified the milestones: 3.2, 3.3 May 27, 2021

ejeschke force-pushed the dask-support branch from e52cef2 to 2f7edaa on June 16, 2021 21:43

ejeschke force-pushed the dask-support branch from 2f7edaa to c940b8b on December 21, 2021 00:28

ejeschke modified the milestones: 3.3, 3.4 Jan 14, 2022

ejeschke force-pushed the dask-support branch from c940b8b to 94a87b7 on February 15, 2022 01:37

ejeschke mentioned this pull request Aug 26, 2022

Release 4.1 #1019 (closed)

ejeschke force-pushed the dask-support branch from c93dbcf to d891f13 on December 30, 2022 01:10

ejeschke and others added 15 commits May 9, 2023 11:21

fix typo in doc

6fb0027

Apply suggestions from code review

8e6012d

Co-Authored-By: P. L. Lim <[email protected]>

Incorporate changes from @pllim code review

d6576a5

Update documentation

4522b6a

Fix up tests for trcalc

5cda4d6

Fixes for robustness of array equality checks

9b6912d

Update tests for numpy as well

81343d3

Improve efficiency of test data generation

242d3f9

fix for rebase

b061db9

fix numpy deprecation warnings in tests

daeda06

fix numpy deprecation warning

2ff12ae

reverse some fixes in tests

2212935

fix typecheck tests for trcalc module

1e11861

adjustment for rebase

47b6630

ejeschke force-pushed the dask-support branch from d891f13 to 47b6630 on May 9, 2023 21:21

ejeschke mentioned this pull request Jun 30, 2023

Release 5.0 #1053 (closed)

ejeschke changed the base branch from master to main July 12, 2023 22:16

ejeschke mentioned this pull request Jul 17, 2023

Implement the Python Array API standard as a client #1056 (open)
