-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for dask and zarr arrays #805
base: main
Are you sure you want to change the base?
Conversation
@pllim, no hurry. |
Not sure if I'm qualified to review this one as I have never used dask nor zarr. Maybe someone from SunPy like @nabobalis is a better person to review? |
@pllim, how about just a basic code review? |
@ejeschke , sure, I'll have a look after you resolve the conflict. |
29366f5
to
b0ce57e
Compare
@pllim, @nabobalis, thank you! Yes, pleased to have any code review. You can check out this gist to see the tests I did. |
b0ce57e
to
8f6d51b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good by me. That is some tricky reshaping to be done for zarr.
I would be happy to review this, but wont be able to get to it until next week. If you could post a review request for me, it will end up on my list. :) |
@Cadair , can't add you as reviewer, but added you to assignee. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the three test modules could be merged into one to avoid code duplication via subclassing and changing a few things here and there at setup.
I wonder if a rebase will fix RTD build for this PR.
arr = arr.reshape(shape) | ||
return arr | ||
|
||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you guarantee that d_obj
would definitely be Dask object at this point? Or should this be an elif
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question. The current code sort of assumes that by process of elimination. Probably it should be an elif
with an exception raised if it didn't match anything before. Problem is that I want to detect the cases without having to import zarr
or dask
, because this becomes the basic slicing function. So if a good duck-typing test could be done that might be a possibility...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I think the code for this is ok, we can refactor it later if a better test for dask arrays can be found.
Re: RTD failure -- we'll revisit if it persists after your next round of edits. |
feab24a
to
ea31b70
Compare
@Cadair, do you have some examples of large images (too large for RAM) that you open up using dask arrays? |
@pllim, I believe I addressed all your points. Please have another look. |
Maybe for a future PR? I want to add some more tests to the new |
ginga/tests/test_dask.py
Outdated
|
||
def _2ddata(self, shape, data_np=None): | ||
if data_np is None: | ||
data_np = np.asarray([min(i, j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be optimized, especially if you are using sizable shape
values:
In [30]: shape = (1000, 500)
In [31]: %timeit np.min(np.indices(shape), axis=0)
2.52 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [32]: %timeit np.asarray([min(i, j) for i in range(shape[0]) for j in range(shape[1])]).reshape(shape)
92.7 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This comment also applies to all similar occurrences throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ejeschke , do you not wish to address this one? Either way is fine by me, but I just want to make sure it was not overlooked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whups, ok...pushed another commit. Have a look.
ginga/tests/test_numpy.py
Outdated
|
||
def _get_data(self, shape, data_np=None): | ||
if data_np is None: | ||
data_np = np.random.randint(0, 10000, shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not also remove random
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch.
Rebased, still passing all tests with latest conda installs of |
I haven't been using these two packages, so as long as tests pass and you are happy with it, FFTM. |
2f7edaa
to
c940b8b
Compare
c940b8b
to
94a87b7
Compare
c93dbcf
to
d891f13
Compare
This adds support for dask and zarr arrays into BaseImage-derived objects (e.g. AstroImage), e.g. >>> aimg = AstroImage() >>> aimg.load_data(dask_arr) These images can then be loaded directly into a Ginga viewer. Three new pytest files are added: one for numpy, dask and zarr Developer documentation has been updated.
Co-Authored-By: P. L. Lim <[email protected]>
This adds support for
dask
andzarr
arrays intoBaseImage
-derived objects (e.g.AstroImage
), e.g.These images can then be loaded directly into a Ginga viewer.
Three new pytest files are added: one for
numpy
,dask
andzarr
Developer documentation has been updated.