Sparse arrays #165
Comments
Sorry for the late reply @jvail! Yes, it would be nice if xarray-simlab could support sparse arrays (I haven't tried yet). Xarray-simlab uses
No need to be sorry! Alright - thank you. I'll try to read the zarr-related stuff and try a dev install of simlab to see how far I can get.
Hi @benbovy, here is a test for some minimal sparse support: jvail@107d2e0#diff-932f643c875a75e89dbf205d2764a1ac25b260f5c12503543242f833e8359a62 It turns out it is impossible to use sparse with intent='inout' because zarr won't accept it, and if you auto-densify you are back at square one because you would have to "re-sparse" after reading the input. It works well internally (that is, inside a process, without having the variable in either input_vars or output_vars) without any changes to the source code. If the sparse variable is a model output (in output_vars), you would have to call todense() on it (see my commit). Not sure if you want that, but at least it would provide a bit of support. Another way to handle sparse in outputs could be to just drop all sparse variables when writing to zarr and issue a warning.
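The two output options described above (densify with todense(), or drop the variable and warn) could be sketched roughly as follows. This is only an illustration, not the code from the linked commit: `prepare_for_store` and `FakeSparse` are hypothetical names, and the only thing assumed about a sparse array is that it exposes a `todense()` method, as pydata/sparse arrays do.

```python
import warnings

import numpy as np


class FakeSparse:
    """Hypothetical stand-in for a sparse array; only todense() is assumed."""

    def __init__(self, dense):
        self._dense = np.asarray(dense)

    def todense(self):
        return self._dense


def prepare_for_store(name, value, on_sparse="densify"):
    """Return a dense array that can be written, or None if the variable is dropped.

    Hypothetical helper, not part of xarray-simlab or zarr.
    """
    if hasattr(value, "todense"):
        if on_sparse == "densify":
            # Explicit densification: memory use grows to the full array size.
            return np.asarray(value.todense())
        warnings.warn(
            f"dropping sparse variable {name!r}: it cannot be written as-is",
            stacklevel=2,
        )
        return None
    # Non-sparse values are passed through as plain NumPy arrays.
    return np.asarray(value)
```

Neither option helps for intent='inout', since a densified array read back from the store would still have to be converted to sparse again.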
Hi @jvail, thanks for your example. Calling Hopefully, better support for sparse arrays (and maybe other array backends) will land upstream (in zarr and xarray) in the future. In the meantime, maybe we could implement some ad-hoc conversion in xarray-simlab, i.e., for sparse arrays having
Thank you Benoit. That sounds like a pretty good idea. Should work with 'inout' as well - I'll give it a try.
Hi @benbovy, if you find time, please take a look at this commit/branch: jvail@cfa37a6 Instead of adding two zarr items, as you suggested, I tried to use the DOK format, which translates nicely into a structured numpy array. That in turn is digestible by zarr. Unfortunately I could not get it working with the "infamous" There is a little notebook as well.
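To illustrate the idea of mapping a DOK (dictionary of keys) representation onto a structured numpy array, here is a minimal sketch. It is not the implementation from the branch above: the helper names, the field names `coords` and `value`, and the use of a plain dict in place of a real sparse DOK object are all assumptions made for illustration.

```python
import numpy as np


def dok_to_structured(dok, ndim, value_dtype="f8"):
    """Pack a {coords_tuple: value} dict into a 1-D structured array.

    Each record holds the integer coordinates of one stored element
    together with its value. Hypothetical helper for illustration.
    """
    dtype = np.dtype([("coords", "i8", (ndim,)), ("value", value_dtype)])
    out = np.empty(len(dok), dtype=dtype)
    for k, (coords, value) in enumerate(sorted(dok.items())):
        out["coords"][k] = coords
        out["value"][k] = value
    return out


def structured_to_dok(records):
    """Inverse of dok_to_structured: rebuild the {coords_tuple: value} dict."""
    return {
        tuple(int(c) for c in rec["coords"]): rec["value"].item()
        for rec in records
    }


dok = {(0, 1): 2.5, (3, 2): -1.0}
records = dok_to_structured(dok, ndim=2)
restored = structured_to_dok(records)
```

The array shape is not stored in the records, so in a real store it would have to be kept separately, for example as an attribute.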
Without having looked at your branch yet, my first idea would be to fix that in Xarray :-)!
hmpf - that is beyond my reach. But maybe we can get around it by injecting some encoding/decoding options - I hope.
I just had a look at your jvail@cfa37a6 branch. That's nice! I think that such support for sparse arrays would be a nice addition in xarray-simlab before proper support for sparse is available in zarr (and zarr->xarray). A comment about your implementation: the approach that you use won't scale well if later we want to support other special cases. The VariableCoder approach used in Xarray would nicely fit here, even for one special case. Unfortunately it is not part (yet?) of the public API, but we could reuse the same approach here.
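As a rough sketch of what a coder-style design could look like: each coder is a small object with an encode/decode pair, and a list of coders is applied in order when writing and in reverse when reading. This only mirrors the general pattern and does not use Xarray's internal VariableCoder class; `BoolCoder`, `encode_all` and `decode_all` are hypothetical names, and the boolean case is just a stand-in for any special case such as sparse.

```python
import numpy as np


class BoolCoder:
    """Hypothetical coder: store booleans as uint8 and remember the original dtype."""

    def encode(self, data, attrs):
        if data.dtype == np.bool_:
            return data.astype("u1"), {**attrs, "dtype": "bool"}
        return data, attrs

    def decode(self, data, attrs):
        if attrs.get("dtype") == "bool":
            rest = {k: v for k, v in attrs.items() if k != "dtype"}
            return data.astype(bool), rest
        return data, attrs


def encode_all(coders, data, attrs=None):
    """Apply every coder's encode step, in order, before writing."""
    attrs = dict(attrs or {})
    for coder in coders:
        data, attrs = coder.encode(data, attrs)
    return data, attrs


def decode_all(coders, data, attrs):
    """Apply every coder's decode step, in reverse order, after reading."""
    attrs = dict(attrs)
    for coder in reversed(coders):
        data, attrs = coder.decode(data, attrs)
    return data, attrs
```

Supporting another special case would then mean adding one more coder to the list rather than adding another branch to the store code.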
Yes, that's true. I'll try to propose something more general. But it seems you agree with this approach, generally speaking? Do you have an idea how to fix the
Yes, generally I agree! Do you have an idea how to fix the mask_and_scale: True issue? Could it be that xarray now complains about structured arrays coming from zarr? If so, I might be back at square one :\ I'm afraid there's currently no workaround other than
Apparently setting the zarr fill_value to
It turned out that we do not desperately need sparse arrays right now. With our current model/data, our indices stay below a "critical" size (wherever that is). But if they grow much larger in the future, and the arrays therefore get sparser and sparser, support for sparse would be very welcome.
👍 for the new release.
I was wondering if it would be laborious to support sparse (https://github.com/pydata/sparse) arrays. I have an issue with this line
https://github.com/benbovy/xarray-simlab/blob/master/xsimlab/stores.py#L193
dtype = getattr(value, "dtype", np.asarray(value).dtype)
I guess in case of sparse it would just be:
dtype = getattr(value, "dtype", value.dtype)
But maybe after that line there are other issues as well. I just wanted to ask asap if you think sparse could be easily supported with a few tweaks. Because if not then I might just have to skip the idea of using sparse arrays at all.
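A note on why the quoted line is a problem: the third argument of getattr is evaluated before getattr is called, so np.asarray(value) runs even when value already has a dtype attribute. A minimal sketch of a lazy lookup is shown below; `get_dtype` and `FakeSparse` are hypothetical names, and the stand-in class only assumes that a sparse array has a dtype and refuses implicit conversion to a dense array.

```python
import numpy as np


def get_dtype(value):
    """Return value.dtype when present; only fall back to NumPy inference if absent."""
    dtype = getattr(value, "dtype", None)
    if dtype is None:
        # Reached only for objects without a dtype (lists, Python scalars, ...).
        dtype = np.asarray(value).dtype
    return np.dtype(dtype)


class FakeSparse:
    """Hypothetical stand-in for a sparse array that refuses implicit densification."""

    dtype = np.dtype("float64")

    def __array__(self, dtype=None, copy=None):
        raise RuntimeError("cannot convert a sparse array to dense automatically")


sparse_dtype = get_dtype(FakeSparse())  # __array__ is never called here
list_dtype = get_dtype([1.5, 2.5])      # falls back to np.asarray
```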
Thank you,
Jan