-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow in-memory arrays with open_mfdataset #5704
base: main
Are you sure you want to change the base?
Conversation
A lot of failing tests but they seem to just assume that open_mfdataset always returns dask arrays by default. Probably as simple as adding chunks={} in all these tests, but this is quite a breaking change. Do you know the reason why |
See #5689 for reference to this PR |
One way of making this less controversial is to also change the default value of Line 696 in 48a9dbe
Then the default settings will behave the same as before. Although it's still not consistent with xr.open_dataset s default parameters which mfdataset is just a thin wrapper around.
It is indeed bad practice to use dicts as default value but not completely uncommon, see for example: Line 2111 in 48a9dbe
|
The reason why We certainly could make a similar change to this, but I would not do so by default. Or I would add support for lazy concatenation into xarray's lazy indexing, and then we could slowly roll out a breaking change (with appropriate FutureWarning, etc). |
That the arrays would be loaded into memory is what you would expect if a user insists on using I just changed the default value to {}. So now it will behave as it did previously but with the possibility to load into memory for whatever reason you might have with small files. |
Those issues indeed has to be fixed if opening files lazily is the only option for xarray. But xarray could also accept that |
I just found out that `open_mfdataset` always requires dask even if `chunks=None`. This may change in the future (see pydata/xarray#5704).
I also ran into a case where I wanted to be able to opt-in to using It seems we have multiple different issues and PRs asking for the same thing here, a way to prevent breaking changes (i.e. changing the default to |
So we (@shoyer and @dcherian) discussed this in the dev meeting call just now, and I think the conclusion was that:
I'm not quite clear on what the explicit suggestion for this new kwarg would be though... Or whether it can instead be a special |
Excellent summary @TomNicholas . I think we also agreed on changing the default in the signature of |
The docstring seems to imply that it's possible to get in-memory arrays:
xarray/xarray/backends/api.py
Line 732 in 4bb9d9c
But it doesn't seem possible because of:
xarray/xarray/backends/api.py
Line 899 in 4bb9d9c
This PR removes that
or
check, changes the default tochunk={}
, and fixes the failing tests.pre-commit run --all-files
whats-new.rst