Allow fsspec URLs in open_(mf)dataset #4823
Merged
Changes from 18 commits
24 commits
4eb13ff
'Add fsspec hooks
bb4174e
run black
86e045f
fix outdated test
0582d57
nother slot
e220b3c
plumb deeper
906e920
spacing
246b171
Merge branch 'master' into fsspec_mk2
f6b4634
Update xarray/backends/zarr.py
martindurant 74b3360
import reorder
ff737b4
Separate zarr engine code
414d81c
lint: import order
f029213
Don't glob twice for fileobjs
cf9519a
Add docs
05f8e08
doc formatting
41e4402
Merge branch 'master' into fsspec_mk2
c40ce4e
Only implement for zarr
466eb40
Merge branch 'master' into fsspec_mk2
e6eb41b
Reinstate original exception
5e3b6a4
apply suggestions
c82ba9e
add aiobotocore
raybellwaves 1bcf4e2
Merge pull request #1 from raybellwaves/patch-1
martindurant c500ec0
Merge branch 'master' into fsspec_mk2
3cdf30f
try min
40f9603
rmove aoi from min
@@ -54,6 +54,7 @@
    requires_cfgrib,
    requires_cftime,
    requires_dask,
    requires_fsspec,
    requires_h5netcdf,
    requires_netCDF4,
    requires_pseudonetcdf,
@@ -3040,10 +3041,17 @@ def test_open_mfdataset(self):

        with raises_regex(IOError, "no files to open"):
            open_mfdataset("foo-bar-baz-*.nc")

        with raises_regex(ValueError, "wild-card"):
            open_mfdataset("http://some/remote/uri")

    @requires_fsspec
    def test_open_mfdataset_no_files(self):
        pytest.importorskip("aiobotocore")
This means this test is always skipped. Should this be added to some of the environments?
        # glob is attempted as of #4823, but finds no files
        with raises_regex(OSError, "no files"):
            open_mfdataset("http://some/remote/uri", engine="zarr")

    def test_open_mfdataset_2d(self):
        original = Dataset({"foo": (["x", "y"], np.random.randn(10, 8))})
        with create_tmp_file() as tmp1:
@@ -4799,6 +4807,48 @@ def test_extract_zarr_variable_encoding():
        )


@requires_zarr
@requires_fsspec
def test_open_fsspec():
    import fsspec  # type: ignore
keewis marked this conversation as resolved.
    import zarr

    if not hasattr(zarr.storage, "FSStore") or not hasattr(
        zarr.storage.FSStore, "getitems"
    ):
        pytest.skip("zarr too old")

    ds = open_dataset(os.path.join(os.path.dirname(__file__), "data", "example_1.nc"))

    m = fsspec.filesystem("memory")
    mm = m.get_mapper("out1.zarr")
    ds.to_zarr(mm)  # old interface
    ds0 = ds.copy()
    ds0["time"] = ds.time + pd.to_timedelta("1 day")
    mm = m.get_mapper("out2.zarr")
    ds0.to_zarr(mm)  # old interface

    # single dataset
    url = "memory://out2.zarr"
    ds2 = open_dataset(url, engine="zarr")
    assert ds0 == ds2

    # single dataset with caching
    url = "simplecache::memory://out2.zarr"
    ds2 = open_dataset(url, engine="zarr")
    assert ds0 == ds2

    # multi dataset
    url = "memory://out*.zarr"
    ds2 = open_mfdataset(url, engine="zarr")
    assert xr.concat([ds, ds0], dim="time") == ds2

    # multi dataset with caching
    url = "simplecache::memory://out*.zarr"
    ds2 = open_mfdataset(url, engine="zarr")
    assert xr.concat([ds, ds0], dim="time") == ds2


@requires_h5netcdf
def test_load_single_value_h5netcdf(tmp_path):
    """Test that numeric single-element vector attributes are handled fine.
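The test above passes three kinds of URL string to `open_dataset` and `open_mfdataset`: a plain `memory://` URL, a chained `simplecache::memory://` URL, and globbed variants of both. As a rough illustration of how such a string breaks down, here is a minimal sketch. `describe_url` and `UrlParts` are hypothetical names invented for this example; they are not part of fsspec or xarray, and fsspec's own URL parsing is more involved.

```python
import re
from typing import List, NamedTuple


class UrlParts(NamedTuple):
    protocols: List[str]  # outermost layer first, e.g. ["simplecache", "memory"]
    path: str             # path on the innermost (target) filesystem
    has_glob: bool        # True if the path contains a glob character


def describe_url(url: str) -> UrlParts:
    """Hypothetical helper: split a chained URL into layers for illustration."""
    # Layers are separated by "::"; the last layer carries the actual path.
    *wrappers, target = url.split("::")
    protocol, sep, path = target.partition("://")
    if not sep:
        # No scheme given: assume a local file path.
        protocol, path = "file", target
    # A wrapper layer may be written as "name" or "name://..."; keep the name.
    names = [w.partition("://")[0] for w in wrappers]
    has_glob = bool(re.search(r"[*?\[]", path))
    return UrlParts(names + [protocol], path, has_glob)
```

With this sketch, the "multi dataset with caching" URL from the test splits into a caching layer, a target protocol, and a glob pattern that would need expanding before any dataset is opened.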
This is a bit tricky. This assumes the backend wants a mapper object (as the zarr backend does). But what if the glob returns a list of netCDF files? Wouldn't we want a list of file objects?
Right, this is my comment for "should we actually special case zarr". It could make files - for now it would just error. We don't have tests for this, though, but now might be the time to start.
Now tracking with the comments above, I think we have two options:
if engine=='zarr', return mapper; else return file_obj
(1) seems to be the more reasonable thing to do here but is slightly less principled as we've been working to cleanly separate the api from the backends.
I suppose a third alternative might be to pass the paths through, and create mappers in the zarr backend (will re-instantiate the FS, but that's fine) and add the opening of remote files into each of the other backends that can handle it.
I have essentially done 1), but excluded HTTP for the non-zarr path, because it has a special place for some backends (dap...). In any case, I don't suppose anyone is using globbing with http, since it's generally unreliable.
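To make the decision described in this thread concrete, here is a minimal sketch of option (1) with the HTTP exclusion. It is not the code from this PR: `choose_target` and its string return values are hypothetical, and treating `https` the same as `http` is an assumption.

```python
def choose_target(url: str, engine: str) -> str:
    """Hypothetical dispatch: what the API layer would hand to the backend."""
    if "://" not in url and "::" not in url:
        # Plain local path: nothing for fsspec to do.
        return "path"
    if engine == "zarr":
        # The zarr backend wants a key-value mapper, not open files.
        return "mapper"
    # The innermost layer of a chained URL names the target protocol.
    protocol = url.split("::")[-1].partition("://")[0]
    if protocol in ("http", "https"):  # assumption: https handled like http
        # Some backends treat HTTP URLs specially, so pass the string through.
        return "path"
    # Other remote protocols: open file-like objects for the backend.
    return "file_obj"
```

For example, `choose_target("http://some/remote/uri", "zarr")` gives `"mapper"`, while the same URL with a non-zarr engine is passed through unchanged as `"path"`.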