Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add storage_options arg to to_zarr #5615

Merged
merged 13 commits into from
Aug 21, 2021
14 changes: 12 additions & 2 deletions xarray/backends/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -1310,6 +1310,7 @@ def to_zarr(
dataset: Dataset,
store: Union[MutableMapping, str, Path] = None,
chunk_store=None,
storage_options: Dict[str, str] = None,
max-sixty marked this conversation as resolved.
Show resolved Hide resolved
mode: str = None,
synchronizer=None,
group: str = None,
Expand All @@ -1330,6 +1331,15 @@ def to_zarr(
store = _normalize_path(store)
chunk_store = _normalize_path(chunk_store)

if storage_options is None:
mapper = store
chunk_mapper = chunk_store
else:
from fsspec import get_mapper

mapper = get_mapper(store, **storage_options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we import fsspec here and raise an ValueError if fsspec is not available?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what behavior we should expect if a store (e.g. GCSMapper) is provided along with storage_options. We should probably check that store is a valid fsspec target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we import fsspec here and raise an ValueError if fsspec is not available?

Did your suggestion. As it stands if fsspec is not available it'll raise a ModuleNotFoundError: No module named 'fsspec'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since fsspec is such a small dep, with no deps of its own, you may wish to add it to xarray's deps for simplicity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what behavior we should expect if a store (e.g. GCSMapper) is provided along with storage_options. We should probably check that store is a valid fsspec target.

Not sure. I tend to just use a string e.g. gs://path/file.zarr for store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the input to get_mapper must be a string. So I think a useful check here would that if storage_options is provided, store must be a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, every implementation takes a string for the "root"; it is not completely inconceivable that some implementation might take something else, but I can't immediately imagine it.

chunk_mapper = get_mapper(chunk_store, **storage_options)

if encoding is None:
encoding = {}

Expand Down Expand Up @@ -1372,13 +1382,13 @@ def to_zarr(
already_consolidated = False
consolidate_on_close = consolidated or consolidated is None
zstore = backends.ZarrStore.open_group(
store=store,
store=mapper,
mode=mode,
synchronizer=synchronizer,
group=group,
consolidated=already_consolidated,
consolidate_on_close=consolidate_on_close,
chunk_store=chunk_store,
chunk_store=chunk_mapper,
append_dim=append_dim,
write_region=region,
safe_chunks=safe_chunks,
Expand Down
11 changes: 8 additions & 3 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1914,6 +1914,7 @@ def to_zarr(
self,
store: Union[MutableMapping, str, Path] = None,
chunk_store: Union[MutableMapping, str, Path] = None,
storage_options: Dict[str, str] = None,
mode: str = None,
synchronizer=None,
group: str = None,
Expand Down Expand Up @@ -1942,10 +1943,13 @@ def to_zarr(
Parameters
----------
store : MutableMapping, str or Path, optional
Store or path to directory in file system.
Store or path to directory in local or remote file system.
chunk_store : MutableMapping, str or Path, optional
Store or path to directory in file system only for Zarr array chunks.
Requires zarr-python v2.4.0 or later.
Store or path to directory in local or remote file system only for Zarr
array chunks. Requires zarr-python v2.4.0 or later.
storage_options : dict, optional
Any additional parameters for the storage backend (ignored for local
paths).
mode : {"w", "w-", "a", "r+", None}, optional
Persistence mode: "w" means create (overwrite if exists);
"w-" means create (fail if exists);
Expand Down Expand Up @@ -2032,6 +2036,7 @@ def to_zarr(
self,
store=store,
chunk_store=chunk_store,
storage_options=storage_options,
mode=mode,
synchronizer=synchronizer,
group=group,
Expand Down