pass **kwargs through from save_mfdataset to to_netcdf #6684

Closed
taobrienlbl opened this issue Jun 9, 2022 · 2 comments · Fixed by #6686

Comments

@taobrienlbl
Contributor

Is your feature request related to a problem?

Based on the documentation of xarray.save_mfdataset, I would expect that arguments that can be passed to xarray.Dataset.to_netcdf() can also be passed to xarray.save_mfdataset:

> When not using dask, it is no different than calling to_netcdf repeatedly.

But it appears that the unlimited_dims and encoding arguments available in to_netcdf are not also available in save_mfdataset:

test_save_mfdataset_encoding_opt.py:

import xarray as xr

# create a timeseries to store in a netCDF file
times = list(range(0,3652))
time = xr.DataArray(times, dims = ("time",))

# create a simple dataset to write using save_mfdataset
test_ds = xr.Dataset()
test_ds['time'] = time

# tell netCDF to write the times as doubles
encoding = dict(time = dict(dtype = "double"))

# set the output file name
output_path = "test.nc"

# the test fails when encoding is added as an argument to save_mfdataset
# but it works if instead the dataset is saved using
# test_ds.to_netcdf(output_path, encoding = encoding)
xr.save_mfdataset([test_ds], [output_path], encoding = encoding)

$ python3 test_save_mfdataset_encoding_opt.py
Traceback (most recent call last):
  File "test_save_mfdataset_encoding_opt.py", line 21, in <module>
    xr.save_mfdataset([test_ds], [output_path], encoding = encoding)
TypeError: save_mfdataset() got an unexpected keyword argument 'encoding'

This appears to be because save_mfdataset does not accept the encoding argument, nor does it accept and pass along **kwargs.

This means that datasets written with save_mfdataset are less flexible than those written with to_netcdf.
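
The failure mode can be shown without xarray. The sketch below uses stand-in functions (`write_one`, `write_many_fixed`, and `write_many_forwarding` are hypothetical names, not part of xarray) to show why a wrapper with a fixed signature rejects `encoding`, while one that accepts `**kwargs` hands it to the inner call:

```python
# Stand-in for Dataset.to_netcdf: it understands extra keyword options.
# (Hypothetical function; it only records what it was asked to do.)
def write_one(path, mode="w", encoding=None, unlimited_dims=None):
    return {
        "path": path,
        "mode": mode,
        "encoding": encoding,
        "unlimited_dims": unlimited_dims,
    }


# Fixed signature, like the save_mfdataset described in this issue:
# any keyword it does not name raises TypeError before write_one is reached.
def write_many_fixed(paths, mode="w"):
    return [write_one(path, mode=mode) for path in paths]


# Proposed shape: collect unknown keywords and forward them unchanged.
def write_many_forwarding(paths, mode="w", **kwargs):
    return [write_one(path, mode=mode, **kwargs) for path in paths]


encoding = {"time": {"dtype": "double"}}

try:
    write_many_fixed(["a.nc"], encoding=encoding)
except TypeError as err:
    rejected = str(err)  # message names the unexpected keyword
else:
    rejected = None

forwarded = write_many_forwarding(["a.nc", "b.nc"], encoding=encoding)
```

With forwarding, a misspelled keyword is reported by the inner function rather than by the wrapper, which is the usual trade-off of passing `**kwargs` through.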

Describe the solution you'd like

A simple fix, which I have verified, is to modify save_mfdataset to accept and pass along **kwargs:

diff --git a/xarray/backends/api.py b/xarray/backends/api.py
index d1166624..8baca58c 100644
--- a/xarray/backends/api.py
+++ b/xarray/backends/api.py
@@ -1258,7 +1258,7 @@ def dump_to_store(


 def save_mfdataset(
-    datasets, paths, mode="w", format=None, groups=None, engine=None, compute=True
+    datasets, paths, mode="w", format=None, groups=None, engine=None, compute=True, **kwargs
 ):
     """Write multiple datasets to disk as netCDF files simultaneously.

@@ -1280,6 +1280,7 @@ def save_mfdataset(
         these locations will be overwritten.
     format : {"NETCDF4", "NETCDF4_CLASSIC", "NETCDF3_64BIT", \
               "NETCDF3_CLASSIC"}, optional
+    **kwargs : additional arguments are passed along to ``to_netcdf``

         File format for the resulting netCDF file:

@@ -1358,7 +1359,7 @@ def save_mfdataset(
     writers, stores = zip(
         *[
             to_netcdf(
-                ds, path, mode, format, group, engine, compute=compute, multifile=True
+                ds, path, mode, format, group, engine, compute=compute, multifile=True, **kwargs
             )
             for ds, path, group in zip(datasets, paths, groups)
         ]

With xarray/backends/api.py patched as above, the test file shown earlier runs as expected, and the encoding is passed along:

$ python3 test_save_mfdataset_encoding_opt.py
$ ncdump -h test.nc
netcdf test {
dimensions:
	time = 3652 ;
variables:
	double time(time) ;
		time:_FillValue = NaN ;
}

Describe alternatives you've considered

I attempted to set the encoding dictionary directly on the dataset prior to calling save_mfdataset, but that didn't seem to have an effect.
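
Another alternative for an unpatched xarray, going by the documentation's statement that without dask save_mfdataset is no different from calling to_netcdf repeatedly, is to make those calls directly. A minimal sketch, assuming dask-based parallel writing and the groups argument are not needed (`save_each` is a hypothetical helper, not an xarray function):

```python
def save_each(datasets, paths, **kwargs):
    """Write each dataset to its matching path with its own to_netcdf call.

    Hypothetical helper: every extra keyword argument (for example
    ``encoding`` or ``unlimited_dims``) is forwarded to ``to_netcdf``.
    """
    datasets = list(datasets)
    paths = list(paths)
    if len(datasets) != len(paths):
        raise ValueError("datasets and paths must have the same length")
    for ds, path in zip(datasets, paths):
        # Works with any object exposing to_netcdf(path, **kwargs),
        # such as an xarray.Dataset.
        ds.to_netcdf(path, **kwargs)


# Usage with the objects from the test script above (not executed here):
# save_each([test_ds], [output_path], encoding=encoding)
```

Unlike save_mfdataset, this writes the files one after another, so it gives up the simultaneous writes that dask would otherwise provide.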

Additional context

Here is version information, in case it is relevant:

$ python3 -c 'import xarray; print(xarray.show_versions())'

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.15.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.6.3
netCDF4: 1.4.2
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.1.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.3
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.8.3
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
None
@TomNicholas
Member

Thank you for raising this @taobrienlbl !

> A simple fix, which I have verified, is to modify save_mfdataset to accept and pass along **kwargs:

That sounds great - would you be interested in submitting a PR with the fix? We would just need the fix + a test.

@taobrienlbl
Contributor Author

I'm working on that at this exact moment, actually :)
