Originally reported by @rsignell-usgs in pangeo-data/pangeo#249 (comment), the rasterio backend is not pickle-able. This obviously causes problems when using dask-distributed. We probably need to use DataStorePickleMixin or something similar on rasterio datasets to allow multiple readers of the same dataset.
Expected Output
pickle.dumps(ds)
returns a pickled dataset.
Output of xr.show_versions()
xr.show_versions()
/Users/jhamman/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
@jhamman what kind of expertise would it take to do this job (e.g., is it just a copy-and-paste with some small changes that a newbie could probably do, or would it be best left to the core dev team)?
And is there any workaround that can be used in the interim?
The simplest design here would be to extract the logic from DataStorePickleMixin into a pickleable wrapper class that could be used in rasterio.open().
PickleByReconstructionWrapper would need to define __setstate__/__getstate__ (for pickleability), and would also have a .value attribute for pulling out the unwrapped file object.
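A minimal sketch of what such a wrapper could look like, to make the proposal concrete. Only the class name, the `__getstate__`/`__setstate__` pair, and the `.value` attribute come from the comment above. The constructor signature, the lazy opening, and the `close()` method are assumptions for illustration, not xarray's actual implementation. The demo uses the built-in `open` so it runs without rasterio; in the rasterio backend the opener would presumably be `rasterio.open`.

```python
import functools
import os
import pickle
import tempfile


class PickleByReconstructionWrapper:
    """Hold an open file-like object, but pickle only the recipe to reopen it.

    Hypothetical sketch: the constructor arguments and close() are assumptions.
    """

    def __init__(self, opener, *args, **kwargs):
        # A partial of a module-level callable is picklable; the live handle is not.
        self._opener = functools.partial(opener, *args, **kwargs)
        self._value = None

    @property
    def value(self):
        # Open lazily so an unpickled copy reopens the file on first access.
        if self._value is None:
            self._value = self._opener()
        return self._value

    def close(self):
        if self._value is not None:
            self._value.close()
            self._value = None

    def __getstate__(self):
        # Drop the live handle; keep only what is needed to rebuild it.
        return {"opener": self._opener}

    def __setstate__(self, state):
        self._opener = state["opener"]
        self._value = None


# Demo with a plain text file standing in for a raster dataset.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write("hello")

    wrapper = PickleByReconstructionWrapper(open, path, "r")
    first_read = wrapper.value.read()

    # The round trip succeeds because the file handle itself is never pickled.
    clone = pickle.loads(pickle.dumps(wrapper))
    clone_read = clone.value.read()

    wrapper.close()
    clone.close()
```

Each unpickled copy reopens the file independently, which is what would let several dask-distributed workers read the same dataset. The trade-off in this sketch is that handle state such as the current read position is not carried across the pickle boundary.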
Eventually, we should probably refactor xarray's existing data stores to use this -- the current logic is really messy/hard to understand. We'll probably also want to eventually factor out the auto-close logic in some composable way, but that can be saved for later.
Code Sample, a copy-pastable example if possible
Problem description
Originally reported by @rsignell-usgs in pangeo-data/pangeo#249 (comment), the rasterio backend is not pickle-able. This obviously causes problems when using dask-distributed. We probably need to use DataStorePickleMixin or something similar on rasterio datasets to allow multiple readers of the same dataset.
Expected Output
pickle.dumps(ds) returns a pickled dataset.
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.3
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: 4.5.1
pytest: 3.5.1
IPython: 6.3.1
sphinx: 1.7.4