rasterio backend should use DataStorePickleMixin (or something similar) #2121

Closed
jhamman opened this issue May 11, 2018 · 2 comments

jhamman commented May 11, 2018

Code Sample, a copy-pastable example if possible

In [1]: import xarray as xr

In [2]: ds = xr.open_rasterio('RGB.byte.tif')

In [3]: ds
Out[3]:
<xarray.DataArray (band: 3, y: 718, x: 791)>
[1703814 values with dtype=uint8]
Coordinates:
  * band     (band) int64 1 2 3
  * y        (y) float64 2.827e+06 2.826e+06 2.826e+06 2.826e+06 2.826e+06 ...
  * x        (x) float64 1.021e+05 1.024e+05 1.027e+05 1.03e+05 1.033e+05 ...
Attributes:
    transform:   (101985.0, 300.0379266750948, 0.0, 2826915.0, 0.0, -300.0417...
    crs:         +init=epsg:32618
    res:         (300.0379266750948, 300.041782729805)
    is_tiled:    0
    nodatavals:  (0.0, 0.0, 0.0)

In [4]: import pickle

In [5]: pickle.dumps(ds)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-a165c2473431> in <module>()
----> 1 pickle.dumps(ds)

TypeError: can't pickle rasterio._io.RasterReader objects

Problem description

Originally reported by @rsignell-usgs in pangeo-data/pangeo#249 (comment): the rasterio backend is not pickle-able. This causes problems with dask-distributed, which pickles objects in order to send them to workers. We probably need to use DataStorePickleMixin or something similar on rasterio datasets to allow multiple readers of the same dataset.

Expected Output

pickle.dumps(ds)

returns a pickled dataset.

Output of xr.show_versions()

xr.show_versions()
/Users/jhamman/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.3
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: 4.5.1
pytest: 3.5.1
IPython: 6.3.1
sphinx: 1.7.4

@rsignell-usgs

@jhamman what kind of expertise would it take to do this job (e.g., is it just a copy-and-paste with some small changes that a newbie could probably do, or would it be best left to the core dev team)?

And is there any workaround that can be used in the interim?

shoyer commented May 14, 2018

The simplest design here would be to extract the logic from DataStorePickleMixin into a pickleable wrapper class that could be used around the call to rasterio.open().

e.g., instead of

    riods = rasterio.open(filename, mode='r')

we would write

    riods_wrapper = PickleByReconstructionWrapper(rasterio.open, filename, mode='r')

PickleByReconstructionWrapper would need to define __setstate__/__getstate__ (for pickleability), and would also have a .value attribute for pulling out the unwrapped file object.
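
To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. It is not xarray's actual implementation. The attribute names (opener, args, kwargs), the lazy opening in .value, and the close() helper are all assumptions made for illustration. The demo uses the built-in open() in place of rasterio.open so that it runs without rasterio installed.

```python
import os
import pickle
import tempfile


class PickleByReconstructionWrapper:
    """Hypothetical sketch: pickle a file handle by remembering how to reopen it."""

    def __init__(self, opener, *args, **kwargs):
        self.opener = opener  # must itself be picklable, e.g. a module-level function
        self.args = args
        self.kwargs = kwargs
        self._value = None  # assumption: the handle is opened lazily on first access

    @property
    def value(self):
        # Open (or reopen) the underlying object on demand.
        if self._value is None:
            self._value = self.opener(*self.args, **self.kwargs)
        return self._value

    def __getstate__(self):
        # Drop the live handle; keep only the recipe for recreating it.
        return {"opener": self.opener, "args": self.args, "kwargs": self.kwargs}

    def __setstate__(self, state):
        self.opener = state["opener"]
        self.args = state["args"]
        self.kwargs = state["kwargs"]
        self._value = None  # reopened lazily in the unpickling process

    def close(self):
        if self._value is not None:
            self._value.close()
            self._value = None


def demo():
    # Stand-in for a raster file: a small temporary text file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("hello")
        path = f.name
    try:
        wrapper = PickleByReconstructionWrapper(open, path, mode="r")
        first = wrapper.value.read()
        # The wrapper pickles even though the open file object would not.
        restored = pickle.loads(pickle.dumps(wrapper))
        second = restored.value.read()
        wrapper.close()
        restored.close()
        return first, second
    finally:
        os.remove(path)
```

Note that the restored wrapper reopens the file from scratch, so any state held by the original handle (such as the read position) is lost. That is acceptable for read-only access, which is the case discussed here.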

Eventually, we should probably refactor xarray's existing data stores to use this -- the current logic is really messy/hard to understand. We'll probably also want to eventually factor out the auto-close logic in some composable way, but that can be saved for later.
