Opening a tiff with scale_factor/add_offset attrs then saving as zarr and opening causes a UFuncTypeError #4784

Closed
ohiat opened this issue Jan 8, 2021 · 4 comments
Labels
topic-backends topic-CF conventions topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)

ohiat commented Jan 8, 2021

What happened:
When a GeoTIFF that has scale_factor and add_offset metadata is opened and then saved as zarr, the scale_factor and add_offset attributes are loaded, and then saved, as strings. When the resulting zarr store is opened, xarray attempts to apply scale_factor and add_offset but raises an exception because they are strings (dtype <U32).

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/coding/variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype)
    218     data = np.array(data, dtype=dtype, copy=True)
    219     if scale_factor is not None:
--> 220         data *= scale_factor
    221     if add_offset is not None:
    222         data += add_offset

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('<U32') to dtype('float32') with casting rule 'same_kind'

What you expected to happen:

  1. scale_factor and add_offset are converted to floats and applied when the tiff is opened.
  2. When xarray applies the scale_factor and add_offset attributes, it checks their types and/or casts them to floats.
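
To make the second expectation concrete, here is a minimal sketch of a type check and cast applied before decoding. It is not xarray's implementation: the helper names `as_float` and `scale_offset_decode` are hypothetical, and a plain Python list stands in for the NumPy array that xarray would actually operate on.

```python
from numbers import Real


def as_float(name, value):
    """Return `value` as a float, or raise a TypeError that names the attribute."""
    if isinstance(value, Real):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            raise TypeError(f"{name}={value!r} is not a numeric string") from None
    raise TypeError(
        f"{name} must be a number or numeric string, got {type(value).__name__}"
    )


def scale_offset_decode(values, scale_factor=None, add_offset=None):
    """Compute v * scale_factor + add_offset for each v in a plain list."""
    # Hypothetical stand-in for the decoding step; attributes are cast first.
    scale = 1.0 if scale_factor is None else as_float("scale_factor", scale_factor)
    offset = 0.0 if add_offset is None else as_float("add_offset", add_offset)
    return [v * scale + offset for v in values]
```

With a guard like this, a string such as '0.0001' would be cast and applied, while a value that cannot be parsed would fail with a message naming the offending attribute instead of a ufunc casting error.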

Minimal Complete Verifiable Example:

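import xarray as xr
# Open the GeoTIFF; scale_factor and add_offset arrive in img.attrs as strings.
img = xr.open_rasterio('https://hlssa.blob.core.windows.net/hls/S30/HLS.S30.T10TET.2019001.v1.4_04.tif')
# Write to zarr; the string attributes are stored unchanged.
img.to_dataset(name='img', promote_attrs=True).to_zarr('./test.zarr', mode='w')
# Reopening triggers CF decoding, which raises the UFuncTypeError shown above.
xr.open_zarr('./test.zarr').persist()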
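
A possible workaround on the writing side is to cast the string attributes to floats before saving. The sketch below works on a plain dict so it runs without the remote file. The helper name `with_numeric_cf_attrs` is hypothetical, and the sample dict only imitates `img.attrs`: the '0.0001' value comes from this issue, while the add_offset value is an assumption.

```python
def with_numeric_cf_attrs(attrs, keys=("scale_factor", "add_offset")):
    """Return a copy of `attrs` with the given keys cast to float when possible."""
    fixed = dict(attrs)
    for key in keys:
        if key in fixed:
            try:
                fixed[key] = float(fixed[key])
            except (TypeError, ValueError):
                # Leave values that cannot be parsed as numbers untouched.
                pass
    return fixed


# Stand-in for img.attrs; the add_offset value here is assumed, not from the file.
attrs = {"scale_factor": "0.0001", "add_offset": "0.0", "scales": (0.0001,)}
print(with_numeric_cf_attrs(attrs))
```

In the example above, this would presumably mean assigning `img.attrs = with_numeric_cf_attrs(img.attrs)` before calling `to_dataset`. Note that the data would then be scaled when the zarr store is reopened. Alternatively, passing `mask_and_scale=False` to `xr.open_zarr` should skip the scale/offset decoding step and return the raw values.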

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1034-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.0
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.8
cfgrib: None
iris: None
bottleneck: None
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.3
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: None

mathause added the topic-backends, topic-CF conventions, and topic-metadata labels on Jan 11, 2021
@mathause
Collaborator

The scale factor is already a string in img.attrs

img.attrs["scale_factor"]
# returns '0.0001'

The attrs are assigned without making any change

attrs[k] = v

So I think there is nothing going "wrong", but maybe they should be converted anyway? I am not familiar with tif files. Do they follow CF conventions? Can they save metadata only as strings? Or is the metadata wrong in your source file?

There is also a numeric scales attribute

img.attrs["scales"]
# returns (0.0001,)

which is assigned directly from the rasterio dataset

attrs["scales"] = riods.scales
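
Since the numeric `scales` tuple sits alongside the string `scale_factor`, one option is to prefer the numeric value and only parse the string as a fallback. This is a sketch under assumptions: the helper name `scale_from_attrs` is hypothetical, and it treats `scales` as holding one entry per band, as rasterio's dataset `scales` property does.

```python
def scale_from_attrs(attrs, band=0):
    """Prefer the numeric per-band `scales` tuple; fall back to `scale_factor`."""
    scales = attrs.get("scales")
    if scales is not None and band < len(scales):
        return float(scales[band])
    if "scale_factor" in attrs:
        # May be a string such as '0.0001', so parse it explicitly.
        return float(attrs["scale_factor"])
    # No scaling information available: use the identity scale.
    return 1.0
```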

@ohiat
Author

ohiat commented Jan 14, 2021

I haven't found anything saying that GeoTIFFs are meant to follow CF conventions, and the HDF file from which the GeoTIFF was generated doesn't seem to have the scale_factor/add_offset metadata. I think the change to make is this: when loading a zarr store, or anything else that is expected to follow CF conventions, do a type check (and possibly a type cast) before trying to apply scale_factor and add_offset.

@mathause
Collaborator

@fmaussion - do you have an opinion here?

@kmuehlbauer
Contributor

The rasterio backend was removed in favour of the dedicated backend implemented in rioxarray. See #7671. If this is still an issue, it should be taken care of over in rioxarray.

3 participants