Xarray does not support full range of netcdf-python compression options #7388

Closed
rabernat opened this issue Dec 19, 2022 · 22 comments · Fixed by #7551

@rabernat
Contributor

rabernat commented Dec 19, 2022

What is your issue?

Summary

The netcdf4-python API docs say the following:

If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.

If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.

Although compression is considered a valid encoding option by Xarray

valid_encodings = {
    "zlib",
    "complevel",
    "fletcher32",
    "contiguous",
    "chunksizes",
    "shuffle",
    "_FillValue",
    "dtype",
    "compression",
}

...it appears that we silently ignore the compression option when creating new netCDF4 variables:

nc4_var = self.ds.createVariable(
    varname=name,
    datatype=datatype,
    dimensions=variable.dims,
    zlib=encoding.get("zlib", False),
    complevel=encoding.get("complevel", 4),
    shuffle=encoding.get("shuffle", True),
    fletcher32=encoding.get("fletcher32", False),
    contiguous=encoding.get("contiguous", False),
    chunksizes=encoding.get("chunksizes"),
    endian="native",
    least_significant_digit=encoding.get("least_significant_digit"),
    fill_value=fill_value,
)
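
The gap in the call above is that `compression` never reaches `createVariable`. The following is a minimal sketch of one way to forward it, not xarray's actual code and not necessarily what an eventual fix would look like. The helper name `build_create_variable_kwargs` and the `CREATE_VARIABLE_DEFAULTS` table are hypothetical; the keyword names and defaults are copied from the call quoted above, plus `compression=None` as stated in the netcdf4-python docs.

```python
# Hypothetical helper for illustration only; not part of xarray.
# Defaults mirror the createVariable call quoted in this issue, with
# `compression` added (None means "no compression" in netcdf4-python).
CREATE_VARIABLE_DEFAULTS = {
    "compression": None,
    "zlib": False,
    "complevel": 4,
    "shuffle": True,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": None,
    "least_significant_digit": None,
}


def build_create_variable_kwargs(encoding):
    """Collect the storage-related keyword arguments from an encoding dict."""
    return {
        key: encoding.get(key, default)
        for key, default in CREATE_VARIABLE_DEFAULTS.items()
    }


encoding = {
    "compression": "zlib",
    "shuffle": True,
    "complevel": 8,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}
kwargs = build_create_variable_kwargs(encoding)
print(kwargs["compression"], kwargs["complevel"])
```

Keeping the defaults in one table means a newly supported option only has to be added in one place, rather than as another `encoding.get(...)` line in the call itself.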

Code example

import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)

encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)
{'zlib': False,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 0,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 10),
 'source': 'test.nc',
 'original_shape': (10, 20),
 'dtype': dtype('float64'),
 '_FillValue': nan}

In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
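
The comparison between the requested encoding and the one read back can be automated, which is also what a regression test would need to assert on. Below is a minimal sketch using only the two dictionaries shown above. The helper name `encoding_mismatches` is hypothetical, and it assumes that every `blosc_*` compressor is reported under the single `blosc` key and that an active filter is reported as a truthy value.

```python
def encoding_mismatches(requested, written):
    """Return {option: (requested, found)} for options that did not reach the file."""
    mismatches = {}
    for key, value in requested.items():
        if key == "compression" and value is not None:
            # Assumption: blosc_lz, blosc_zstd, ... all show up as "blosc".
            flag = "blosc" if value.startswith("blosc") else value
            if not written.get(flag):
                mismatches[key] = (value, "not applied")
        elif written.get(key) != value:
            mismatches[key] = (value, written.get(key))
    return mismatches


# The encoding requested in the example above.
requested = {
    "compression": "zlib",
    "shuffle": True,
    "complevel": 8,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}

# The relevant part of the encoding read back from test.nc.
written = {
    "zlib": False,
    "szip": False,
    "zstd": False,
    "bzip2": False,
    "blosc": False,
    "shuffle": False,
    "complevel": 0,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}

for option, (wanted, found) in encoding_mismatches(requested, written).items():
    print(f"{option}: requested {wanted!r}, found {found!r}")
```

For the output above this reports `compression`, `shuffle` and `complevel`; `chunksizes` is the only requested storage option that visibly took effect.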

Proposal

We should align with the recommendation from the netcdf4 docs and support compression= style encoding in NetCDF. We should deprecate zlib=True syntax.
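
One possible shape for such a deprecation is to normalize the encoding before it is used. The sketch below is only an illustration of the proposal: `normalize_compression` is a hypothetical name, and the choice to let an explicit `compression` value win over `zlib=True` is an assumption, not something decided in this issue.

```python
import warnings


def normalize_compression(encoding):
    """Return a copy of `encoding` that uses compression=... instead of zlib=..."""
    normalized = dict(encoding)
    zlib = normalized.pop("zlib", None)
    if zlib is None:
        return normalized
    warnings.warn(
        "the 'zlib' encoding is deprecated; use compression='zlib' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Assumption: an explicitly requested compressor takes precedence.
    if zlib and normalized.get("compression") is None:
        normalized["compression"] = "zlib"
    return normalized


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = normalize_compression({"zlib": True, "complevel": 8})

print(legacy)
print([str(w.message) for w in caught])
```

Encodings that already use `compression=` pass through unchanged and without a warning.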

@rabernat rabernat added the needs triage Issue that has not been reviewed by xarray team member label Dec 19, 2022
@dcherian
Contributor

dcherian commented Jan 15, 2023

Ouch, my bad. I think the existing test only makes sure that compression is passed through in encoding, not that it is actually written to disk.

cc @markelg this would be a nice follow-on PR :)

@dcherian dcherian added bug topic-backends and removed needs triage Issue that has not been reviewed by xarray team member labels Jan 15, 2023
@markelg
Contributor

markelg commented Jan 16, 2023

I'll have a look. Sorry about that, I guess we assumed that encoding was passed with "**kwargs". I did not try it with netcdf-c 4.9.x since it is not yet available in conda-forge and I did not find the time to compile it.

@markelg
Contributor

markelg commented Feb 23, 2023

With the PR, the test above works, and so does bzip2. For some reason I can't get it to apply blosc filters: the write succeeds, but the filters are not actually applied. This is the full snippet I am using:

import xarray as xr
import numpy as np

shape = (10, 20)
chunksizes = (1, 10)

encoding = {
    'compression': 'bzip2',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

with xr.open_dataset(fname, engine="netcdf4") as ds1:
    print(ds1.foo.encoding)

Also, I was not able to get the conda environment in ci/environment.yml to resolve libnetcdf 4.9.1, so I had to build an environment on my own. I also added the HDF5 filters:

name: xarray-tests
channels:
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - blosc=1.21.3=hafa529b_0
  - blosc-hdf5-plugin=1.0.0=h8b9aba8_4
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2022.12.7=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - cftime=1.6.2=py311h4c7f6c3_1
  - curl=7.88.1=hdc1c0ab_0
  - h5py=3.8.0=nompi_py311h1db17ec_100
  - hdf4=4.2.15=h9772cbc_5
  - hdf5=1.12.2=nompi_h4df4325_101
  - hdf5-external-filter-plugins=0.1.0=ha770c72_9
  - hdf5-external-filter-plugins-bitshuffle=0.1.0=h6ca952b_9
  - hdf5-external-filter-plugins-bzip2=0.1.0=hd13e76c_9
  - hdf5-external-filter-plugins-lz4=0.1.0=h6ca952b_9
  - hdf5plugin=4.1.1=py311hc7375e3_0
  - icu=70.1=h27087fc_0
  - jpeg=9e=h0b41bf4_3
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.20.1=h81ceb04_0
  - ld_impl_linux-64=2.40=h41732ed_0
  - libaec=1.0.6=hcb278e6_1
  - libblas=3.9.0=16_linux64_openblas
  - libcblas=3.9.0=16_linux64_openblas
  - libcurl=7.88.1=hdc1c0ab_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=12.2.0=h65d4601_19
  - libgfortran-ng=12.2.0=h69a702a_19
  - libgfortran5=12.2.0=h337968e_19
  - libgomp=12.2.0=h65d4601_19
  - libiconv=1.17=h166bdaf_0
  - liblapack=3.9.0=16_linux64_openblas
  - libnetcdf=4.9.1=nompi_h34a3ff0_100
  - libnghttp2=1.51.0=hff17c54_0
  - libnsl=2.0.0=h7f98852_0
  - libopenblas=0.3.21=pthreads_h78a6416_3
  - libsqlite=3.40.0=h753d276_0
  - libssh2=1.10.0=hf14f497_3
  - libstdcxx-ng=12.2.0=h46fd767_19
  - libuuid=2.32.1=h7f98852_1000
  - libxml2=2.10.3=h7463322_0
  - libzip=1.9.2=hc929e4a_1
  - libzlib=1.2.13=h166bdaf_4
  - lz4-c=1.9.4=hcb278e6_0
  - ncurses=6.3=h27087fc_1
  - netcdf4=1.6.2=nompi_py311ha396515_101
  - numpy=1.24.2=py311h8e6699e_0
  - openssl=3.0.8=h0b41bf4_0
  - pip=23.0.1=pyhd8ed1ab_0
  - python=3.11.0=he550d4f_1_cpython
  - python_abi=3.11=3_cp311
  - readline=8.1.2=h0f457ee_0
  - setuptools=67.4.0=pyhd8ed1ab_0
  - snappy=1.1.9=hbd366e4_2
  - tk=8.6.12=h27826a3_0
  - tzdata=2022g=h191b570_0
  - wheel=0.38.4=pyhd8ed1ab_0
  - xz=5.2.6=h166bdaf_0
  - zlib=1.2.13=h166bdaf_4
  - zstd=1.5.2=h3eb15da_6
  - pip:
    - packaging==23.0
    - pandas==1.5.3
    - python-dateutil==2.8.2
    - pytz==2022.7.1
    - six==1.16.0
    - xarray==0.1.dev4485+gf8a0014

@markelg
Contributor

markelg commented Apr 11, 2023

Hi. I updated the branch and created a fresh Python environment with the idea of writing another, final test for this.
However, before doing it I ran the test suite and got some bad HDF5 errors in test_backends.py::test_open_mfdataset_manyfiles[netcdf4-20-True-None-5]:

  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeGranularBitRoundNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
    major: Attribute
    minor: Object not found

I am not sure what is going on. It seems that the currently resolved netcdf4 and hdf5 versions do not like the default parameters we are supplying. My environment is:

#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
affine                    2.4.0              pyhd8ed1ab_0    conda-forge
aiobotocore               2.5.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4           py310h1fa729e_0    conda-forge
aioitertools              0.11.0             pyhd8ed1ab_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
antlr-python-runtime      4.7.2           py310hff52083_1003    conda-forge
asciitree                 0.3.3                      py_2    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
backports.zoneinfo        0.2.1           py310hff52083_7    conda-forge
beautifulsoup4            4.12.2             pyha770c72_0    conda-forge
blosc                     1.21.3               hafa529b_0    conda-forge
boost-cpp                 1.78.0               h5adbc97_2    conda-forge
boto3                     1.26.76            pyhd8ed1ab_0    conda-forge
botocore                  1.29.76            pyhd8ed1ab_0    conda-forge
bottleneck                1.3.7           py310h0a54255_0    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cairo                     1.16.0            ha61ee94_1014    conda-forge
cartopy                   0.21.1          py310hcb7e713_0    conda-forge
cdat_info                 8.2.1              pyhd8ed1ab_2    conda-forge
cdms2                     3.1.5           py310hb9168da_16    conda-forge
cdtime                    3.1.4           py310h87e304a_8    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cf-units                  3.1.1           py310hde88566_2    conda-forge
cffi                      1.15.1          py310h255011f_3    conda-forge
cfgrib                    0.9.10.3           pyhd8ed1ab_0    conda-forge
cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
cfitsio                   4.2.0                hd9d235c_0    conda-forge
cftime                    1.6.2           py310hde88566_1    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
cloudpickle               2.2.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.7           py310hdf3cbec_0    conda-forge
coverage                  7.2.3           py310h1fa729e_0    conda-forge
cryptography              40.0.1          py310h34c0648_0    conda-forge
curl                      7.88.1               hdc1c0ab_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
cytoolz                   0.12.0          py310h5764c6d_1    conda-forge
dask-core                 2023.3.2           pyhd8ed1ab_0    conda-forge
distarray                 2.12.2             pyh050c7b8_4    conda-forge
distlib                   0.3.6              pyhd8ed1ab_0    conda-forge
distributed               2023.3.2.1         pyhd8ed1ab_0    conda-forge
docopt                    0.6.2                      py_1    conda-forge
eccodes                   2.29.0               h54fcba4_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
esmf                      8.4.1           nompi_he2e5181_0    conda-forge
esmpy                     8.4.1              pyhc1e730c_0    conda-forge
exceptiongroup            1.1.1              pyhd8ed1ab_0    conda-forge
execnet                   1.9.0              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
fasteners                 0.17.3             pyhd8ed1ab_0    conda-forge
filelock                  3.11.0             pyhd8ed1ab_0    conda-forge
findlibs                  0.0.2              pyhd8ed1ab_0    conda-forge
flox                      0.6.10             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.39.3          py310h1fa729e_0    conda-forge
freeglut                  3.2.2                h9c3ff4c_1    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
freexl                    1.0.6                h166bdaf_1    conda-forge
frozenlist                1.3.3           py310h5764c6d_0    conda-forge
fsspec                    2023.4.0           pyh1a96a4e_0    conda-forge
future                    0.18.3             pyhd8ed1ab_0    conda-forge
g2clib                    1.6.3                hbecde78_1    conda-forge
geos                      3.11.1               h27087fc_0    conda-forge
geotiff                   1.7.1                h7a142b4_6    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
h5netcdf                  1.1.0              pyhd8ed1ab_1    conda-forge
h5py                      3.8.0           nompi_py310h0311031_100    conda-forge
hdf4                      4.2.15               h9772cbc_5    conda-forge
hdf5                      1.12.2          nompi_h4df4325_101    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
hypothesis                6.71.0             pyha770c72_0    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
identify                  2.5.22             pyhd8ed1ab_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.3.0              pyha770c72_0    conda-forge
importlib_metadata        6.3.0                hd8ed1ab_0    conda-forge
importlib_resources       5.12.0             pyhd8ed1ab_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
iris                      3.4.1              pyhd8ed1ab_0    conda-forge
jasper                    2.0.33               h0ff4b12_1    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jmespath                  1.0.1              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
json-c                    0.16                 hc379101_0    conda-forge
jsonschema                4.17.3             pyhd8ed1ab_0    conda-forge
jupyter_core              5.3.0           py310hff52083_0    conda-forge
kealib                    1.5.0                ha7026e8_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py310hbf28c38_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lazy-object-proxy         1.9.0           py310h1fa729e_0    conda-forge
lcms2                     2.15                 hfd0df8a_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcdms                   3.1.2              h9366c0b_120    conda-forge
libcf                     1.0.3           py310h71500c5_116    conda-forge
libcurl                   7.88.1               hdc1c0ab_1    conda-forge
libdeflate                1.17                 h0b41bf4_0    conda-forge
libdrs                    3.1.2              h01ed8d5_119    conda-forge
libdrs_f                  3.1.2              h059c5b8_115    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgdal                   3.6.2                h6c674c2_9    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libglib                   2.74.1               h606061b_1    conda-forge
libglu                    9.0.0             he1b5a44_1001    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libkml                    1.3.0             h37653c0_1015    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               he0ac6c6_5    conda-forge
libnetcdf                 4.9.1           nompi_h34a3ff0_101    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     15.2                 hb675445_0    conda-forge
librttopo                 1.1.0               ha49c73b_12    conda-forge
libspatialite             5.0.1               h221c8f1_23    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.10.3               hca2bb57_4    conda-forge
libxslt                   1.1.37               h873f0b0_0    conda-forge
libzip                    1.9.2                hc929e4a_1    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvmlite                  0.39.1          py310h58363a5_1    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lxml                      4.9.2           py310hbdc0903_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markupsafe                2.1.2           py310h1fa729e_0    conda-forge
matplotlib-base           3.7.1           py310he60537e_0    conda-forge
msgpack-python            1.0.5           py310hdf3cbec_0    conda-forge
multidict                 6.0.4           py310h1fa729e_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nbformat                  5.8.0              pyhd8ed1ab_0    conda-forge
nc-time-axis              1.4.1              pyhd8ed1ab_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
netcdf-fortran            4.6.0           nompi_heb5813c_103    conda-forge
netcdf4                   1.6.3           nompi_py310h0feb132_100    conda-forge
nodeenv                   1.7.0              pyhd8ed1ab_0    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.89                 he45b914_0    conda-forge
numba                     0.56.4          py310h0e39c9b_1    conda-forge
numbagg                   0.2.2              pyhd8ed1ab_1    conda-forge
numcodecs                 0.11.0          py310heca2aa9_1    conda-forge
numexpr                   2.8.4           py310h690d005_100    conda-forge
numpy                     1.23.5          py310h53a5b5f_0    conda-forge
numpy_groupies            0.9.20             pyhd8ed1ab_0    conda-forge
openblas                  0.3.21          pthreads_h320a7e8_3    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.0                h0b41bf4_0    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3           py310h9b08913_1    conda-forge
partd                     1.3.0              pyhd8ed1ab_0    conda-forge
patsy                     0.5.3              pyhd8ed1ab_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    9.4.0           py310h023d228_1    conda-forge
pint                      0.20.1             pyhd8ed1ab_0    conda-forge
pip                       23.0.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
platformdirs              3.2.0              pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0              pyhd8ed1ab_5    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
poppler                   23.03.0              h091648b_0    conda-forge
poppler-data              0.4.12               hd8ed1ab_0    conda-forge
postgresql                15.2                 h3248436_0    conda-forge
pre-commit                3.2.2              pyha770c72_0    conda-forge
proj                      9.1.1                h8ffa02c_2    conda-forge
pseudonetcdf              3.2.2              pyhd8ed1ab_0    conda-forge
psutil                    5.9.4           py310h5764c6d_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydap                     3.4.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.1.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyproj                    3.5.0           py310h15e2413_0    conda-forge
pyrsistent                0.19.3          py310h1fa729e_0    conda-forge
pyshp                     2.3.1              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytest                    7.3.0              pyhd8ed1ab_0    conda-forge
pytest-cov                4.0.0              pyhd8ed1ab_0    conda-forge
pytest-env                0.8.1              pyhd8ed1ab_0    conda-forge
pytest-xdist              3.2.1              pyhd8ed1ab_0    conda-forge
python                    3.10.10         he550d4f_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-eccodes            1.5.1           py310h0a54255_0    conda-forge
python-fastjsonschema     2.16.3             pyhd8ed1ab_0    conda-forge
python-xxhash             3.2.0           py310h1fa729e_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0             py310h5764c6d_5    conda-forge
rasterio                  1.3.6           py310h3e853a9_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.28.2             pyhd8ed1ab_1    conda-forge
s3transfer                0.6.0              pyhd8ed1ab_0    conda-forge
scipy                     1.10.1          py310h8deb116_0    conda-forge
seaborn                   0.12.2               hd8ed1ab_0    conda-forge
seaborn-base              0.12.2             pyhd8ed1ab_0    conda-forge
setuptools                67.6.1             pyhd8ed1ab_0    conda-forge
shapely                   2.0.1           py310h8b84c32_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
snuggs                    1.4.7                      py_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
sparse                    0.14.0             pyhd8ed1ab_0    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
statsmodels               0.13.5          py310hde88566_2    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
tiledb                    2.13.2               hd532e3d_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.2             py310h5764c6d_1    conda-forge
traitlets                 5.9.0              pyhd8ed1ab_0    conda-forge
typing-extensions         4.5.0                hd8ed1ab_0    conda-forge
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzcode                    2023c                h0b41bf4_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
udunits2                  2.2.28               hc3e0081_0    conda-forge
ukkonen                   1.0.1           py310hbf28c38_3    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
virtualenv                20.21.0            pyhd8ed1ab_0    conda-forge
webob                     1.8.7              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0          py310h1fa729e_0    conda-forge
xarray                    2023.3.0           pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.4                h55805fa_1    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.8.4                h0b41bf4_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
xorg-libxi                1.7.10               h7f98852_0    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xxhash                    0.8.1                h0b41bf4_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.8.2           py310h5764c6d_0    conda-forge
zarr                      2.14.2             pyhd8ed1ab_0    conda-forge
zict                      2.2.0              pyhd8ed1ab_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

@trexfeathers

Hi. I updated the branch and created a fresh Python environment with the idea of writing another, final test for this.
However, before doing so I ran the test suite and got some bad HDF5 errors in test_backends.py::test_open_mfdataset_manyfiles[netcdf4-20-True-None-5]

#7549

@markelg
Contributor

markelg commented Apr 17, 2023

Thanks. It looks like the errors are related to this bug: Unidata/netcdf-c#2674. The fix has been merged, so I hope they include it in the next netcdf-c release. For the moment I prefer not to merge this as netcdf 4.9.2 and dask do not seem to play well together.

@zklaus

zklaus commented Apr 18, 2023

@markelg, regarding the blosc filters, this may have been a combination of conda-forge building and an upstream netcdf-c quirk. In the conda-forge build, we did not install the necessary hdf5 plugins. I fixed that in conda-forge/libnetcdf-feedstock#172. There we also added blosc as a dependency, which was not present before. The netcdf-c quirk is that, if a compression library is not present at build time, support for it will not be compiled and the call to setting the corresponding compression will silently succeed, but not do anything. If the library is present at build time, but the corresponding plugin is missing at runtime, the attempt to set compression will error out.
With the most recent libnetcdf-4.9.2 builds one should get working plugins, including working blosc compression.
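
Because a missing build-time library makes the compression request a silent no-op, it may be worth reading the filters back after writing. The sketch below is only an illustration: the helper names `compression_took_effect` and `check_file` are hypothetical, and the dict layout is assumed to match what `netCDF4.Variable.filters()` returns (boolean flags per compressor, and a nested dict for blosc), which may differ between netCDF4-python versions.

```python
def compression_took_effect(filters, requested):
    """Return True if `filters` reports the requested compressor as active.

    `filters` is assumed to look like the result of
    netCDF4.Variable.filters(); it can be None for non-netCDF4 formats.
    """
    if filters is None:
        return False
    if requested.startswith("blosc"):
        blosc = filters.get("blosc")
        # Assumption: an active blosc filter is reported as a dict such as
        # {"compressor": "blosc_lz4", "shuffle": 1}, otherwise as False.
        if isinstance(blosc, dict):
            return blosc.get("compressor") == requested
        return False
    return bool(filters.get(requested, False))


def check_file(path, varname, requested):
    """Open a netCDF file and check that the compressor was really applied."""
    import netCDF4  # imported lazily so the helper above has no dependencies

    with netCDF4.Dataset(path) as nc:
        return compression_took_effect(nc[varname].filters(), requested)
```

Calling `check_file("out.nc", "temperature", "zstd")` after writing (file and variable names are placeholders) would then distinguish a silently ignored request from a working one.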

Regarding the HDF5-diag errors, the upstream bug has already been mentioned. I am suggesting to back-port it in conda-forge/libnetcdf-feedstock#175 for the conda-forge build, so hopefully that will work in libnetcdf-4.9.2 soon, too.

@zklaus

zklaus commented Apr 19, 2023

For the moment I prefer not to merge this as netcdf 4.9.2 and dask do not seem to play well together.

@markelg, could you elaborate? Is this about the two issues discussed here, or are there more problems? I am asking because I am wondering whether to back-port the patches to 4.9.1.

@markelg
Contributor

markelg commented Apr 21, 2023

I think it is about these two issues only, so backporting the fixes should work.

@zklaus

zklaus commented Apr 21, 2023

Do you need them in 4.9.1 then, or is updating to 4.9.2 an option?

@markelg
Contributor

markelg commented Apr 24, 2023

Good question. Right now ci/requirements/environment.yml is resolving libnetcdf 4.9.1, so fixing 4.9.2 would not work. I am not sure why or how to change this, as few package versions are pinned.

@markelg
Contributor

markelg commented Jun 20, 2023

Now the environment resolves libnetcdf 4.9.2 and the blosc filter seems to be working, although not with every chunk shape. I got weird errors when using a chunksize of (1, 10), while (5, 10) works well.

Blosc_FIlter Error: blosc_filter: Buffer is uncompressible.
Blosc_FIlter Error: blosc_filter: Buffer is uncompressible.

This error message does not turn up in a Google search.

With this I can add some tests to the PR.
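
For reference, a minimal sketch of the kind of encoding being tested, assuming an xarray version that includes #7551 and a libnetcdf build with the blosc plugin available. The helper names `blosc_encoding` and `write_compressed` are hypothetical; the chunk shape (5, 10) and `blosc_shuffle=1` are simply the values reported to work in this thread.

```python
def blosc_encoding(chunksizes, compressor="blosc_lz4", complevel=4, blosc_shuffle=1):
    """Build a per-variable encoding dict for Dataset.to_netcdf (sketch only).

    Defaults follow the values reported to work in this thread; other
    chunk shapes or blosc_shuffle settings were reported to raise errors.
    """
    return {
        "compression": compressor,
        "complevel": complevel,
        "blosc_shuffle": blosc_shuffle,
        "chunksizes": tuple(chunksizes),
    }


def write_compressed(ds, path, chunksizes=(5, 10)):
    """Write every data variable of an xarray.Dataset with the encoding above.

    Assumes each data variable is 2-D and at least as large as `chunksizes`.
    """
    encoding = {name: blosc_encoding(chunksizes) for name in ds.data_vars}
    ds.to_netcdf(path, engine="netcdf4", encoding=encoding)
```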

@zklaus

zklaus commented Jun 20, 2023

I think that is a fundamental blosc limitation, though it should possibly be handled more gracefully in libnetcdf. Probably @FrancescAlted knows better?

@markelg
Contributor

markelg commented Jun 22, 2023

I also found errors with blosc_shuffle=0 and 2; only 1 seems to work. See the test added to #7551.

@FrancescAlted

Hi. I lack some context here, but when a buffer is uncompressible, Blosc (both 1 and 2) returns a negative value (https://www.blosc.org/c-blosc2/reference/blosc1.html#c.blosc1_compress). You should check for that and take action. On the other hand, if you provide an output buffer with enough room (i.e. input_size + BLOSC(2)_MAX_OVERHEAD), Blosc guarantees that compression always succeeds.
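
The buffer-sizing rule can be illustrated with a small sketch. It uses zlib from the Python standard library as a stand-in, not Blosc itself, only to show that a small, poorly compressible buffer can come out larger than the input, which is why the destination needs extra room. The constant value 16 is an assumption about the C-Blosc 1 header size and should be checked against blosc.h; `dest_buffer_size` is a hypothetical helper.

```python
import zlib

# Assumed value of BLOSC_MAX_OVERHEAD for C-Blosc 1; verify against blosc.h.
BLOSC_MAX_OVERHEAD = 16


def dest_buffer_size(nbytes, overhead=BLOSC_MAX_OVERHEAD):
    """Size of a destination buffer with room for the worst case."""
    return nbytes + overhead


# 80 bytes, the size of one (1, 10) chunk of float64 values.
# All byte values are distinct, so there is nothing for the compressor to exploit.
raw = bytes(range(80))
compressed = zlib.compress(raw)

# With zlib (stand-in only), the "compressed" result is larger than the input.
grew = len(compressed) > len(raw)
```

A destination buffer of exactly `len(raw)` bytes would be too small in this case, whereas one sized with `dest_buffer_size(len(raw))` leaves headroom. Whether this is what triggers the error reported above is not established here.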

@markelg
Contributor

markelg commented Jun 22, 2023

Thank you Francesc. However, I don't think we can do such low-level checks here; we would need to move these issues downstream, to netCDF4-python or even to the HDF5 plugin.

@FrancescAlted

For what it's worth, hdf5plugin (https://pypi.org/project/hdf5plugin/) allows using both Blosc and Blosc2 as plugins for HDF5.

@zklaus

zklaus commented Jun 22, 2023

Since conda-forge/libnetcdf-feedstock#172, conda-forge's libnetcdf comes with the blosc plugin. But maybe that is an outdated version? The source code is at https://github.com/Unidata/netcdf-c/blob/main/plugins/H5Zblosc.c.

mraspaud added a commit to mraspaud/satpy that referenced this issue Jun 27, 2023
This is due to pydata/xarray#7388 not being solved yet.
@mraspaud
Contributor

We are eagerly waiting for this issue to be solved :) Is there anything we can do to help?

@rabernat
Contributor Author

@mraspaud thanks for offering to help!

I believe the action item is to review #7551 and try to understand whether the upstream libnetcdf-related issues have been resolved to the point that that PR can be finished.

@kmuehlbauer
Contributor

Please check back on #7551. We've given it a push and at least it seems to work well on Linux/macOS. It doesn't work on Windows, though.

@rabernat
Contributor Author

Almost got this fixed within one year! 😆
