nan values appearing when saving and loading from netCDF due to encoding #7691
Comments
Thanks for all the details, @euronion. From what I can tell, everything is OK with the original file. It's using packed data: https://docs.unidata.ucar.edu/nug/current/best_practices.html#bp_Packed-Data-Values. The only thing that might be a bit off is why they didn't choose As both The reason why this isn't done is because the Lines 379 to 384 in 020b4c0
Lines 67 to 70 in 020b4c0
The xarray/xarray/coding/variables.py Lines 235 to 251 in e79eaf5
As this doesn't surface that often it might just happen here by accident. If the Update: corrected to |
So for NetCDF the default fillvalue for NC_SHORT ( |
MCVE:

import netCDF4 as nc
import numpy as np
import xarray as xr

fname = "test-7691.nc"

# Write raw packed int16 values, bypassing netCDF4's automatic mask/scale handling.
with nc.Dataset(fname, "w") as ds0:
    ds0.createDimension("t", 5)
    ds0.createVariable("x", "int16", ("t",), fill_value=-32767)
    v = ds0.variables["x"]
    v.set_auto_maskandscale(False)
    v.add_offset = 278.297319296597
    v.scale_factor = 1.16753614203674e-05
    v[:] = np.array([-32768, -32767, -32766, 32767, 0])

# Read back with netCDF4-python (masked and scaled by default).
with nc.Dataset(fname) as ds1:
    x1 = ds1["x"][:]
    print("netCDF4-python:", x1.dtype, x1)

# Read with xarray, then write the decoded dataset to a second file.
with xr.open_dataset(fname) as ds2:
    x2 = ds2["x"].values
    ds2.to_netcdf("test-7691-01.nc")
    print("xarray first read:", x2.dtype, x2)

# Read the file that xarray wrote.
with xr.open_dataset("test-7691-01.nc") as ds3:
    x3 = ds3["x"].values
    print("xarray roundtrip:", x3.dtype, x3)

Output:

netCDF4-python: float64 [277.9147410535744 -- 277.9147644042972 278.67988586425815 278.297319296597]
xarray first read: float32 [277.91476 nan 277.91476 278.6799 278.29733]
xarray roundtrip: float32 [ nan nan nan 278.6799 278.29733]

I've confirmed that correctly promoting to
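The precision loss visible in the output above can be explored without any netCDF file. The following is a minimal NumPy-only sketch, assuming the usual packed-data convention (unpacked = packed * scale_factor + add_offset). The helper names `decode` and `encode` are hypothetical, and the arithmetic is not claimed to match xarray's internal code path step by step; it only illustrates why float32 is too coarse for this particular scale_factor and add_offset.

```python
import numpy as np

# Packing parameters and raw int16 values taken from the MCVE above.
scale_factor = 1.16753614203674e-05
add_offset = 278.297319296597
fill_value = -32767
packed = np.array([-32768, -32767, -32766, 32767, 0], dtype=np.int16)


def decode(raw, dtype):
    """Unpack integers (raw * scale_factor + add_offset), then cast to dtype."""
    return (raw.astype(np.float64) * scale_factor + add_offset).astype(dtype)


def encode(values):
    """Pack floats back to integers: round((value - add_offset) / scale_factor).

    The result is kept as int64 so that out-of-range values stay visible
    instead of wrapping around in int16.
    """
    unscaled = (values.astype(np.float64) - add_offset) / scale_factor
    return np.rint(unscaled).astype(np.int64)


decoded64 = decode(packed, np.float64)
decoded32 = decode(packed, np.float32)

# float32 values near 278 are spaced 2**-15 apart, which is coarser than
# scale_factor, so neighbouring packed integers cannot all stay distinct.
print("float32 spacing near 278:", np.spacing(np.float32(278.0)))
print("distinct float32 values among the first three:", np.unique(decoded32[:3]).size)

repacked64 = encode(decoded64)
repacked32 = encode(decoded32)
print("float64 round trip:", repacked64)
print("float32 round trip:", repacked32)
print("float32 round trip equals _FillValue:", repacked32 == fill_value)
```

With float64 as the intermediate type, re-packing recovers the original integers. With float32, at least one of the three neighbouring values near -32767 must change on re-packing, which is how a valid value can end up on the _FillValue.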
It would be good to merge some version of #6812. This seems to be pretty common. Review comments and PR remixes welcome!
Great, thanks for testing.
Hi @kmuehlbauer, thanks for looking into the issue again!
Thanks @euronion, no worries, just have a look when you can spare the time.
* fix: Skip previous encoding workaround for fixed xarray versions
  See reported issue pydata/xarray#7691 and PR pydata/xarray#8713, which was included in xarray v2024.03.0.
* Replace workaround by a new xarray lower bound
Co-authored-by: Jonas Hoersch
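A downstream project that cannot yet raise its xarray lower bound might gate the workaround on the installed version instead. The sketch below assumes only what the commit message states, namely that the fix shipped in xarray v2024.03.0. The names `FIXED_IN`, `release_tuple` and `needs_encoding_workaround` are hypothetical, and pre-release or development suffixes are simply ignored.

```python
import re

# First xarray release containing the fix, according to the commit message above.
FIXED_IN = (2024, 3, 0)


def release_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing third component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def needs_encoding_workaround(version):
    """True if the given xarray version predates the fix."""
    return release_tuple(version) < FIXED_IN


# Typical use (assumes xarray is installed):
#     import xarray as xr
#     if needs_encoding_workaround(xr.__version__):
#         ...apply the encoding workaround before writing...
print(needs_encoding_workaround("2022.11.0"))
print(needs_encoding_workaround("2024.03.0"))
```

Raising the lower bound, as the second commit does, removes the need for such a check altogether.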
What happened?
When writing my dataset to netCDF with ds.to_netcdf() and reading it back with xr.open_dataset(...), xarray creates nan values where there were previously numeric (float32) values.
The issue seems related to the encoding used for the original dataset, which causes the data to be stored as short. During loading, the stored values then collide with _FillValue, leading to the numbers being interpreted as nan.
What did you expect to happen?
Values after saving & loading should be the same as before saving.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
I'm not sure whether this should be considered a bug or just a combination of conflicting features. My current workaround is resetting the encoding and letting xarray decide to store as float instead of short (cf. #7686).
Environment
INSTALLED VERSIONS
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.90.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.11.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.10.0
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.13.6
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.3
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.6.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: 7.2.0
IPython: 8.11.0
sphinx: None
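The workaround mentioned under "Anything else we need to know?" (resetting the encoding so that xarray stores floats rather than packed shorts) might look roughly like the sketch below. It assumes a Dataset-like object whose `variables` mapping yields objects with a mutable `encoding` dict, which is how xarray exposes encoding. The function name `clear_packing_encoding` and the choice of keys to drop are illustrative assumptions, not the reporter's actual code.

```python
# Encoding keys that control packing; dropping them lets the writer pick defaults.
PACKING_KEYS = ("dtype", "scale_factor", "add_offset", "_FillValue", "missing_value")


def clear_packing_encoding(ds, keys=PACKING_KEYS):
    """Remove packing-related encoding entries from every variable, in place.

    Other encoding entries (for example compression settings) are left alone.
    """
    for var in ds.variables.values():
        for key in keys:
            var.encoding.pop(key, None)
    return ds


# Typical use (assumes xarray is installed; file names are placeholders):
#     import xarray as xr
#     ds = xr.open_dataset("input.nc")
#     clear_packing_encoding(ds).to_netcdf("output.nc")
```

The cost of this approach is larger files, since the data is no longer stored as 16-bit integers.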