
nbytes not available for lazy loaded array and so can't print(ds) #9185

Closed
4 of 5 tasks
TimothyCera-NOAA opened this issue Jun 27, 2024 · 6 comments

@TimothyCera-NOAA
Contributor

What happened?

We use the grib2io backend to read GRIB2-formatted files. We started having problems printing the dataset summary to the screen with the v2024.02.0 release. I suspect the problem comes from #8702.

Printing a dataset fails when xarray tries to compute nbytes.

The grib2io backend opens the file lazily, so print is summarizing a MemoryCachedArray, which has no nbytes attribute and from which nbytes cannot be computed.

Loading the data into memory first and then calling print(ds1) works fine.

import xarray as xr
filters = {
    "productDefinitionTemplateNumber": 0,
    "typeOfFirstFixedSurface": 1,
    "shortName": "TMP",
}
ds1 = xr.open_dataset(
    "gfs_20221107/gfs.t00z.pgrb2.1p00.f012_subset",
    engine="grib2io",
    filters=filters,
)
print(ds1)
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 print(ds1)

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/dataset.py:2569, in Dataset.__repr__(self)
   2568 def __repr__(self) -> str:
-> 2569     return formatting.dataset_repr(self)

File ~/anaconda3/envs/default311/lib/python3.11/reprlib.py:21, in recursive_repr.<locals>.decorating_function.<locals>.wrapper(self)
     19 repr_running.add(key)
     20 try:
---> 21     result = user_function(self)
     22 finally:
     23     repr_running.discard(key)

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/formatting.py:717, in dataset_repr(ds)
    715 @recursive_repr("<recursive Dataset>")
    716 def dataset_repr(ds):
--> 717     nbytes_str = render_human_readable_nbytes(ds.nbytes)
    718     summary = [f"<xarray.{type(ds).__name__}> Size: {nbytes_str}"]
    720     col_width = _calculate_col_width(ds.variables)

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/dataset.py:1544, in Dataset.nbytes(self)
   1536 @property
   1537 def nbytes(self) -> int:
   1538     """
   1539     Total bytes consumed by the data arrays of all variables in this dataset.
   1540 
   1541     If the backend array for any variable does not include ``nbytes``, estimates
   1542     the total bytes for that array based on the ``size`` and ``dtype``.
   1543     """
-> 1544     return sum(v.nbytes for v in self.variables.values())

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/core/dataset.py:1544, in <genexpr>(.0)
   1536 @property
   1537 def nbytes(self) -> int:
   1538     """
   1539     Total bytes consumed by the data arrays of all variables in this dataset.
   1540 
   1541     If the backend array for any variable does not include ``nbytes``, estimates
   1542     the total bytes for that array based on the ``size`` and ``dtype``.
   1543     """
-> 1544     return sum(v.nbytes for v in self.variables.values())

File ~/anaconda3/envs/default311/lib/python3.11/site-packages/xarray/namedarray/core.py:491, in NamedArray.nbytes(self)
    489         itemsize = xp.finfo(self.dtype).bits // 8
    490 else:
--> 491     raise TypeError(
    492         "cannot compute the number of bytes (no array API nor nbytes / itemsize)"
    493     )
    495 return self.size * itemsize

TypeError: cannot compute the number of bytes (no array API nor nbytes / itemsize)
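For readers following the traceback, the fallback order it suggests can be sketched as below. This is a simplified sketch, not xarray's actual NamedArray.nbytes: estimate_nbytes and LazyStub are hypothetical names, and the array API branch (iinfo/finfo) seen in the traceback is omitted. It shows why a lazy array that has neither nbytes nor a real dtype object ends in this TypeError.

```python
import math

import numpy as np


def estimate_nbytes(data):
    """Simplified sketch of an nbytes fallback (not xarray's real code)."""
    # 1. Prefer an explicit nbytes attribute on the wrapped array.
    if hasattr(data, "nbytes"):
        return data.nbytes
    # 2. Otherwise estimate from the element count and dtype.itemsize.
    if hasattr(data.dtype, "itemsize"):
        return math.prod(data.shape) * data.dtype.itemsize
    # 3. Nothing usable left: fail with the message from the traceback.
    raise TypeError(
        "cannot compute the number of bytes (no array API nor nbytes / itemsize)"
    )


class LazyStub:
    """Hypothetical stand-in for a lazy backend array (it has no nbytes)."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype


print(estimate_nbytes(LazyStub((181, 360), np.dtype("float32"))))  # 260640
try:
    # A string dtype has no itemsize, so the estimate cannot be made.
    estimate_nbytes(LazyStub((181, 360), "float32"))
except TypeError as err:
    print("TypeError:", err)
```

With a real dtype the estimate is 181 × 360 × 4 = 260640 bytes, consistent with the 261kB reported for TMP once the data is loaded.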

Forcing the data to load makes printing work:

print(ds1["TMP"].values[0][0])
253.28014

print(ds1)
<xarray.Dataset> Size: 1MB
Dimensions:                   (y: 181, x: 360)
Coordinates:
    refDate                   datetime64[ns] 8B ...
    leadTime                  timedelta64[ns] 8B ...
    valueOfFirstFixedSurface  float64 8B ...
    latitude                  (y, x) float64 521kB ...
    longitude                 (y, x) float64 521kB ...
    validDate                 datetime64[ns] 8B ...
Dimensions without coordinates: y, x
Data variables:
    TMP                       (y, x) float32 261kB 253.3 253.3 ... 240.2 240.2
Attributes:
    engine:   grib2io

What did you expect to happen?

I expect print(ds1) to print the summary of the dataset:

<xarray.Dataset> Size: 1MB
Dimensions:                   (y: 181, x: 360)
Coordinates:
    refDate                   datetime64[ns] 8B ...
    leadTime                  timedelta64[ns] 8B ...
    valueOfFirstFixedSurface  float64 8B ...
    latitude                  (y, x) float64 521kB ...
    longitude                 (y, x) float64 521kB ...
    validDate                 datetime64[ns] 8B ...
Dimensions without coordinates: y, x
Data variables:
    TMP                       (y, x) float32 261kB 253.3 253.3 ... 240.2 240.2
Attributes:
    engine:   grib2io

Minimal Complete Verifiable Example

# You have to download the GRIB2 file from
# https://github.com/NOAA-MDL/grib2io/blob/master/tests/data/gfs_20221107/gfs.t00z.pgrb2.1p00.f012_subset
import xarray as xr

filters = {
    "productDefinitionTemplateNumber": 0,
    "typeOfFirstFixedSurface": 1,
    "shortName": "TMP",
}
ds1 = xr.open_dataset(
    "gfs_20221107/gfs.t00z.pgrb2.1p00.f012_subset",
    engine="grib2io",
    filters=filters,
)
print(ds1)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.4
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: 24.3.0
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: 7.3.7

TimothyCera-NOAA added the bug and needs triage labels on Jun 27, 2024

welcome bot commented Jun 27, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty
Collaborator

Thanks for the issue. We should definitely have a try/except around the bytes computation, given that it can fail...
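A minimal sketch of the try/except idea, assuming the repr should degrade to a placeholder instead of failing. render_size, Known and Unknown are hypothetical names; the unit formatting only approximates the "261kB" style seen in the repr and is not xarray's render_human_readable_nbytes.

```python
def render_size(obj):
    """Hypothetical repr helper: never let a failing nbytes break printing."""
    try:
        nbytes = obj.nbytes
    except TypeError:
        # nbytes could not be computed (e.g. lazy array with a string dtype).
        return "?"
    # Rough decimal formatting, e.g. 260640 -> "261kB".
    for unit in ("B", "kB", "MB", "GB"):
        if nbytes < 1000:
            return f"{nbytes:.0f}{unit}"
        nbytes /= 1000
    return f"{nbytes:.0f}TB"


class Known:
    """Stub whose size is known."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


class Unknown:
    """Stub whose nbytes raises, like the lazily loaded variable above."""

    @property
    def nbytes(self):
        raise TypeError("cannot compute the number of bytes")


print(render_size(Known(260640)))  # 261kB
print(render_size(Unknown()))  # ?
```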

max-sixty removed the needs triage label on Jun 27, 2024
@keewis
Collaborator

keewis commented Jun 27, 2024

As far as I can tell, the reason for this is that grib2io defines an OnDiskArray whose dtype is a string. That is unexpected, but since dtypes are opaque objects in the array API, we might have to figure out how to deal with that at some point.
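To illustrate the point about string dtypes: a plain string has no itemsize, so an nbytes estimate based on dtype.itemsize has nothing to work with, while np.dtype() turns the same string into a dtype object that does. A small sketch assuming only NumPy:

```python
import numpy as np

string_dtype = "float32"  # what a backend might expose as .dtype
print(hasattr(string_dtype, "itemsize"))  # False

real_dtype = np.dtype(string_dtype)  # a numpy.dtype instance
print(real_dtype.itemsize)  # 4
print(real_dtype == np.float32)  # True
```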

TimothyCera-NOAA added a commit to TimothyCera-NOAA/xarray that referenced this issue Jun 29, 2024
* addresses the nbytes problem described in pydata#9185
@TimothyCera-NOAA
Contributor Author

I tried to fix this in grib2io by replacing "float32" with np.float32, but that didn't help. What did work was enforcing np.dtype in xarray, as shown in #9191.
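One way to enforce a real dtype is to normalize whatever a backend supplies with np.dtype(). The class below is a hypothetical sketch of that idea, not the code from #9191 or from grib2io; NormalizedDtypeArray is an illustrative name.

```python
import numpy as np


class NormalizedDtypeArray:
    """Hypothetical backend-array wrapper; not grib2io's or xarray's code."""

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        # np.dtype() accepts strings, scalar types and dtype instances alike
        # and always returns a dtype instance with a usable itemsize.
        self.dtype = np.dtype(dtype)


for spec in ("float32", np.float32, np.dtype("float32")):
    arr = NormalizedDtypeArray((181, 360), spec)
    print(arr.dtype, arr.dtype.itemsize)  # float32 4 for every spec
```

Normalizing once at construction means downstream code such as an nbytes estimate only ever sees dtype instances, whichever form the backend used.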

@keewis
Collaborator

keewis commented Jul 1, 2024

You'd have to replace it with the dtype instance, np.dtype("float32"). It looks like attribute descriptors such as itemsize return the descriptor itself instead of the result of __get__ when accessed on the class object (np.float32) rather than on an instance.
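The descriptor behaviour described here can be checked directly with NumPy. This sketch relies only on the exception type, not on the exact error message, which may differ between NumPy versions. Note that hasattr(np.float32, "itemsize") is True, so a hasattr-style check would presumably accept the scalar type and only fail later at the multiplication.

```python
import numpy as np

# On the scalar *type*, itemsize is the class-level attribute descriptor,
# not a number, even though hasattr() reports that it exists.
print(hasattr(np.float32, "itemsize"))  # True
print(isinstance(np.float32.itemsize, int))  # False

# On a scalar *instance* or a dtype instance, it is the byte width.
print(np.float32(0).itemsize)  # 4
print(np.dtype("float32").itemsize)  # 4
print(np.dtype(np.float32).itemsize)  # 4

# Multiplying an element count by the descriptor fails, so passing
# np.float32 as the dtype does not help with the nbytes estimate.
try:
    65160 * np.float32.itemsize
except TypeError as err:
    print("TypeError:", err)
```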

@TimothyCera-NOAA
Contributor Author

This issue was supposed to be closed when I closed #9191, but it wasn't, so I'm closing it now.

As mentioned in the pull request, comments here and in the pull request were helpful to me tracking down how to fix in grib2io.

3 participants