If "chunks=None" is set in open_mfdataset, it is changed to "chunks={}" before being passed to "_dataset_from_backend_dataset" #7792

Open
1 of 4 tasks
timcera opened this issue Apr 27, 2023 · 2 comments · May be fixed by #5704

timcera commented Apr 27, 2023

What happened?

I'm using the grib2io engine on a system where dask currently can't be installed. From reading the code, I thought that setting "chunks=None" would avoid dask, but on the line

open_kwargs = dict(engine=engine, chunks=chunks or {}, **kwargs)
"chunks=None" is converted to "chunks={}".

This means that at this test

if chunks is None:
the condition "chunks is None" will never be true, and the dask code path will always run.
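
The interaction between the two quoted lines can be reproduced without xarray. The sketch below is only an illustration: the helper names `build_open_kwargs` and `takes_dask_path` are hypothetical and do not exist in xarray; they simply mirror the two expressions quoted above.

```python
def build_open_kwargs(engine, chunks=None, **kwargs):
    # Mirrors the quoted line: `chunks or {}` replaces any falsy value,
    # including None, with an empty dict.
    return dict(engine=engine, chunks=chunks or {}, **kwargs)


def takes_dask_path(chunks):
    # Mirrors the quoted test: only a literal None skips the chunking step.
    return chunks is not None


open_kwargs = build_open_kwargs("rasterio", chunks=None)
print(open_kwargs["chunks"])                   # {}
print(takes_dask_path(open_kwargs["chunks"]))  # True
```

Because `{}` is not `None`, the caller's explicit "chunks=None" is lost before the check is ever reached.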

The example below uses the rasterio engine because I could open publicly available files from S3. The rasterio engine gives the same error as the grib2io engine.

What did you expect to happen?

Expected open_mfdataset to work without dask installed.
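
To confirm that an environment really has no dask before running the example, a check along these lines may help. This is a minimal sketch; `has_module` is a hypothetical helper name, and it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def has_module(name):
    # find_spec returns None when a top-level module cannot be found,
    # and it does not import the module itself.
    return importlib.util.find_spec(name) is not None


print("dask installed:", has_module("dask"))
```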

Minimal Complete Verifiable Example

# Have to create an environment that doesn't include dask.  For example:
#     conda create -n xarrayenv -c conda-forge xarray rioxarray
#     conda activate xarrayenv

import xarray as xr
import os

os.environ["AWS_NO_SIGN_REQUEST"] = "YES"

ds = xr.open_mfdataset(
    [
        "/vsis3/noaa-nbm-grib2-pds/blend.20230401/02/core/blend.t02z.core.f003.co.grib2",
        "/vsis3/noaa-nbm-grib2-pds/blend.20230401/02/core/blend.t02z.core.f004.co.grib2",
    ],
    engine="rasterio",
    chunks=None,
)

# Traceback (most recent call last):
#   File "/home/tim/test.py", line 6, in <module>
#     ds = xr.open_mfdataset(
#          ^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 982, in open_mfdataset
#     datasets = [open_(p, **open_kwargs) for p in paths]
#                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 982, in <listcomp>
#     datasets = [open_(p, **open_kwargs) for p in paths]
#                 ^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 531, in open_dataset
#     ds = _dataset_from_backend_dataset(
#          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 342, in _dataset_from_backend_dataset
#     ds = _chunk_ds(
#          ^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 302, in _chunk_ds
#     from dask.base import tokenize
# ModuleNotFoundError: No module named 'dask'

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

/home/tim/anaconda3/envs/xarrayenv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-70-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

timcera added the bug and needs triage labels Apr 27, 2023

welcome bot commented Apr 27, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

Illviljan linked a pull request Apr 27, 2023 that will close this issue
6 tasks
Contributor

Illviljan commented Apr 27, 2023

This happens because opening several files requires concatenating them. xarray has no machinery to do that lazily without dask, so all your files would be loaded into memory. Maybe that's OK for you? If the files are small, it should be fine.

See #5704 for more discussion.
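
If loading everything into memory is acceptable, one possible workaround in the meantime is to open each file with `xr.open_dataset(..., chunks=None)` and combine the results manually. The sketch below is an untested assumption, not xarray's own approach: `open_many_eagerly` is a hypothetical helper, the `open_one` and `combine` parameters exist only so the logic can be exercised without xarray, and whether `xr.combine_by_coords` suits these particular GRIB2 files has not been verified.

```python
def open_many_eagerly(paths, open_one=None, combine=None, **open_kwargs):
    """Open each path without dask and combine the results in memory."""
    if open_one is None or combine is None:
        # Imported lazily so the helper can be tested with stand-in callables.
        import xarray as xr

        open_one = open_one or xr.open_dataset
        combine = combine or xr.combine_by_coords

    # chunks=None is passed straight to the per-file opener, so it is
    # never rewritten to {} on the way.
    datasets = [open_one(path, chunks=None, **open_kwargs) for path in paths]
    return combine(datasets)
```

With the files from the example above, the call would look like `open_many_eagerly([...], engine="rasterio")`. If the files do not carry coordinates that `combine_by_coords` can order, passing something like `combine=lambda dss: xr.concat(dss, dim="step")` may be needed instead, where the dimension name "step" is purely illustrative.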

Illviljan added the topic-backends label and removed the needs triage label Apr 27, 2023