Increased verbosity for Value Error raised if backend not installed #9294

Merged: 1 commit, Jul 31, 2024

@jbusecke commented Jul 30, 2024
Contributor

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

I just had an interaction with a researcher over at LEAP who did not have zarr installed and was confused by the error message they got when trying to access an ARCO zarr dataset in the cloud like this:

import xarray as xr
store = 'gs://gcp-public-data-arco-era5/ar/model-level-1h-0p25deg.zarr-v1'
ds = xr.open_dataset(store, engine='zarr', chunks={})

I reproduced their error by manually removing zarr from the pangeo/pangeo-notebook:2024.06.02 Docker image and got:

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 3
      1 import xarray as xr
      2 store = 'gs://gcp-public-data-arco-era5/ar/model-level-1h-0p25deg.zarr-v1'
----> 3 ds = xr.open_dataset(store, engine='zarr', chunks={})

File /srv/conda/envs/notebook/lib/python3.11/site-packages/xarray/backends/api.py:557, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    554 if from_array_kwargs is None:
    555     from_array_kwargs = {}
--> 557 backend = plugins.get_backend(engine)
    559 decoders = _resolve_decoders_kwargs(
    560     decode_cf,
    561     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    567     decode_coords=decode_coords,
    568 )
    570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/xarray/backends/plugins.py:205, in get_backend(engine)
    203     engines = list_engines()
    204     if engine not in engines:
--> 205         raise ValueError(
    206             f"unrecognized engine {engine} must be one of: {list(engines)}"
    207         )
    208     backend = engines[engine]
    209 elif isinstance(engine, type) and issubclass(engine, BackendEntrypoint):

ValueError: unrecognized engine zarr must be one of: ['netcdf4', 'h5netcdf', 'scipy', 'argo', 'cfgrib', 'gini', 'kerchunk', 'pydap', 'rasterio', 'store']
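
Note that `zarr` is absent from the list in the message because the package is not installed, not because the engine name is wrong. A quick way to confirm this kind of situation is to check whether the package can be found in the current environment. The helper below is a minimal sketch using only the standard library; `missing_modules` is a hypothetical name, not part of xarray.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when no importable module with that name exists.
    return [name for name in names if find_spec(name) is None]


# "json" ships with Python; "zarr" is the package the researcher was missing.
print(missing_modules(["json", "zarr"]))
```

If `zarr` shows up in the printed list, installing it should make the `zarr` engine available again.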

I believe that adding some language from the guess_engine function to the ValueError in get_backend could help alleviate this confusion for novice users.

Happy to change the wording, add tests, etc. if you think this would be an OK addition.
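
To illustrate the idea, here is a rough sketch of what a more verbose lookup error could look like. It is not the code from this PR or from xarray: `get_backend_sketch`, the `engines` mapping passed in as an argument, and the exact wording of the message are all assumptions made for the example.

```python
from collections.abc import Mapping


def get_backend_sketch(engine: str, engines: Mapping[str, object]) -> object:
    """Look up `engine` in a name -> backend mapping (hypothetical stand-in)."""
    if engine not in engines:
        # Spell out that the list only covers engines usable in this
        # environment, so a missing name may simply mean a missing package.
        raise ValueError(
            f"unrecognized engine {engine!r}; must be one of the engines "
            f"available in this environment: {list(engines)}. "
            f"If {engine!r} is a valid engine, its dependencies may not be "
            "installed; see the installation docs for optional dependencies."
        )
    return engines[engine]


# Example: the engine name is valid, but its package is not installed.
try:
    get_backend_sketch("zarr", {"netcdf4": object(), "scipy": object()})
except ValueError as err:
    print(err)
```

The point of the extra sentence is that a novice user sees a next step (install the missing dependency) instead of only a list that does not contain the engine they asked for.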

@max-sixty
Collaborator

Very nice, thanks @jbusecke

@jbusecke
Contributor Author

Yay! Anything I should/need to add to get this moving?

@dcherian dcherian merged commit f15082c into pydata:main Jul 31, 2024
43 of 46 checks passed