Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cmorizer for the Eppley-VGPM-MODIS dataset is broken #2322

Closed
remi-kazeroni opened this issue Sep 28, 2021 · 13 comments
Closed

Cmorizer for the Eppley-VGPM-MODIS dataset is broken #2322

remi-kazeroni opened this issue Sep 28, 2021 · 13 comments

Comments

@remi-kazeroni
Copy link
Contributor

Describe the bug
While rerunning some cmorizers for #1657, I realized that the Eppley-VGPM-MODIS cmorizer is not working anymore. The file format is no longer supported:

  File "rasterio/_base.pyx", line 263, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/mnt/lustre02/work/bd0854/DATA/ESMValTool2/RAWOBS/Tier2/Eppley-VGPM-MODIS/eppley.2002182.hdf' not recognized as a supported file format.

@tomaslovato, @valeriupredoi: would you have any idea why that is the case?

Please attach

  • Command used: cmorize_obs -c config-user.yml -o Eppley-VGPM-MODIS
  • The main_log_debug.txt file, this can also be found in the run directory in the output directory:
    main_log_debug.txt
@valeriupredoi
Copy link
Contributor

valeriupredoi commented Sep 29, 2021

hdf4 not supported by rasterio anymore unless one installs it without compiling from binary files, as it says here - that's dodgy if you asked me; I remember we used rasterio instead of something else that was causing issues (PyNIO?). I'll have to think about this, I don't want us to start adding flags to pip installs when installing individual dependencies - @bouweandela @zklaus would prob agree with me?

@valeriupredoi
Copy link
Contributor

also, I forgot to say, hdf4 is an ancient file format - maybe we can contact the data producers ask them to convert it to hdf5 or netCDF4?

@bouweandela
Copy link
Member

bouweandela commented Oct 1, 2021

This was changed in #1997 and back then it worked according to the authors of that pull request. Could you have another look at this please @tomaslovato?

@tomaslovato
Copy link
Contributor

tomaslovato commented Oct 6, 2021

This was changed in #1997 and back then it worked according to the authors of that pull request. Could you have another look at this please @tomaslovato?

Actually it is not working since rasterio dropped the support as reported by @valeriupredoi in previous comment. When changing the conda environment in #1997, pynio was found to be a problematic library to retain (it is not supported/developed) that's why we moved to rasterio.

also, I forgot to say, hdf4 is an ancient file format - maybe we can contact the data producers ask them to convert it to hdf5 or netCDF4?

I'm not happy as well with this format but I feel like the originator will not be keen to modify the format of all the datasets they are producing. Btw, hdf4 is still used within the remote sensing community.

I'm looking for a possible workaround to load hdf4 data (if there is one) but it will take a while.

@bouweandela
Copy link
Member

bouweandela commented Oct 6, 2021

I see no mention of rasterio dropping support for hdf4 in the link provided by @valeriupredoi. Do you have a source for this information?

@tomaslovato
Copy link
Contributor

In the link posted by @valeriupredoi there is a reference to the issue of rasterio installation rasterio/rasterio#2026 (comment) which explains why the library is producing the not recognized as a supported file format.
To fix the issue rasterio should be installed with the --no-use-wheel flag and this is (I guess) the source of concern of @valeriupredoi (see his comment above).

Sorry @bouweandela I definitely misused dropping support.

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Oct 6, 2021

and I'm getting invoked more often than Jesus H. Christ on an Easter Sunday 😆 Yes, @tomaslovato is correct, we need to install rasterio with a special flag to be able to have it read hdf4, and the stock pip installation won't be enough. What do you think @bouweandela - I don't like having custom installs for our deps, that brings down the reproducibility of the environment and may influence how the env gets solved

@bouweandela
Copy link
Member

Can you try installing it from conda? I suspect that will solve the problem

@valeriupredoi
Copy link
Contributor

hmmm could be, @remi-kazeroni or @tomaslovato you mind plugging it in environment.yml and see if that solves the problem? Not happy to see yet another conda dep but heyho mamba should be able to handle it 🐍

@tomaslovato
Copy link
Contributor

(holy) @valeriupredoi and @bouweandela I made a try with by creating a new environment from scratch installing rasterio with conda and it worked!!
Yet, I would prefer if @remi-kazeroni or some else double check this test!

@zklaus
Copy link

zklaus commented Oct 7, 2021

First, (most of this from the rasterio docs) rasterio is a Python C Extension and has one C library dependency, GDAL. They provide binary wheels on PyPI, which are preferred by newer versions of pip. Quote:

The included GDAL library is fairly minimal, providing only the format drivers that ship with GDAL and are enabled by default. To get access to more formats, you must build from a source distribution (see below).

The command-line switch to pip tells it not to use the binary wheels, which makes pip install from source, compiling everything relying on the active compilers and GDAL devel packages.

Second, why isn't it getting pulled in by xarray from conda-forge? That's because it is one of very many optional dependencies and xarray doesn't depend on any of them in their conda-forge recipe.

Bottom-line: we need to treat rasterio as a binary dependency and as such install it from conda. Which we do in our conda-forge package by depending on it explicitly. For the development environment, it should be added to environment.yml.

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Oct 7, 2021

excellent overview, cheers @zklaus 🍺 I think I'm partly responsible for this when I first added rasterio to our env, well weary of adding extra conda deps due to the env being mule-standard stubborn to solve. Let's put rasterio in nevironment.yml here then 👍

@remi-kazeroni remi-kazeroni added this to the v2.5.0 milestone Dec 16, 2021
@remi-kazeroni
Copy link
Contributor Author

I'm closing this since the environment problem with rasterio was addressed in #2479. Incidentally, it solved this issue. The Eppley-VGPM-MODIS cmorizer works fine when using a fresh new environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants