Add an example of ERA5 and GRIB data & visualization to the gallery #3199

Merged
merged 4 commits into from
Sep 23, 2019


zbruick

@zbruick zbruick commented Aug 9, 2019

This was developed at the SciPy sprint but hadn't been submitted as a PR yet. @StephanSiemen and I developed it to demonstrate visualizing ERA5 GRIB data. The data was already added in pydata/xarray-data#14.

@codecov

codecov bot commented Aug 9, 2019

Codecov Report

Merging #3199 into scipy19-docs will decrease coverage by 0.35%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           scipy19-docs    #3199      +/-   ##
================================================
- Coverage         95.76%   95.41%   -0.36%     
================================================
  Files                64       64              
  Lines             13188    13188              
================================================
- Hits              12630    12583      -47     
- Misses              558      605      +47

@rabernat
Contributor

Hi @zbruick - thanks for this and sorry for not following up sooner.

Do you feel this example is ready for review?

@rabernat
Contributor

In its current form, I'd say that your example is missing some text (markdown cells) which provide a narrative.

See this example - http://xarray.pydata.org/en/scipy19-docs/examples/ROMS_ocean_model.html - for what I am talking about.

@rabernat rabernat mentioned this pull request Sep 10, 2019
@zbruick
Author

zbruick commented Sep 10, 2019

Happy to add some markdown to this - that was an oversight. I'll work on that and push it up shortly.

@rabernat
Contributor

It looks like the doc build is failing with this error:

Sphinx parallel build error:
nbsphinx.NotebookError: CellExecutionError in examples/ERA5-GRIB-example.ipynb:
------------------
ds = xr.load_dataset('data/era5-2mt-2019-03-uk.grib', engine='cfgrib')
------------------

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-90ecf53ec89c> in <module>
----> 1 ds = xr.load_dataset('data/era5-2mt-2019-03-uk.grib', engine='cfgrib')

~/work/1/s/xarray/backends/api.py in load_dataset(filename_or_obj, **kwargs)
    256         raise TypeError("cache has no effect in this context")
    257
--> 258     with open_dataset(filename_or_obj, **kwargs) as ds:
    259         return ds.load()
    260

~/work/1/s/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
    516         elif engine == "cfgrib":
    517             store = backends.CfGribDataStore(
--> 518                 filename_or_obj, lock=lock, **backend_kwargs
    519             )
    520

~/work/1/s/xarray/backends/cfgrib_.py in __init__(self, filename, lock, **backend_kwargs)
     41             lock = ECCODES_LOCK
     42         self.lock = ensure_lock(lock)
---> 43         self.ds = cfgrib.open_file(filename, **backend_kwargs)
     44
     45     def open_store_variable(self, name, var):

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, **kwargs)
    601 ):
    602     """Open a GRIB file as a ``cfgrib.Dataset``."""
--> 603     index = open_fileindex(path, grib_errors, indexpath, filter_by_keys)
    604     return Dataset(*build_dataset_components(index, **kwargs))

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cfgrib/dataset.py in open_fileindex(path, grib_errors, indexpath, filter_by_keys)
    594     filter_by_keys = dict(filter_by_keys)
    595     stream = messages.FileStream(path, message_class=cfmessage.CfMessage, errors=grib_errors)
--> 596     return stream.index(ALL_KEYS, indexpath=indexpath).subindex(filter_by_keys)
    597
    598

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cfgrib/messages.py in index(self, index_keys, indexpath)
    235     def index(self, index_keys, indexpath='{path}.{short_hash}.idx'):
    236         # type: (T.List[str], str) -> FileIndex
--> 237         return FileIndex.from_indexpath_or_filestream(self, index_keys, indexpath)
    238
    239

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cfgrib/messages.py in from_indexpath_or_filestream(cls, filestream, index_keys, indexpath, log)
    331             log.exception("Can't read index file %r", indexpath)
    332
--> 333         return cls.from_filestream(filestream, index_keys)
    334
    335     def __iter__(self):

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cfgrib/messages.py in from_filestream(cls, filestream, index_keys)
    262         offsets = collections.OrderedDict()
    263         count_offsets = {}  # type: T.Dict[int, int]
--> 264         for message in filestream:
    265             header_values = []
    266             for key in index_keys:

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cfgrib/messages.py in __iter__(self)
    208     def __iter__(self):
    209         # type: () -> T.Generator[Message, None, None]
--> 210         with open(self.path, 'rb') as file:
    211             valid_message_found = False
    212             while True:

FileNotFoundError: [Errno 2] No such file or directory: '/home/vsts/work/1/s/doc/examples/data/era5-2mt-2019-03-uk.grib'

It appears that you are loading the data from a local file, rather than from the tutorial datasets api. This won't work, because the local file is not available in the build environment. That's why we added it to the tutorial repo in pydata/xarray-data#14.

Can you try something like

ds = xr.tutorial.load_dataset('era5-2mt-2019-03-uk.grib', engine='cfgrib')

@rabernat
Contributor

I just fixed a file naming issue in pydata/xarray-data#18 and can confirm that my code above works for me.

@rabernat
Contributor

rabernat commented Sep 11, 2019

Now we are getting "MD5 checksum does not match, try downloading dataset again." That doesn't make sense to me.

https://dev.azure.com/xarray/xarray/_build/results?buildId=780

@zbruick
Author

zbruick commented Sep 11, 2019

Thanks for pushing that commit and updating the data import call; I forgot I had to switch that from my local file. Let me know if there's anything I can fix on my side, but this looks like an issue with the data file?

@rabernat
Contributor

rabernat commented Sep 17, 2019

Can someone with the proper credentials (e.g. @shoyer) restart the azure build? I can't reproduce the md5 error locally.

@rabernat
Contributor

I am really stumped here.

When I run

xr.tutorial.load_dataset('era5-2mt-2019-03-uk.grib', engine='cfgrib')

on the scipy19-docs branch, it works fine.

@keewis
Collaborator

keewis commented Sep 17, 2019

I'm not sure if this helps you, but I noticed two issues in xr.tutorial.open_dataset. First, it requires the file extension to be ".nc", which results in a 404 when fetching the file (the version on the scipy19-docs branch does not have that issue). Second, there is a trailing newline in the md5 file. Changing remotemd5 = f.read() to remotemd5 = f.read().strip() should do the trick.
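The first issue can be illustrated with a simplified model of how a tutorial name becomes a download URL. This is a minimal sketch, not xarray's actual code: the helper names are hypothetical, and it assumes the files sit at the top level of the pydata/xarray-data repository. Always appending ".nc" turns the GRIB name into a file the repository does not contain, while defaulting the extension only when the name has none keeps it intact.

```python
import os

# Hypothetical helpers modelling the tutorial download URL; not xarray's code.
DATA_REPO = "https://github.com/pydata/xarray-data"


def forced_nc_name(name: str) -> str:
    # Behaviour described above: ".nc" is always appended.
    return name + ".nc"


def default_nc_name(name: str) -> str:
    # Only fall back to ".nc" when the name has no extension of its own.
    root, ext = os.path.splitext(name)
    return root + (ext or ".nc")


def raw_url(fullname: str, branch: str = "master") -> str:
    # GitHub "raw" URL for a file at the top level of the data repository.
    return "/".join((DATA_REPO, "raw", branch, fullname))


# Ends in ".grib.nc", a file that does not exist in the repository.
print(raw_url(forced_nc_name("era5-2mt-2019-03-uk.grib")))
# Keeps the ".grib" extension.
print(raw_url(default_nc_name("era5-2mt-2019-03-uk.grib")))
```

Names without an extension still resolve to ".nc" files, so existing netCDF tutorial datasets keep working under the second rule.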

@dcherian
Contributor

@zbruick Can you merge in scipy19-docs? That might fix it.

@zbruick
Author

zbruick commented Sep 23, 2019

Rebased on scipy19-docs. Hope that helps!

@dcherian
Contributor

Oops, I think you lost Ryan's commit with xr.tutorial.load_dataset.

@zbruick
Author

zbruick commented Sep 23, 2019

Ah, oops. Updated to use that method now.

@rabernat
Contributor

Still failing. Instead of

ds = xr.tutorial.load_dataset('data/era5-2mt-2019-03-uk.grib', engine='cfgrib')

it should be

ds = xr.tutorial.load_dataset('era5-2mt-2019-03-uk.grib', engine='cfgrib')

@keewis
Collaborator

keewis commented Sep 23, 2019

Unless something else is wrong, either modifying the md5 file (removing the newline) or adding strip() to f.read() in tutorial.open_dataset should fix the md5 check.
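The checksum failure can be reproduced without any download. A minimal sketch, assuming the check compares the hex MD5 digest of the downloaded bytes with the text read from the .md5 file (the helper names and the payload are made up for illustration):

```python
import hashlib


def md5_hex(data: bytes) -> str:
    # 32-character hexadecimal MD5 digest.
    return hashlib.md5(data).hexdigest()


def checksum_matches(data: bytes, md5_file_text: str, strip: bool = True) -> bool:
    """Compare the MD5 of downloaded bytes with the contents of a .md5 file."""
    remotemd5 = md5_file_text.strip() if strip else md5_file_text
    return md5_hex(data) == remotemd5


payload = b"pretend this is the GRIB file"
# The .md5 file was saved with a trailing newline.
md5_file_text = md5_hex(payload) + "\n"

print(checksum_matches(payload, md5_file_text, strip=False))  # False
print(checksum_matches(payload, md5_file_text))               # True
```

Either fix works: stripping on read tolerates stray whitespace in any future .md5 file, while removing the newline upstream fixes only this file.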

@dcherian
Contributor

I removed the newline upstream.

@keewis
Collaborator

keewis commented Sep 23, 2019

That seems to have worked; the remaining failures should be unrelated.

@dcherian
Contributor

🤦‍♂️ finally!

@dcherian
Contributor

Thanks @keewis

@rabernat
Contributor

We got it! Thanks everyone!

@rabernat rabernat merged commit f82d112 into pydata:scipy19-docs Sep 23, 2019
@rabernat
Contributor

Unfortunately the RTD build failed!

https://readthedocs.org/projects/xray/builds/9700942/

😭😭😭

This is really trying my patience!

@dcherian
Contributor


@keewis
Collaborator

keewis commented Sep 24, 2019

Following the commands in the build log (roughly), I can reproduce the 404:

$ conda env create --quiet --name scipy19-docs --file doc/environment.yml
$ conda install --yes --quiet --name scipy19-docs mock pillow sphinx sphinx_rtd_theme
(scipy19-docs)$ python -m pip install -U recommonmark readthedocs-sphinx-ext
(scipy19-docs)$ python setup.py install --force
(scipy19-docs)$ cd doc; make html

I haven't checked yet, but maybe the second conda install shouldn't happen, since all of those packages are already installed when the environment is created?

@shoyer
Member

shoyer commented Sep 24, 2019 via email

@keewis
Collaborator

keewis commented Sep 24, 2019

Err, well, this is somewhat confusing. I'm not sure what pulls it in, but it seems something in the environment depends on xarray, so xarray gets installed when the environment is created. The conda-installed version is 0.13.0, and since master does not have the new version of open_dataset that no longer enforces ".nc" files, the 404 happens. In short, the version from setup.py install gets shadowed, I guess because its version is 0.12.3? In that case, merging master into scipy19-docs might fix this?

@dcherian
Contributor

Thanks @keewis. I just merged master and pushed. Hopefully this works.

@dcherian
Contributor

Still fails with a 404: https://readthedocs.org/projects/xray/builds/9704981/

@keewis
Collaborator

keewis commented Sep 24, 2019

The problem is still the same. The environment listing printed by sphinx shows the new version, which in theory should be used:
xarray 0.13.0+22.gaeb15b56 pypi_0 pypi
but building the docs prints this:
xarray: 0.13.0, /home/docs/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/xarray/__init__.py
and that version does not include the change to open_dataset that allows non-nc files. Honestly, I'm no expert on imports and version management, so I don't know what the best fix for this is.
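A quick way to check which copy of a package actually wins is to import it and print its version and file location, which is essentially what the sphinx output above reports. A minimal sketch with a hypothetical helper name; json stands in for xarray so the snippet runs anywhere:

```python
import importlib


def describe_import(module_name: str):
    """Return (version, file) for the copy of a module Python actually imports."""
    module = importlib.import_module(module_name)
    # Not every module defines __version__, so fall back to None.
    version = getattr(module, "__version__", None)
    return version, getattr(module, "__file__", None)


# In the docs environment this would be describe_import("xarray"); if the path
# points into site-packages rather than the checkout, the checkout is shadowed.
print(describe_import("json"))
```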

Something that comes to mind is waiting for a new release with the new version of open_dataset, but that is not an immediate fix (and only a workaround at best).

@keewis
Collaborator

keewis commented Sep 24, 2019

After some experimentation, it seems this only happens with setup.py install, not with pip install. For me, at least, in the commands from #3199 (comment), replacing

$ python setup.py install --force

with

$ python -m pip install .

makes this work, even without the merge commit.

The readthedocs settings docs mention a pip_install option; maybe use that?

@dcherian
Contributor

👍

Let's see if it works: https://readthedocs.org/projects/xray/builds/9705929/

@dcherian
Contributor

OK, that almost worked. Now it looks like we need to bump pandas? Or dask?

AttributeError                            Traceback (most recent call last)
<ipython-input-8-708052ae49bf> in <module>
----> 1 df = ds.to_dask_dataframe()

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/xarray/core/dataset.py in to_dask_dataframe(self, dim_order, set_index)
   4278             series_list.append(series)
   4279 
-> 4280         df = dd.concat(series_list, axis=1)
   4281 
   4282         if set_index:

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/dask/dataframe/multi.py in concat(dfs, axis, join, interleave_partitions)
    598     axis = DataFrame._validate_axis(axis)
    599     dasks = [df for df in dfs if isinstance(df, _Frame)]
--> 600     dfs = _maybe_from_pandas(dfs)
    601 
    602     if axis == 1:

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/dask/dataframe/core.py in _maybe_from_pandas(dfs)
   3461     dfs = [from_pandas(df, 1)
   3462            if (is_series_like(df) or is_dataframe_like(df)) and not is_dask_collection(df)
-> 3463            else df for df in dfs]
   3464     return dfs
   3465 

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/dask/dataframe/core.py in <listcomp>(.0)
   3461     dfs = [from_pandas(df, 1)
   3462            if (is_series_like(df) or is_dataframe_like(df)) and not is_dask_collection(df)
-> 3463            else df for df in dfs]
   3464     return dfs
   3465 

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/dask/dataframe/utils.py in is_series_like(s)
    503 def is_series_like(s):
    504     """ Looks like a Pandas Series """
--> 505     return set(dir(s)) > {'name', 'dtype', 'groupby', 'head'}
    506 
    507 

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/dask/dataframe/core.py in __dir__(self)
   2554         o = set(dir(type(self)))
   2555         o.update(self.__dict__)
-> 2556         o.update(c for c in self.columns if
   2557                  (isinstance(c, pd.compat.string_types) and
   2558                   pd.compat.isidentifier(c)))

~/checkouts/readthedocs.org/user_builds/xray/conda/scipy19-docs/lib/python3.7/site-packages/dask/dataframe/core.py in <genexpr>(.0)
   2555         o.update(self.__dict__)
   2556         o.update(c for c in self.columns if
-> 2557                  (isinstance(c, pd.compat.string_types) and
   2558                   pd.compat.isidentifier(c)))
   2559         return list(o)

AttributeError: module 'pandas.compat' has no attribute 'string_types'
<<<-------------------------------------------------------------------------

@keewis
Collaborator

keewis commented Sep 24, 2019

It seems all of these problems boil down to that single command. What gets executed is (essentially)

$ python -m pip install --upgrade --upgrade-strategy eager .

which upgrades pandas to 0.25.1, but leaves dask as is because it is an optional dependency. This, however, defeats the purpose of version pinning in the conda file (or a requirements file).
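To make the consequence concrete, here is a toy model of the two pip upgrade strategies. It is not pip's resolver, and apart from pandas 0.25.1 the version numbers in the usage below are made up. Under "eager", pandas moves past the version the environment file pinned, while dask, being an optional extra, is left untouched; that mismatch is consistent with the pandas.compat.string_types error above.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]


def plan_upgrades(
    installed: Dict[str, Version],
    required: Dict[str, Version],
    latest: Dict[str, Version],
    strategy: str,
) -> Dict[str, Version]:
    """Toy model: which *required* dependencies get upgraded, and to what.

    Optional extras (such as dask) never appear in ``required``,
    so neither strategy touches them.
    """
    plan = {}
    for name, minimum in required.items():
        current = installed.get(name)
        if strategy == "eager":
            # Upgrade whenever a newer release exists, regardless of pins.
            needs_upgrade = current is None or current < latest[name]
        elif strategy == "only-if-needed":
            # Upgrade only if the installed version fails the requirement.
            needs_upgrade = current is None or current < minimum
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if needs_upgrade:
            plan[name] = latest[name]
    return plan


# Made-up environment: pandas pinned below 0.25, dask as an optional extra.
installed = {"numpy": (1, 17, 2), "pandas": (0, 24, 2), "dask": (2, 1, 0)}
required = {"numpy": (1, 14), "pandas": (0, 24)}
latest = {"numpy": (1, 17, 2), "pandas": (0, 25, 1), "dask": (2, 5, 0)}

print(plan_upgrades(installed, required, latest, "eager"))           # {'pandas': (0, 25, 1)}
print(plan_upgrades(installed, required, latest, "only-if-needed"))  # {}
```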

I can't find an option to tell readthedocs to install without the eager upgrade (a normal upgrade is fine). Maybe ask their support? The eager upgrade strategy was added in readthedocs/readthedocs.org#5635.

@shoyer
Member

shoyer commented Sep 24, 2019 via email

@rabernat
Contributor

Is someone planning to open this RTD issue? Or is there another workaround? I would love to get this scipy19-docs branch merged to master, but this seems to be standing in the way.

@keewis
Collaborator

keewis commented Sep 25, 2019

I'm currently not planning to, because I will probably be offline for the next few days.

Edit: the only workaround I can imagine is building the docs without installing (which is mentioned in the docs), but I know neither why the current configuration is the way it is nor how building the docs without installing works with RTD.
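For reference, the usual way to build Sphinx docs without installing a package is to put the source checkout first on sys.path from conf.py, which Sphinx executes as ordinary Python. A minimal sketch, assuming the docs live in doc/ directly under the repository root and that the package is pure Python (as xarray is), so nothing needs compiling; the helper name is hypothetical and this does not show how RTD itself would be configured:

```python
import os
import sys


def prepend_repo_root(conf_dir: str) -> str:
    """Put the parent of the docs directory first on sys.path.

    Afterwards, importing the package picks up the checkout instead of
    any copy installed in site-packages.
    """
    repo_root = os.path.abspath(os.path.join(conf_dir, os.pardir))
    # Avoid a duplicate entry if the path is already present.
    if repo_root in sys.path:
        sys.path.remove(repo_root)
    sys.path.insert(0, repo_root)
    return repo_root


# In a doc/conf.py this would be called with the directory containing conf.py:
# prepend_repo_root(os.path.dirname(os.path.abspath(__file__)))
```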

@keewis
Collaborator

keewis commented Oct 4, 2019

Looking at this once again, there are a lot of projects that don't install the package (e.g. aiohttp (build)), but I'm not sure how this is set up or which parts of their configuration come from somewhere other than their .readthedocs.yml.

Also, it seems we are using the old version of the readthedocs.yml format.

Finally, it seems strange to me that the scipy19-docs build fails but master's doesn't. Does master do something different?

As I don't have access to readthedocs I can't really experiment on these suggestions (making this informed guessing at best), so I would like to leave the tracking down and fixing to someone else.

@crusaderky
Contributor

Hi, check out the work I did on the RTD requirements file in #3358; it may help.

@keewis keewis mentioned this pull request Nov 20, 2019
keewis pushed a commit to keewis/xarray that referenced this pull request Nov 21, 2019
…ydata#3199)

* Adds an example of ERA5 and GRIB data to the gallery

* Add markdown narrative cells to GRIB example

* Update load method to use xr.tutorial

* Fix load method
dcherian pushed a commit that referenced this pull request Nov 22, 2019
* Switch doc examples to use nbsphinx (#3105)

* switching out examples to use nbsphinx

* added jupyter_client to doc env

* added ipykernel to doc env

* Replace sphinx_gallery with notebook (#3106)

* switching out examples to use nbsphinx

* added jupyter_client to doc env

* moved gallery to notebook

* Allow other tutorial filename extensions (#3121)

* switching out examples to use nbsphinx

* added jupyter_client to doc env

* allow non netcdf tutorial files

* Added ROMS ocean model example notebook (#3116)

* change name of test env to xarray-tests (#3110)

* ROMS_ocean_model example added

* Allow other tutorial filename extensions (#3121)

* switching out examples to use nbsphinx

* added jupyter_client to doc env

* allow non netcdf tutorial files

* Changed load to xr.tutorial.open_dataset(), and
added some extra documentation.

* change name of test env to xarray-tests (#3110)

* ROMS_ocean_model example added

* Changed load to xr.tutorial.open_dataset(), and
added some extra documentation.

* fixed colormap issues leftover from cmocean import

* Added intro paragraph to ROMS example notebook, removed comments, and added citation in whats-new.

* Add an example of ERA5 and GRIB data & visualization to the gallery (#3199)

* Adds an example of ERA5 and GRIB data to the gallery

* Add markdown narrative cells to GRIB example

* Update load method to use xr.tutorial

* Fix load method

* require nbsphinx for the documentation builds

* add more nbsphinx dependencies

* install cfgrib using pip

* add the eccodes library to the dependencies

* remove the dependency on sphinx-gallery

* add the ERA5 GRIB example to the list

* update the documentation links

Missing: section links in visualization_gallery.ipynb don't work yet,
also the one in io.rst (it has a unicode char).

* Fix leap year condition in monthly means example (#3464)

* Typo correction in docs (#3387)

* Update terminology.rst (#3455)

Fixed broken link

* Error in leap year?

I've tried this script; however, it adds +1 to every month of a leap year. Is that an error, or am I wrong? So I added the condition "and month == 2" on line 86 so that only February gets +1.

* Fix leap year (#3464)

* Update doc/whats-new.rst

Co-Authored-By: Deepak Cherian <[email protected]>

* fix the reference to the rasterio geocoordinates docs

* update whats-new.rst
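The leap-year fix from the monthly means example (#3464, in the commit list above) is easy to check in isolation. A simplified sketch, assuming the proleptic Gregorian calendar only (the example itself handles several calendars) and using hypothetical function names; the buggy variant is included to show why the "and month == 2" condition matters:

```python
import calendar

# Days per month in a non-leap year, indexed by month number (index 0 unused).
DAYS_PER_MONTH = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_in_month_buggy(year: int, month: int) -> int:
    # Bug described above: every month of a leap year gains a day.
    length = DAYS_PER_MONTH[month]
    if calendar.isleap(year):
        length += 1
    return length


def days_in_month(year: int, month: int) -> int:
    # Fix: only February gains a day in a leap year.
    length = DAYS_PER_MONTH[month]
    if calendar.isleap(year) and month == 2:
        length += 1
    return length


print(sum(days_in_month(2000, m) for m in range(1, 13)))        # 366
print(sum(days_in_month_buggy(2000, m) for m in range(1, 13)))  # 377
```

With the bug, a leap year comes out 12 days too long, which skews any monthly weighting built from these lengths.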
6 participants