
Update dependencies and examples #218

Merged — 29 commits merged into dask:main from fix-examples on May 5, 2022
Conversation

@bryanwweber (Contributor) commented on May 3, 2022:

This updates all the pinned dependencies to the current version as of today, and fixes any examples that were broken by the updates.

Supersedes #215. Closes #206.

Important change: This also adds all of the dependencies that I noticed were required to run the examples and removes any !pip install cells in the notebooks. This makes the binder image larger, so this may not be desirable. Happy to revert that change.

cc/ @jrbourbeau @ian-r-rose

@review-notebook-app (bot) commented:

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

@bryanwweber mentioned this pull request on May 3, 2022
Review comment by @bryanwweber (Contributor, Author) via ReviewNB, on the notebook diff (@@ -11,27 +11,13 @@):

Line #2:

    client = Client(n_workers=1, threads_per_worker=4, processes=True, memory_limit='2GB')

This change is needed due to https://github.com/dask/dask/issues/8581

@jrbourbeau (Member) left a comment:

Thanks for working on this @bryanwweber!

> Important change: This also adds all of the dependencies that I noticed were required to run the examples and removes any !pip install cells in the notebooks. This makes the binder image larger, so this may not be desirable. Happy to revert that change.

How much longer does it take for the binder instances to spin up when including these dependencies? Generally, I like the idea of including everything that's needed in the conda environment file, but historically this has led to long binder spinup times that we'd like to avoid.

Review threads (resolved): applications/async-web-server.ipynb; applications/satellite-imagery-geotiff.ipynb (outdated)
@ian-r-rose (Contributor) left a comment:

Thanks for this @bryanwweber!

Review threads (resolved): applications/satellite-imagery-geotiff.ipynb (outdated); dataframe.ipynb; dataframes/01-data-access.ipynb
@ian-r-rose (Contributor) commented:

> Generally, I like the idea of including everything that's needed in the conda environment file, but historically this has led to long binder spinup times that we'd like to avoid.

In my experience, actually pinning the dependencies to specific versions results in faster build times. In all cases, I think the spin-up times for pre-built images won't differ that much. So I'd be in favor of getting an environment that works, then going to a running binder and doing something like conda env export --no-builds to make it more specific.
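For reference, `conda env export --no-builds` emits a fully pinned environment file. A sketch of what that output looks like; the package versions below are illustrative, not the actual pins from this PR:

    # Illustrative output of `conda env export --no-builds`;
    # versions shown are examples, not the real pins in this repository.
    name: dask-examples
    channels:
      - conda-forge
    dependencies:
      - python=3.9.12
      - dask=2022.4.1
      - distributed=2022.4.1
      - numpy=1.22.3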

Commit messages from subsequent pushes:

- Add a new CI workflow to update all the dependencies and run the notebooks with everything that is not pinned at its most recent version.
- Graphviz is available from conda-forge, so we do not need to install it from apt.
- Run the notebook builds serially: building in parallel leads to warnings in the documentation builds about servers already running, which doesn't look good.
- The conda-lock specifications weren't working properly here. There was also an irreconcilable conflict in the environment specification due to the inclusion of pytorch and torchvision. These will have to be installed separately.
@bryanwweber (Contributor, Author) commented:

@jrbourbeau @ian-r-rose CI finally all passed, so this is ready for another round of review

@jrbourbeau (Member) commented:

> CI finally all passed

Woo 🎉 I'll take a look now

@jrbourbeau (Member) left a comment:

Thanks for all your work on this @bryanwweber. Left a few comments, but overall this looks really close.

Also, can we confirm that the changes here don't increase binder startup times? It sounds like they will actually decrease, which would be great.

On the diff:

    pytest \
      -n=auto \

@jrbourbeau (Member) commented:

I'm curious why this change was added? It looks like we're no longer building notebooks in parallel.

@bryanwweber (Contributor, Author) commented on May 5, 2022:

Building notebooks in parallel puts warnings into the built docs that servers already exist on port 8787. I didn't think that was a good look, so I removed that for this build. The "update dependencies" build still runs in parallel for speed. See the commit message here: cc85142

Comment on lines +11 to +13:

    defaults:
      run:
        shell: bash -l {0}

@jrbourbeau (Member) commented:

Nice, thanks for consolidating here.
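For context, `defaults.run.shell` set at the workflow (or job) level applies to every `run:` step, which is what allows dropping per-step `shell:` keys. A minimal sketch of the pattern, with illustrative job and step contents rather than the exact workflow from this PR:

    name: Example
    on: [push]

    defaults:
      run:
        # Use a login shell for every `run:` step so conda environments
        # activated by setup-miniconda are on PATH.
        shell: bash -l {0}

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          # No per-step `shell:` key needed; the default above applies.
          - run: conda info
          - run: pytest -vv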

Comment on lines +21 to +22:

    # Increase this value to reset cache if binder/environment.yml has not changed
    CACHE_NUMBER: 0

@jrbourbeau (Member) commented:

I've not used actions/cache before. Just to clarify, we mostly will never need to touch CACHE_NUMBER? Only in debugging cases?

@bryanwweber (Contributor, Author) commented:

I think that's right.
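For context, the reason CACHE_NUMBER rarely needs touching is that actions/cache keys typically combine it with a hash of the environment file, so the cache refreshes automatically whenever binder/environment.yml changes; bumping the number only forces a fresh cache by hand (e.g., while debugging a stale cache). A sketch of that pattern, assuming a key layout like the one below rather than the exact one in this PR:

    env:
      # Bump to invalidate the cache even when environment.yml is unchanged
      CACHE_NUMBER: 0

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - uses: actions/cache@v3
            with:
              path: ~/conda_pkgs_dir
              # The key changes whenever the env file changes OR CACHE_NUMBER is bumped
              key: conda-${{ runner.os }}-${{ env.CACHE_NUMBER }}-${{ hashFiles('binder/environment.yml') }}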

On the diff:

    environment-file: binder/environment.yml
    activate-environment: dask-examples
    auto-activate-base: false
    use-only-tar-bz2: true
Comment on lines +42 to +51:

    - name: Execute Notebooks
      run: |
        pytest \
          -vv \
          -n=auto \
          --forked \
          --nbmake \
          --overwrite \
          --ignore machine-learning/torch-prediction.ipynb \
          --ignore applications/json-data-on-the-web.ipynb

@jrbourbeau (Member) commented:

It looks like we're now running all the notebooks twice. How does this differ from the other CI build where we execute the notebooks?

@bryanwweber (Contributor, Author) commented:

There are now two environment files: environment.yml, which has pinned versions created by mamba env export --no-builds, and environment-base.yml, which only pins a few dependencies. This CI build uses environment-base.yml to create the environment, so it will (eventually) use updated versions of dependencies. The intention is to try to catch failures before users run into them.
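As an illustration of that two-file scheme (the contents below are assumptions for illustration, not copied from the PR), environment-base.yml keeps only the pins that matter while everything else floats:

    # environment-base.yml -- loose spec used by the scheduled CI job.
    # Contents illustrative; the real file keeps a few more pins.
    name: dask-examples
    channels:
      - conda-forge
    dependencies:
      - python=3.9      # example of a kept pin
      - dask            # unpinned: CI exercises the newest release
      - distributed
      - pytest

Running `mamba env export --no-builds` from an environment solved against a file like this then produces the fully pinned environment.yml that binder consumes.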

On the new workflow file (@@ -0,0 +1,51 @@):

    name: Update Dependencies

@jrbourbeau (Member) commented:

Hrm, I could be missing something, but this name doesn't appear to reflect what's happening in this workflow.

@bryanwweber (Contributor, Author) commented:

Well, yeah, "Update" as a verb is not appropriate... probably Test with Updated Dependencies

On the diff (@@ -384,7 +384,7 @@):

      "cell_type": "markdown",
      "metadata": {},
      "source": [
    -     "We encourage you to watch the [dashboard's status page](../proxy/8787/status) to watch on going computation."
    +     "We encourage you to watch the [dashboard's status page](http://127.0.0.1:8787) to watch on going computation."

@jrbourbeau (Member) commented:

Does the previous link no longer work?

@bryanwweber (Contributor, Author) commented:

I'm not sure; Sphinx warns that the file doesn't exist.

On the diff (lines -27 to -31, removed):

    "import subprocess\n",
    "\n",
    "subprocess.call([\n",
    "    \"pip\", \"install\", \"prophet Cython cmdstanpy==0.9.5 pystan numpy pandas matplotlib LunarCalendar convertdate holidays setuptools-git python-dateutil tqdm\"\n",
    "])"

@jrbourbeau (Member) commented:

Happy to see this go away : )

On the diff (@@ -50,7 +50,7 @@), in the Live Notebook admonition:

    -   You can run this notebook in a `live session <https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=lab/tree/{{ docname }}>`_ |Binder| or view it `on Github <https://github.com/dask/dask-examples/blob/main/{{ docname }}>`_.
    +   You can run this notebook in a `live session <https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=lab/tree/{{ docname }}>`__ |Binder| or view it `on Github <https://github.com/dask/dask-examples/blob/main/{{ docname }}>`__.

@jrbourbeau (Member) commented:

Just for my own understanding, why do we need a double underscore here?

@bryanwweber (Contributor, Author) commented:

The double underscore creates an anonymous link, rather than a link that can be reused: https://docutils.sourceforge.io/docs/user/rst/quickref.html#indirect-hyperlink-targets (With a single underscore, reST registers the link text as a named target, so reusing the same text, like `on Github`, with different URLs triggers "Duplicate explicit target name" warnings; anonymous links sidestep that.)

Comment on lines +34 to +35:

    dataframes/03-from-pandas-to-dask
    dataframes/04-reading-messy-data-into-dataframes

@jrbourbeau (Member) commented:

Nice! Thanks for adding these.

@bryanwweber (Contributor, Author) commented on May 5, 2022:

> Also, can we confirm that the changes here don't increase binder startup times? It sounds like they will actually decrease, which would be great

@jrbourbeau I'm not sure how I can check this before this PR is merged. Is there a link I can use somewhere?

@jrbourbeau (Member) left a comment:

Just tried launching a binder instance using this PR and it took ~1.5 minutes to pull the (already built) image and launch the JupyterLab session. Unfortunately I've not been able to compare with what we currently have on examples.dask.org as every time I've launched a binder session the image has already been present on the binder machine instance, so no need to pull it, and things launch pretty much instantly. That said, ~1.5 minutes to pull the image that corresponds to this PR seems within reason for what I would call "normal" binder launch times, so let's go ahead and merge this in.

Thanks again @bryanwweber!

@jrbourbeau merged commit 5a9bd2b into dask:main on May 5, 2022
@bryanwweber deleted the fix-examples branch on May 5, 2022 at 22:19
@jsignell mentioned this pull request on Jul 27, 2022
Successfully merging this pull request may close these issues: Update dependencies and ensure all notebooks are working

3 participants