Update dependencies and examples #218
Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
@@ -11,27 +11,13 @@
},
Line #2. client = Client(n_workers=1, threads_per_worker=4, processes=True, memory_limit='2GB')
This change is needed due to https://github.com/dask/dask/issues/8581
Reply via ReviewNB
Thanks for working on this @bryanwweber!
Important change: This also adds all of the dependencies that I noticed were required to run the examples and removes any !pip install cells in the Notebooks. This makes the binder image larger, so this may not be desirable. Happy to revert that change.
How much longer does it take for the binder instances to spin up when including these dependencies? Generally, I like the idea of including everything that's needed in the conda environment file, but historically this has led to long binder spinup times that we'd like to avoid.
Thanks for this @bryanwweber!
In my experience, actually pinning the dependencies to specific versions results in faster build times. In all cases, I think the spin-up times for pre-built images won't differ that much. So I'd be in favor of getting an environment that works, then going to a running binder and doing something like
Add a new CI workflow to update all the dependencies and run the notebooks with everything not pinned as the most recent version. Graphviz is available from conda-forge, so we do not need to install it from apt.
This leads to warnings in the documentation builds about servers already running, which doesn't look good.
The conda-lock specifications weren't working properly here. There was also an irreconcilable conflict in the environment specification due to the inclusion of pytorch and torchvision. These will have to be installed separately.
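One common way to handle this kind of irreconcilable conda conflict is to keep the offending packages out of the conda specification and install them in a separate pip step. A hypothetical sketch (the step name and package list are assumptions, not taken from this PR):

```yaml
# Hypothetical workflow step: packages that conflict with the conda
# environment specification are installed separately via pip after the
# conda environment has been created and activated.
- name: Install PyTorch separately
  run: |
    pip install torch torchvision
```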
@jrbourbeau @ian-r-rose CI finally all passed, so this is ready for another round of review
Woo 🎉 I'll take a look now
Thanks for all your work on this @bryanwweber. Left a few comments, but overall this looks really close.
Also, can we confirm that the changes here don't increase binder startup times? It sounds like it will actually decrease them, which would be great
pytest \
  -n=auto \
I'm curious why this change was added? It looks like we're no longer building notebooks in parallel
Building notebooks in parallel puts warnings into the built docs that servers exist on 8787. I didn't think that was a good look, so I removed that for this build. The "update dependencies" build still goes in parallel for speed. See the commit message here: cc85142
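The split being described might look like the following (a sketch; the step names and exact flags beyond those visible in this PR are assumptions):

```yaml
# Docs build: run notebooks serially so each notebook's Dask dashboard can
# bind port 8787 without "server already running" warnings in the output.
- name: Execute Notebooks (docs build, serial)
  run: pytest -vv --nbmake --overwrite

# Update-dependencies build: parallel execution is fine here because the
# rendered output is never published, only checked for failures.
- name: Execute Notebooks (dependency check, parallel)
  run: pytest -vv -n=auto --forked --nbmake --overwrite
```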
defaults:
  run:
    shell: bash -l {0}
Nice, thanks for consolidating here
# Increase this value to reset cache if binder/environment.yml has not changed
CACHE_NUMBER: 0
I've not used actions/cache before. Just to clarify, we mostly will never need to touch CACHE_NUMBER? Only in debugging cases?
I think that's right.
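For context, the usual pattern mixes the environment file's hash into the cache key alongside CACHE_NUMBER, so the cache invalidates automatically on file changes and manually via the counter. A sketch of a typical actions/cache step (not the exact workflow in this PR):

```yaml
# Sketch: editing binder/environment.yml changes hashFiles() and so the key,
# invalidating the cache automatically. Incrementing CACHE_NUMBER forces a
# fresh cache even when the file is unchanged (e.g. while debugging).
- uses: actions/cache@v2
  with:
    path: ~/conda_pkgs_dir
    key: conda-${{ runner.os }}-${{ env.CACHE_NUMBER }}-${{ hashFiles('binder/environment.yml') }}
```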
environment-file: binder/environment.yml
activate-environment: dask-examples
auto-activate-base: false
use-only-tar-bz2: true
Why is this needed? Looking at the conda docs, it appears to not be the preferred option https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#force-conda-to-download-only-tar-bz2-packages-use-only-tar-bz2
This is required for the cache to work, https://github.com/conda-incubator/setup-miniconda#caching-packages
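Per the setup-miniconda README, package caching only works reliably with .tar.bz2 packages, which is why the flag is needed. A sketch of the relevant setup step (values are assumptions based on the snippets in this PR):

```yaml
# Sketch following the setup-miniconda caching docs: use-only-tar-bz2: true
# restricts conda to .tar.bz2 downloads, which is required for the packages
# in ~/conda_pkgs_dir to be cached and restored correctly.
- uses: conda-incubator/setup-miniconda@v2
  with:
    environment-file: binder/environment.yml
    activate-environment: dask-examples
    use-only-tar-bz2: true
```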
- name: Execute Notebooks
  run: |
    pytest \
      -vv \
      -n=auto \
      --forked \
      --nbmake \
      --overwrite \
      --ignore machine-learning/torch-prediction.ipynb \
      --ignore applications/json-data-on-the-web.ipynb
It looks like we're now running all the notebooks twice. How does this differ from the other CI build where we execute the notebooks?
There are now two environment files: environment.yml, which has pinned versions created by mamba env export --no-builds, and environment-base.yml, which only pins a few dependencies. This CI build uses environment-base.yml to create the environment, so it will (eventually) use updated versions of dependencies. The intention is to try to catch failures before users run into them.
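As an illustration (hypothetical contents — the actual files in this PR pin many more packages), the loosely pinned environment-base.yml might look like:

```yaml
# environment-base.yml (illustrative sketch): only a few packages are pinned,
# so the scheduled CI build picks up new releases of everything else and can
# catch breakage before users hit it.
name: dask-examples
channels:
  - conda-forge
dependencies:
  - python=3.9
  - dask
  - distributed
  - graphviz
```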
@@ -0,0 +1,51 @@
name: Update Dependencies
Hrm, I could be missing something, but this name doesn't appear to reflect what's happening in this workflow
Well, yeah, "Update" as a verb is not appropriate... probably "Test with Updated Dependencies"
@@ -384,7 +384,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We encourage you to watch the [dashboard's status page](../proxy/8787/status) to watch on going computation."
"We encourage you to watch the [dashboard's status page](http://127.0.0.1:8787) to watch on going computation."
Does the previous link no longer work?
I'm not sure, Sphinx warns that the file doesn't exist.
"import subprocess\n",
"\n",
"subprocess.call([\n",
"    \"pip\", \"install\", \"prophet Cython cmdstanpy==0.9.5 pystan numpy pandas matplotlib LunarCalendar convertdate holidays setuptools-git python-dateutil tqdm\"\n",
"])"
Happy to see this go away : )
@@ -50,7 +50,7 @@

.. admonition:: Live Notebook

You can run this notebook in a `live session <https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=lab/tree/{{ docname }}>`_ |Binder| or view it `on Github <https://github.com/dask/dask-examples/blob/main/{{ docname }}>`_.
You can run this notebook in a `live session <https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=lab/tree/{{ docname }}>`__ |Binder| or view it `on Github <https://github.com/dask/dask-examples/blob/main/{{ docname }}>`__.
Just for my own understanding, why do we need a double underscore here?
The double underscore creates an anonymous link, rather than a link that can be reused. https://docutils.sourceforge.io/docs/user/rst/quickref.html#indirect-hyperlink-targets
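A quick illustration of the difference (the URL is borrowed from elsewhere in this thread): a single trailing underscore registers the link text as a named, reusable target, so repeating the same text with different URLs produces a "duplicate explicit target name" warning. A double underscore makes each link anonymous, so repeated text is fine:

```rst
.. Named link: "examples" becomes a reusable target; a second named link
   with the same text but a different URL would trigger a duplicate-target
   warning from docutils/Sphinx.
See the `examples <https://examples.dask.org>`_ page.

.. Anonymous link: no name is registered, so the same link text can appear
   any number of times with different URLs.
See the `examples <https://examples.dask.org>`__ page.
```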
dataframes/03-from-pandas-to-dask
dataframes/04-reading-messy-data-into-dataframes
Nice! Thanks for adding these
@jrbourbeau I'm not sure how I can check this before this PR is merged. Is there a link I can use somewhere?
Just tried launching a binder instance using this PR and it took ~1.5 minutes to pull the (already built) image and launch the JupyterLab session. Unfortunately I've not been able to compare with what we currently have on examples.dask.org as every time I've launched a binder session the image has already been present on the binder machine instance, so no need to pull it, and things launch pretty much instantly. That said, ~1.5 minutes to pull the image that corresponds to this PR seems within reason for what I would call "normal" binder launch times, so let's go ahead and merge this in.
Thanks again @bryanwweber!
This updates all the pinned dependencies to the current version as of today, and fixes any examples that were broken by the updates.
Supersedes #215. Closes #206.
Important change: This also adds all of the dependencies that I noticed were required to run the examples and removes any !pip install cells in the Notebooks. This makes the binder image larger, so this may not be desirable. Happy to revert that change.

cc @jrbourbeau @ian-r-rose