Update dependencies and examples #218
Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
@@ -11,27 +11,13 @@
},
Line #2. client = Client(n_workers=1, threads_per_worker=4, processes=True, memory_limit='2GB')
This change is needed due to https://github.com/dask/dask/issues/8581
Reply via ReviewNB
Thanks for working on this @bryanwweber!
Important change: This also adds all of the dependencies that I noticed were required to run the examples and removes any !pip install cells in the Notebooks. This makes the binder image larger, so this may not be desirable. Happy to revert that change.
How much longer does it take for the binder instances to spin up when including these dependencies? Generally, I like the idea of including everything that's needed in the conda environment file, but historically this has led to long binder spinup times that we'd like to avoid.
Thanks for this @bryanwweber!
In my experience, actually pinning the dependencies to specific versions results in faster build times. In all cases, I think the spin-up times for pre-built images won't differ that much. So I'd be in favor of getting an environment that works, then going to a running binder and doing something like
Add a new CI workflow to update all the dependencies and run the notebooks with everything not pinned as the most recent version. Graphviz is available from conda-forge, so we do not need to install it from apt.
This leads to warnings in the documentation builds about servers already running, which doesn't look good.
The conda-lock specifications weren't working properly here. There was also an irreconcilable conflict in the environment specification due to the inclusion of pytorch and torchvision. These will have to be installed separately.
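One common way to handle this kind of irreconcilable conda conflict is to keep the offending packages out of the conda specification and install them in a separate pip step. A hypothetical sketch (the step name and package list are assumptions, not taken from this PR):

```yaml
# Hypothetical workflow step: packages that conflict with the conda
# environment specification are installed separately via pip after the
# conda environment has been created and activated.
- name: Install PyTorch separately
  run: |
    pip install torch torchvision
```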
@jrbourbeau @ian-r-rose CI finally all passed, so this is ready for another round of review
Woo 🎉 I'll take a look now
Thanks for all your work on this @bryanwweber. Left a few comments, but overall this looks really close.
Also, can we confirm that the changes here don't increase binder startup times? It sounds like it will actually decrease them, which would be great
pytest \
  -n=auto \
I'm curious why this change was added? It looks like we're no longer building notebooks in parallel
Building notebooks in parallel puts warnings into the built docs that servers exist on 8787. I didn't think that was a good look, so I removed that for this build. The "update dependencies" build still goes in parallel for speed. See the commit message here: cc85142
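The split being described might look like the following (a sketch; the step names and exact flags beyond those visible in this PR are assumptions):

```yaml
# Docs build: run notebooks serially so each notebook's Dask dashboard can
# bind port 8787 without "server already running" warnings in the output.
- name: Execute Notebooks (docs build, serial)
  run: pytest -vv --nbmake --overwrite

# Update-dependencies build: parallel execution is fine here because the
# rendered output is never published, only checked for failures.
- name: Execute Notebooks (dependency check, parallel)
  run: pytest -vv -n=auto --forked --nbmake --overwrite
```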
defaults:
  run:
    shell: bash -l {0}
Nice, thanks for consolidating here
# Increase this value to reset cache if binder/environment.yml has not changed
CACHE_NUMBER: 0
I've not used actions/cache before. Just to clarify, we mostly will never need to touch CACHE_NUMBER? Only in debugging cases?
I think that's right.
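For context, the usual pattern mixes the environment file's hash into the cache key alongside CACHE_NUMBER, so the cache invalidates automatically on file changes and manually via the counter. A sketch of a typical actions/cache step (not the exact workflow in this PR):

```yaml
# Sketch: editing binder/environment.yml changes hashFiles() and so the key,
# invalidating the cache automatically. Incrementing CACHE_NUMBER forces a
# fresh cache even when the file is unchanged (e.g. while debugging).
- uses: actions/cache@v2
  with:
    path: ~/conda_pkgs_dir
    key: conda-${{ runner.os }}-${{ env.CACHE_NUMBER }}-${{ hashFiles('binder/environment.yml') }}
```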
environment-file: binder/environment.yml
activate-environment: dask-examples
auto-activate-base: false
use-only-tar-bz2: true
Why is this needed? Looking at the conda docs, it appears to not be the preferred option https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#force-conda-to-download-only-tar-bz2-packages-use-only-tar-bz2
This is required for the cache to work, https://github.com/conda-incubator/setup-miniconda#caching-packages
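Per the setup-miniconda README, package caching only works reliably with .tar.bz2 packages, which is why the flag is needed. A sketch of the relevant setup step (values are assumptions based on the snippets in this PR):

```yaml
# Sketch following the setup-miniconda caching docs: use-only-tar-bz2: true
# restricts conda to .tar.bz2 downloads, which is required for the packages
# in ~/conda_pkgs_dir to be cached and restored correctly.
- uses: conda-incubator/setup-miniconda@v2
  with:
    environment-file: binder/environment.yml
    activate-environment: dask-examples
    use-only-tar-bz2: true
```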
- name: Execute Notebooks
  run: |
    pytest \
      -vv \
      -n=auto \
      --forked \
      --nbmake \
      --overwrite \
      --ignore machine-learning/torch-prediction.ipynb \
      --ignore applications/json-data-on-the-web.ipynb
It looks like we're now running all the notebooks twice. How does this differ from the other CI build where we execute the notebooks?
There are now two environment files: environment.yml, which has pinned versions created by mamba env export --no-builds, and environment-base.yml, which only pins a few dependencies. This CI build uses environment-base.yml to create the environment, so it will (eventually) use updated versions of dependencies. The intention is to try to catch failures before users run into them.
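As an illustration (hypothetical contents — the actual files in this PR pin many more packages), the loosely pinned environment-base.yml might look like:

```yaml
# environment-base.yml (illustrative sketch): only a few packages are pinned,
# so the scheduled CI build picks up new releases of everything else and can
# catch breakage before users hit it.
name: dask-examples
channels:
  - conda-forge
dependencies:
  - python=3.9
  - dask
  - distributed
  - graphviz
```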
@@ -0,0 +1,51 @@
name: Update Dependencies
Hrm, I could be missing something, but this name doesn't appear to reflect what's happening in this workflow
Well, yeah, "Update" as a verb is not appropriate... probably "Test with Updated Dependencies"
@@ -384,7 +384,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We encourage you to watch the [dashboard's status page](../proxy/8787/status) to watch on going computation."
"We encourage you to watch the [dashboard's status page](http://127.0.0.1:8787) to watch on going computation."
Does the previous link no longer work?
I'm not sure, Sphinx warns that the file doesn't exist.
"import subprocess\n",
"\n",
"subprocess.call([\n",
"    \"pip\", \"install\", \"prophet Cython cmdstanpy==0.9.5 pystan numpy pandas matplotlib LunarCalendar convertdate holidays setuptools-git python-dateutil tqdm\"\n",
"])"
Happy to see this go away : )
@@ -50,7 +50,7 @@

.. admonition:: Live Notebook

You can run this notebook in a `live session <https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=lab/tree/{{ docname }}>`_ |Binder| or view it `on Github <https://github.com/dask/dask-examples/blob/main/{{ docname }}>`_.
You can run this notebook in a `live session <https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=lab/tree/{{ docname }}>`__ |Binder| or view it `on Github <https://github.com/dask/dask-examples/blob/main/{{ docname }}>`__.
Just for my own understanding, why do we need a double underscore here?
The double underscore creates an anonymous link, rather than a link that can be reused. https://docutils.sourceforge.io/docs/user/rst/quickref.html#indirect-hyperlink-targets
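A quick illustration of the difference (the URL is borrowed from elsewhere in this thread): a single trailing underscore registers the link text as a named, reusable target, so repeating the same text with different URLs produces a "duplicate explicit target name" warning. A double underscore makes each link anonymous, so repeated text is fine:

```rst
.. Named link: "examples" becomes a reusable target; a second named link
   with the same text but a different URL would trigger a duplicate-target
   warning from docutils/Sphinx.
See the `examples <https://examples.dask.org>`_ page.

.. Anonymous link: no name is registered, so the same link text can appear
   any number of times with different URLs.
See the `examples <https://examples.dask.org>`__ page.
```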
dataframes/03-from-pandas-to-dask
dataframes/04-reading-messy-data-into-dataframes
Nice! Thanks for adding these
@jrbourbeau I'm not sure how I can check this before this PR is merged. Is there a link I can use somewhere?
Just tried launching a binder instance using this PR and it took ~1.5 minutes to pull the (already built) image and launch the JupyterLab session. Unfortunately I've not been able to compare with what we currently have on examples.dask.org as every time I've launched a binder session the image has already been present on the binder machine instance, so no need to pull it, and things launch pretty much instantly. That said, ~1.5 minutes to pull the image that corresponds to this PR seems within reason for what I would call "normal" binder launch times, so let's go ahead and merge this in.
Thanks again @bryanwweber!
This updates all the pinned dependencies to the current version as of today, and fixes any examples that were broken by the updates.
Supersedes #215. Closes #206.
Important change: This also adds all of the dependencies that I noticed were required to run the examples and removes any !pip install cells in the Notebooks. This makes the binder image larger, so this may not be desirable. Happy to revert that change.

cc @jrbourbeau @ian-r-rose