Examples page #7

Yoshanuikabundi · 2023-05-31T03:42:15Z

Progress

Goals and motivation

This is going to be a one-stop shop for all the examples, tutorials, and cookbooks in the OpenFF world.

Rendered

Notes for discussion about how to design this:

I would like:

All the examples in one place
A place for examples that don't fit neatly into a project
Searchable, easy, obvious navigation of examples
Easy user access to...
- Dependency installation for notebooks
- Colab with dependency installation cell
- Binder with dependencies installed?
- Downloading a zip of a notebook and its associated files
- A fully-rendered and executed HTML version of each notebook

Maybe it would be nice if:

We separated tutorials from cookbook examples? This seems to be a common framing in other projects, and lots of people I talked to at the conference brought up the idea. The distinction is that tutorials teach you how to do something, and a cookbook just gives you fully working code to copy and paste.

Possibly surprising things we can do:

Store arbitrary metadata in notebooks (as long as you can de/serialize it to text)
Store config files and readmes and environments alongside notebooks
Arbitrary code in openff-docs sphinx extension
Integrate sphinx extensions into openff-sphinx-theme
Store and access arbitrary files on RTD
Store and access arbitrary files in a branch a la gh-pages (gross but possibly necessary for Colab)
Take a base notebook from a project repository and inject purpose-built cells into it. Compare the first cells of the notebook prepared for Colab (conda-colab installation), the rendered notebook (links to different versions of the notebook), and the source notebook (neither of the above). Unfortunately the Colab link in the rendered notebook doesn't yet use this prepared notebook because Colab can only take notebooks from GitHub, but you can see it working in the OpenMM cookbook because they use GH Pages: click "Open in Google Colab".
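The cell injection in the last item can be sketched with only the standard library, since a notebook is JSON on disk. This is a minimal illustration and not the openff-docs implementation: the function name `inject_first_cell` and the install command are hypothetical, and only the nbformat v4 cell layout is taken as given.

```python
import copy
import json


def inject_first_cell(notebook: dict, source: str) -> dict:
    """Return a copy of `notebook` with a new code cell placed first.

    `notebook` is the parsed JSON of an nbformat v4 .ipynb file.
    """
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    result = copy.deepcopy(notebook)  # leave the source notebook untouched
    result["cells"].insert(0, new_cell)
    return result


# A tiny stand-in for a notebook loaded with json.load()
source_nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# My example"}
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Hypothetical installation cell for the Colab flavour of the notebook
colab_nb = inject_first_cell(source_nb, "!pip install -q condacolab")
serialized = json.dumps(colab_nb, indent=1)
```

Because the function returns a copy, the same source notebook can be turned into several flavours (Colab, rendered, download) without the variants interfering with each other.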

Constraints:

Colab can only load notebooks from GitHub, so if we want to inject a dependency installation cell we need to push to a repo
If we want to store notebooks in source repositories without output, we need to execute them when we build the docs to include output in the docs (and get thumbnails automatically)
Docs need to be rebuilt whenever the examples change in source repositories
Detecting changes in examples, storing different versions of notebooks, etc requires holding state somewhere

We need to standardize and make explicit:

Thankfully most of these are already broadly the same across repositories

How the examples folder is laid out and where the notebooks go
Where files that a notebook relies on go
Which dependencies are used in notebooks, or at least how to find out which dependencies are used

I recommend something like:

devtools/conda-envs/examples.yaml
examples/
    deprecated/
        <notebooks_that_are_ignored>
    experimental/
        <notebooks_that_are_presented_but_experimental>
    <notebook_name>/
        <notebook_name>.ipynb
        <file_notebook_needs>
        <other_file_notebook_needs>
    <another_notebook_name>/
        <another_notebook_name>.ipynb
        thumbnail.png
    <notebook_without_files>.ipynb

This implies using the same environment for testing notebooks in CI as for running notebooks as a user, which is not current practice, at least in the Toolkit. This means users might get a few dependencies they don't need, but guarantees the right dependencies are being tested.
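To make the proposed layout concrete, here is a rough sketch of how a tool might discover notebooks under `examples/`, skipping `deprecated/` and flagging `experimental/`. The helper names are hypothetical and the demo tree is created in a temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

IGNORED_DIRS = {"deprecated"}  # per the layout above, these are skipped


def find_notebooks(examples_dir: Path) -> List[Path]:
    """Return notebook paths relative to `examples_dir`, sorted."""
    found = []
    for path in sorted(examples_dir.rglob("*.ipynb")):
        relative = path.relative_to(examples_dir)
        # Skip anything inside an ignored folder or a Jupyter checkpoint
        if IGNORED_DIRS.intersection(relative.parts[:-1]):
            continue
        if ".ipynb_checkpoints" in relative.parts:
            continue
        found.append(relative)
    return found


def is_experimental(relative: Path) -> bool:
    """True if the notebook lives under the experimental/ folder."""
    return relative.parts[0] == "experimental"


# Build a throwaway tree that follows the recommended layout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "examples"
    for rel in [
        "deprecated/old/old.ipynb",
        "experimental/new/new.ipynb",
        "first/first.ipynb",
        "solo.ipynb",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}")
    (root / "first" / "thumbnail.png").write_bytes(b"")
    notebooks = find_notebooks(root)
    experimental = [nb for nb in notebooks if is_experimental(nb)]
```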

We need to decide, in broad strokes

Some dev experience design stuff:

Whether we want openff-docs to automatically include new notebooks
Whether we want openff-docs to automatically update existing notebooks
Whether we want openff-docs to automatically remove old notebooks
How the examples page layout is specified
- Notebook metadata?
  - Tags?
  - Categories?
  - Full on navigation tree?
- Regular old Sphinx MarkDown/ReST?

There are basically two extremes here, and intermediate states are possible. One is to do all the layout in MarkDown and have the Sphinx extension only take care of downloading and processing the notebooks (this is what OpenMM Cookbook does, and it means that new examples only show up with a PR to openff-docs). The other is to have all the layout information in the notebook metadata and have the Sphinx extension take care of everything (a very basic version of which is currently implemented in this PR).
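The "layout in notebook metadata" extreme could look roughly like the following. This is only a sketch: the metadata key `openff_categories`, the category names, and the notebook paths are all made up for illustration and are not what this PR stores.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_category(notebooks: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map category name -> sorted notebook paths.

    `notebooks` maps a path to parsed notebook JSON. Categories are read
    from a hypothetical top-level metadata key, "openff_categories".
    """
    groups = defaultdict(list)
    for path, nb in notebooks.items():
        categories = nb.get("metadata", {}).get("openff_categories")
        # Notebooks without categories still get shown somewhere
        for category in categories or ["uncategorized"]:
            groups[category].append(path)
    return {name: sorted(paths) for name, paths in sorted(groups.items())}


demo = {
    "toolkit/conformers.ipynb": {
        "metadata": {"openff_categories": ["tutorial"]},
        "cells": [],
    },
    "interchange/export.ipynb": {
        "metadata": {"openff_categories": ["cookbook", "tutorial"]},
        "cells": [],
    },
    "misc/scratch.ipynb": {"metadata": {}, "cells": []},
}
gallery = group_by_category(demo)
```

With this approach a new notebook shows up in the gallery as soon as it carries the metadata, with no PR to openff-docs; the MarkDown extreme trades that automation for explicit control.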

And some technical stuff:

Whether we want openff-docs to execute notebooks (and how to cache this)
Where modified/executed notebooks live

I have some ideas that I hate:

Store all the processed notebooks in main and keep them up to date with CI
Process notebooks with CI into a branch, then pull that branch in when RTD gets built (I think this is my fave option)
Do the examples stuff in a separate repo, keep them on GitHub Pages, and just link them from here (domain names might be tricky, and it'll fracture the sidebar, but lots of flexibility with rebuilds)
Spend hours executing notebooks every time RTD builds (this also doesn't give us Colab-specific dependency installation cells)

We need to change in existing examples

Everything needs a (short) title
Metadata probably needs to be added
Thumbnails?
Fix broken links
We may want to remove existing in-notebook installation instructions and rely on injected cells for that

j-wags · 2023-06-01T00:22:02Z

2023_05_31 JM/MT/JW meeting notes

Colab can only load notebooks from GitHub, so if we want to inject a dependency installation cell we need to push to a repo

We'll assume that the solution will include re-running the raw source notebooks to generate output (since we can't assume that notebooks in different repos will have their outputs stored).

Docs need to be rebuilt whenever the examples change in source repositories
Detecting changes in examples, storing different versions of notebooks, etc requires holding state somewhere

We'll have automation that checks for ~~changes~~ new releases and re-executes if the source notebooks change. The state will be stored in a branch that looks nothing like the other branches, kinda like a gh-pages situation.
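A minimal sketch of the release check, assuming the state branch holds a simple mapping of project to the release tag that was last executed. The function name is hypothetical and the tags below are placeholders, not real release data.

```python
from typing import Dict, List


def projects_to_reexecute(cached: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return projects whose latest release tag differs from the cached one.

    Projects missing from the cache count as changed, so newly added
    projects are picked up automatically.
    """
    return sorted(
        project
        for project, tag in latest.items()
        if cached.get(project) != tag
    )


# Placeholder tags: what the state branch recorded vs. what is released now
cached_state = {"openff-toolkit": "1.0.0", "openff-interchange": "2.0.0"}
latest_releases = {
    "openff-toolkit": "1.0.1",
    "openff-interchange": "2.0.0",
    "openff-nagl": "3.0.0",
}
stale = projects_to_reexecute(cached_state, latest_releases)
```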

Which dependencies are used in notebooks?

MT in favor of envs that install relatively quickly so people aren't waiting a long time for colab to start
JW in favor of maximalist environment, could use something like single file installer to "cache"

Result: Undecided. We'll start with a maximalist-ish solution for first iteration.

Constraints:

Colab can only load notebooks from GitHub, so if we want to inject a dependency installation cell we need to push to a repo
If we want to store notebooks in source repositories without output, we need to execute them when we build the docs to include output in the docs (and get thumbnails automatically)
Docs need to be rebuilt whenever the examples change in source repositories
Detecting changes in examples, storing different versions of notebooks, etc requires holding state somewhere

JW -- Existing constraints look good, though I don't like the expectation that folder and notebook name should be the same (means only one notebook per folder, which isn't necessarily a pattern I can commit to remembering/doesn't seem necessary)
MT -- This looks reasonable.

Organization

MT + JW -- Each notebook folder will have a thumbnail.png.

We'll automatically include new notebooks, automatically update existing notebooks, automatically remove notebooks, and organize the notebooks with categories. We'll base these updates on when things get released.

We'll tentatively assign each notebook to one or more categories. JM will come up with a tentative list of categories which the package maintainers will assign to each of their notebooks.

The titles will be auto-harvested from the top-level header. Some of these will need to be abbreviated since they're currently quite long.
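Harvesting the title could be as simple as the sketch below, which takes the first level-one MarkDown heading it finds. The function name is hypothetical, and the sketch deliberately ignores edge cases such as a `# ` line inside a fenced block in a MarkDown cell.

```python
from typing import Optional


def harvest_title(notebook: dict) -> Optional[str]:
    """Return the text of the first '# ' heading in any markdown cell."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # nbformat allows `source` to be a string or a list of lines
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if line.startswith("# "):
                return line[2:].strip()
    return None


demo_nb = {
    "cells": [
        # Comments in code cells must not be mistaken for titles
        {"cell_type": "code", "source": "# not a title"},
        {
            "cell_type": "markdown",
            "source": ["Intro text\n", "# Prepare a dataset for training\n", "## Setup\n"],
        },
    ]
}
title = harvest_title(demo_nb)
untitled = harvest_title({"cells": []})
```

Returning `None` for a notebook without a heading gives the caller a chance to fail loudly, which fits the "everything needs a title" requirement.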

We need to change in existing examples

Everything needs a title
Metadata probably needs to be added
Thumbnails?
Fix broken links
We may want to remove existing in-notebook installation instructions and rely on injected cells for that

JW + MT -- Agree (except JW may have a little technical trouble with complying with the final one, though if this replaces the Toolkit's need to support cloud-runnable notebooks maybe it'll be fine)

JM -- Should I copy all this stuff in each repo?

MT + JW -- Should centralize this in openff-docs. No benefit to copying it out.

JM -- Separate cookbooks and tutorials? People said really nice things about cookbooks at the in-person meeting so maybe they get their own category.

Undecided, will revisit later.

Yoshanuikabundi · 2023-06-06T13:16:52Z

@mattwthompson I'm curious how you would like me to handle INTERCHANGE_EXPERIMENTAL? I tried executing all the notebooks and a bunch of them failed because the environment variable wasn't set. I'm thinking I just set the environment variable when I do the executing (behind closed doors) and let users discover it on their own interactively? It'll be disruptive for users on Colab but I guess that's kinda the point.

Also my computer can run all of the examples (in parallel) in like 5 minutes. Makes the twenty minute execution times in the Toolkit CI pretty frustrating.

Also this seems to be the maximalist environment, for future reference:

channels:
    - conda-forge
    - bioconda
dependencies:
    - pip
    - python=3.10
    # Cookbook
    - gitpython
    - nbconvert
    - nbformat
    # Examples
    - openff-toolkit-examples
    - gromacs
    - lammps
    - rich
    - jax

This seems to be going too well, I should check if the notebooks are kicking up exceptions that are getting happily baked into the executed notebooks...
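One way to check for baked-in exceptions is to scan the executed notebooks for outputs of type `error`, which is how nbformat v4 records a traceback. The sketch below assumes the notebook has already been loaded as JSON; the function name is hypothetical. As far as I know nbconvert stops at the first failing cell unless errors are explicitly allowed, so this mostly matters when that option is on.

```python
from typing import List, Tuple


def find_baked_in_errors(notebook: dict) -> List[Tuple[int, str, str]]:
    """Return (cell index, exception name, message) for every error output."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append(
                    (index, output.get("ename", ""), output.get("evalue", ""))
                )
    return errors


# Stand-in for an executed notebook: one clean cell, one failing cell
executed = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "ok\n"}],
        },
        {
            "cell_type": "code",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "KeyError",
                    "evalue": "'INTERCHANGE_EXPERIMENTAL'",
                    "traceback": [],
                }
            ],
        },
    ]
}
problems = find_baked_in_errors(executed)
```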

mattwthompson · 2023-06-06T14:10:17Z

source/_ext/cookbook/proc_examples.py

+    # Execute notebooks in parallel for rendering as HTML
+    if do_exec:
+        # Context manager ensures the pool is correctly terminated if there's
+        # an exception
+        with Pool() as pool:
+            # Wait a second between launching subprocesses
+            # Workaround https://github.com/jupyter/nbconvert/issues/1066
+            _ = [*pool.imap_unordered(execute_notebook, delay_iterator(notebooks))]


Nice, I've always wanted to be able to do this in tests but I couldn't figure out how to get nbval to do it. I mean, I guess it's possible and I just never found out how. Still have to turn off pytest-randomly, probably

Yoshanuikabundi · 2023-06-21

Yeah figuring this out was not fun. There's a race condition in nbconvert (which executes the notebooks) so if you launch too many notebook kernels too quickly you sometimes get two of them on the same port. But fixing each problem that came up in the dumbest way possible seemed to work eventually!
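The reviewed diff calls `delay_iterator` without showing it. A plausible shape for such a helper is sketched below; it is a guess at the idea rather than the code in this PR, and the `delay` parameter is an assumption. The pause happens as each item is pulled from the generator, so launches are staggered as long as the pool consumes its input lazily.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def delay_iterator(items: Iterable[T], delay: float = 1.0) -> Iterator[T]:
    """Yield items unchanged, sleeping `delay` seconds between them.

    The first item is yielded immediately; every later item waits first.
    """
    first = True
    for item in items:
        if not first:
            time.sleep(delay)
        first = False
        yield item


# Short delay so the demonstration finishes quickly
start = time.monotonic()
launched = list(delay_iterator(["a.ipynb", "b.ipynb", "c.ipynb"], delay=0.05))
elapsed = time.monotonic() - start
```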

mattwthompson · 2023-06-06T14:13:49Z

I just set it in my action: https://github.com/openforcefield/openff-interchange/blob/v0.3.4/.github/workflows/examples.yaml#L34

Using the %env INTERCHANGE_EXPERIMENTAL=1 magic should work for local execution and CI, and it should make Colab work as well, if that's desired. I haven't thought through whether the experimental examples should work on Colab out of the box or if users should need to find out how to set the magic themselves...
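For the "set it behind closed doors" option, the variable can be scoped to the subprocess that does the executing, leaving the notebooks themselves untouched. This is a sketch: `run_with_experimental` is a hypothetical helper, and the probe command stands in for whatever command actually executes a notebook.

```python
import os
import subprocess
import sys


def run_with_experimental(command: list) -> subprocess.CompletedProcess:
    """Run `command` with INTERCHANGE_EXPERIMENTAL=1 set for the child only."""
    # Copy the parent's environment and add the flag; the parent is unchanged
    env = {**os.environ, "INTERCHANGE_EXPERIMENTAL": "1"}
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Probe: a child Python process reports what it sees in its environment
probe = run_with_experimental(
    [sys.executable, "-c", "import os; print(os.environ['INTERCHANGE_EXPERIMENTAL'])"]
)
seen_by_child = probe.stdout.strip()
```

The trade-off is the one discussed above: the rendered output works, but a user running the same notebook on Colab still has to set the variable themselves unless the notebook contains the `%env` magic.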

mattwthompson · 2023-06-06T16:32:32Z

I'm moving around some files in Interchange; shouldn't affect things too much and doesn't need to be the last change. Just FYI: openforcefield/openff-interchange#740

Yoshanuikabundi · 2023-06-20T14:58:34Z

OK. Now that I've been working on this for 3 weeks (:scream:), there are some things that I should note.

We agreed at our meeting that we only want to update the examples in this repository when a project makes a release. This is actually a super important fact that it's taken me a while to figure out the full implications of...

Since releases are rare (compared to, say, PRs or days) and examples have to be fast enough to run in project CI, we can afford to just re-execute every notebook every release and not worry about diffing things - this'll only take like an hour or something per release and massively simplify everything
As an added benefit, we get a guarantee that if output changes in a release that will be reflected in the rendered notebook
It's an open question if we should re-render every notebook every release, or just the notebooks from the updated project
This will mean that the branch where we're storing the executed state and colab packages and everything will have a commit per release for each of our projects that have examples... which seems like it could be valuable?
Given that we're going for a universal environment, it also means that at every release (from now) of every package (with examples) we will automatically generate a fully-specified Conda environment that includes ALL OpenFF software that people are likely to use to run simulations (except maybe BespokeFit for now), as well as all the software they'll want to use with it
We're going to want to spruce up the source examples before this PR is done so that the spruced-up examples are in a release when it goes live (thumbnails, prose, titles, etc)
NGLView does work in the rendered notebooks, but it's a bit fiddly and we might have some best-practices for how to use it in examples
- nglview.show_file() does the file parsing in JS, so it probably won't work on local files
- Trajectories don't work yet (and may never work)
- Run all archived messages when embedded widget is uninitialized nglviewer/nglview#1064 (!!!)

I'm hoping to get the GitHub action that does the execution and pre-processing written by the end of the week... after which point this should come together very quickly. I'm thinking I'll do it in a separate PR so we can make sure it works on a release or two before merging this one and making its results visible.

mattwthompson · 2023-06-20T15:49:32Z

It's an open question if we should re-render every notebook every release, or just the notebooks from the updated project

Weakly-held opinion is that re-running everything every time anything gets a bump seems excessive but generally keeping things up-to-date is nice. I came across some notebooks in the toolkit that haven't been updated since openforcefield/openff-toolkit#1426, which is not super recent.

Yoshanuikabundi · 2023-06-20T16:29:11Z

Which dependencies are used in notebooks?

MT in favor of envs that install relatively quickly so people aren't waiting a long time for colab to start

JW in favor of maximalist environment, could use something like single file installer to "cache"

Result: Undecided. We'll start with a maximalist-ish solution for first iteration.

The current implementation runs the notebooks for rendering as web pages from a maximalist environment stored in this repo, but allows source repos to override it for the Colab link and zip download. If the source repo doesn't provide an environment for a notebook, Colab and the zip get the maximalist environment. So zippy examples can include minimalist environments where it makes a difference. I don't have handling of source repo-wide environments - it's either the openff-docs example environment, or a notebook environment.
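The fallback described here amounts to a two-way choice per notebook. The sketch below assumes a per-notebook environment is a file named `environment.yaml` next to the notebook; that file name, the default file name, and the function name are illustrative guesses rather than what the PR uses. It applies only to the Colab link and the download, since rendering always uses the openff-docs environment.

```python
import tempfile
from pathlib import Path


def choose_environment(
    notebook: Path, default_env: Path, filename: str = "environment.yaml"
) -> Path:
    """Prefer an environment file beside the notebook, else the default."""
    candidate = notebook.parent / filename
    return candidate if candidate.is_file() else default_env


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    default_env = root / "examples_env.yaml"
    default_env.write_text("name: maximalist\n")

    # A notebook that ships its own minimal environment
    zippy = root / "zippy" / "zippy.ipynb"
    zippy.parent.mkdir()
    zippy.write_text("{}")
    (zippy.parent / "environment.yaml").write_text("name: minimal\n")

    # A notebook that relies on the shared environment
    plain = root / "plain" / "plain.ipynb"
    plain.parent.mkdir()
    plain.write_text("{}")

    zippy_env = choose_environment(zippy, default_env).name
    plain_env = choose_environment(plain, default_env).name
```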

Running each notebook in its own environment for the web page rendering will be slow and difficult, so I don't want to do it. On the other hand, if a release includes an example that doesn't work with the openff-docs example environment, a quick PR to update that environment should be enough to get the release working again.

Yoshanuikabundi · 2023-07-04T12:08:29Z

@j-wags @mattwthompson @lilyminium I think this is ready to merge! I would appreciate some clicking around and checking that everything works for everyone else, and a second pair of eyes on whatever you all have time for, and then I'll merge tomorrow unless something comes up.

@j-wags I haven't created a quarterly calendar event to clean up the cache because I've added an action to clean up PR folders in the cache when the PR gets merged. Combined with the fact that I've updated the action not to store histories, this should keep everything tidy. I'll test the cleanup works when this PR gets merged :/

The only automation for regenerating the cache is the scheduled nightly one; if the cache needs to be regenerated at any other time (for example, while developing a PR), then it has to be done manually. Instructions for how to do that should be in a comment on any new PR - I'll test that when I open the PR to add QCSubmit and Fragmenter. I just really didn't want to wait 30 minutes every time I make a change to any part of the repo; regenerating the cache should be relatively rare. There's also no way to trigger an RTD PR build in a GitHub action, so if the cache was automatically regenerated in PRs, you'd have to manually trigger that once it's done.

@lilyminium I still have NAGL 0.2.2 pinned here, because of that versioneer issue I raised at NAGL. To unpin, that needs to be fixed, as that's how the cookbook knows which tag/branch to get the example notebooks from - the environment file is solved by Mamba, and then the tag corresponding to the installed version is cloned to get the notebooks. Updating by hand if the next release doesn't fix that issue should be fine; the version needs to be updated in both the examples conda environment and the globals_.py module. I'm happy to quickly take you through how this all works if you want to know and don't wanna read this entire thread! Let me know.

Once this is merged I'll write up some documentation on how it works so that it can be maintained/updated while I'm at OpenFE, and I'll open that PR for QCSubmit/Fragmenter.

mattwthompson · 2023-07-04T20:00:08Z

I spent a small amount of time poking through things and didn't observe anything notably violent. In fact things seem pretty good - pages load snappily, the content I expect to be there is there, and even the 3D renders work great!

If there was one thing I could suggest as an improvement, it'd be more thumbnails. I'm partially responsible for this, so ....

One of the NAGL examples ("Prepare a dataset for training") should probably be renamed to include GCNN/NAGL in the title; when I read it for the first time I didn't know what it was about and a new user might confuse it with QC/physical property dataset curation.

Some of the cell outputs could do with pretty-ification, like this one which is a fair amount of information in a small number of wrapped black-and-white lines. Out of scope here but might be worth exploring?

It'd be a luxury to suppress warnings generated by the runner, like this one:

I could spend an hour or two going through things in great detail, the outcome of which would mostly be me wanting to re-write half of the examples and nothing to do with the automation that generates these webpages or dancing around the edges like earlier in this comment. So I think the plan to merge roughly as-is in a day or two is great.

Yoshanuikabundi · 2023-07-05T04:14:27Z

If there was one thing I could suggest as an improvement, it'd be more thumbnails.

Agreed! Thumbnails are super time consuming, happen in the source repositories, and like all updates from source repos require a release to be updated, so hopefully if we all chip in a thumbnail or two when we have a moment of inspiration we can fix this over time.

One of the NAGL examples ("Prepare a dataset for training") should probably be renamed to include GCNN/NAGL in the title

Also agreed! This will be fixed in the next release of NAGL.

Some of the cell outputs could do with pretty-ification, like this one which is a fair amount of information in a small number of wrapped black-and-white lines. Out of scope here but might be worth exploring?

Definitely worth exploring!

It'd be a luxury to suppress warnings generated by the runner

I think we could just hide STDERR, but I'm not sure it's a good idea because I'd like users to know that they're not doing anything wrong if they get a warning when they run it themselves. I think the ideal resolution for each warning, in order of preference, is:

Fix the warning (possibly just by tweaking the environment in some cases)
Tell the user about the warning in the notebook (preferably including why it can't be fixed/when it will be fixed)
Explicitly silence warnings in the notebook where appropriate (so users understand what's going on)

My priority here is helping users understand what's going on, and helping to make outputs reproducible between rendering and running. If it pushes us to fix warnings or find pathways to avoid warnings, so much the better. So again, hopefully we can improve this over time.
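For the third option, silencing explicitly in the notebook, the standard `warnings` module can target one known message inside a limited scope, so anything unexpected still reaches the user. In this sketch `noisy_step` and its message are invented stand-ins for a real library call.

```python
import warnings


def noisy_step() -> int:
    """Stand-in for a library call that emits a known, harmless warning."""
    warnings.warn("optional backend not found, falling back", UserWarning)
    return 42


def run_quietly() -> int:
    # Scope the filter so it cannot hide warnings elsewhere in the notebook
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message="optional backend not found", category=UserWarning
        )
        return noisy_step()


# Record what still reaches the user to show the filter is targeted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = run_quietly()
    warnings.warn("something unexpected", UserWarning)
remaining = [str(w.message) for w in caught]
```

Keeping the filter visible in a cell, next to a sentence explaining the warning, serves the same goal as the list above: a user who sees the warning elsewhere knows it is expected.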

I could spend an hour or two going through things in great detail, the outcome of which would mostly be me wanting to re-write half of the examples and nothing to do with the automation that generates these webpages or dancing around the edges like earlier in this comment.

I think that's my assessment too - plenty of improvements to make in examples.

So I think the plan to merge roughly as-is in a day or two is great.

Woo!

Yoshanuikabundi added 6 commits May 30, 2023 17:38

Add openmm cookbook

b1d05ba

Examples progress

6ee79d7

Remove sphinx_gallery

413c1a7

Add gitpython to lint environment

75f45dc

Remove nbsphinx_prolog

45e6bd3

Download zip files with notebooks and files, not just notebooks

d8f197e

Yoshanuikabundi added 2 commits June 6, 2023 18:59

Pre-processing notebooks basically works

cd59e3e

Execute notebooks in parallel and only update changed notebooks

7479768

Yoshanuikabundi added 3 commits June 6, 2023 23:46

Set INTERCHANGE_EXPERIMENTAL and simplify delay code

08dfb53

Notes to self

2bacdde

Interim fixes for extension

d773f31

mattwthompson reviewed Jun 6, 2023

Yoshanuikabundi added 9 commits June 7, 2023 15:51

Fix module stuff

7926fb4

Clean up removed notebooks and download latest release examples

96dbb8f

Get the sphinx extension building the processed stuff

a214387

Proof of concept gallery

2c92744

Simplify code and fix warnings

975fa72

Fix css in cookbook

ac0a263

Finish transition to MyST-NB

79f2e8a

Temporary fix for NGLView widgets.

97a2705

The modified JS file introduced in this commit is from: nglviewer/nglview#1064 Once that PR makes it into a release, the logic and JavaScript added in this commit should be replaced with a pin to the appropriate version of NGLView.

Process all notebooks in single folder

7215221

Refactor towards caching notebooks on Git branch

b7a46e1

Avoid unnecessary API requests AKA Generators are cool & I am a nerd

7fbaac3

Yoshanuikabundi added 8 commits July 3, 2023 23:25

Swap quotes

c17a1c7

Try backtick shell expansion

b155ab9

Use shell logic in .readthedocs.yaml

56fad31

Clean up PR cache when closed

139b9a4

Trigger RTD build when cache is updated (except in PRs)

7d43834

Merge branch 'main' into examples

0c28089

Clarify description

f60ed11

Single quotes

7143205

Yoshanuikabundi mentioned this pull request Jul 4, 2023

Prepare notebooks for central examples openforcefield/openff-nagl#40

Merged

1 task

Yoshanuikabundi added 12 commits July 4, 2023 13:23

Show source repository in gallery

fa10eee

Style top-of-notebook links

b45fdb2

Add experimental warnings and improve styling

90effe8

Newline in css

3ed8f85

globals -> globals_

33b77bb

Switch run_notebook.sh to use jupyterlab

8ed4661

Add installation instructions

0689914

zip -> tgz

a2b33a9

Add CLI example for run_notebook.sh

3c53bfe

Simplify

52d05dc

Fix old imports

457ae4b

More robust run_notebook.sh

0350a44

Yoshanuikabundi force-pushed the examples branch from 8c14489 to 0350a44 July 4, 2023 08:37

Yoshanuikabundi added 2 commits July 4, 2023 20:44

Migrate to setup-micromamba

dfcc8de

ZIP -> .TGZ in install instructions

708dc2e

Yoshanuikabundi marked this pull request as ready for review July 4, 2023 13:35

Yoshanuikabundi merged commit 8b3f64e into main Jul 5, 2023

Yoshanuikabundi changed the title ~~[DNM] Examples page~~ Examples page Jul 6, 2023
