♻️ REFACTOR: package API/CLI/documentation #74

Merged
merged 40 commits into from
Jan 25, 2022

Member

@chrisjsewell chrisjsewell commented Aug 2, 2021

This PR re-writes key parts of the package (a) to add additional functionality, and (b) with a view to eventually exposing this CLI in https://jupyterbook.org/.
Key changes:

  1. stage/staging is now rephrased to notebook, plus the addition of project, i.e. you add notebooks to a project, then execute them
  2. notebook read_data is specified per notebook in the project, allowing for multiple types of file to be read/executed via the CLI (e.g. MyST Markdown files via jupytext). Before, the read functions were passed directly to the API methods.
  3. The executor can be specified with jbcache execute --executor, and a parallel notebook executor has been added.
  4. Improved execution status indicator in jbcache project list and other CLI improvements
  5. Re-write of documentation, including better front page, with quick start guide and better logo.

Rather than passing an optional `converter` to methods,
we now store staged files with a specific reader key.
The key relates to an entry point (in group `jcache.readers`) of a dynamically loaded reader.

Also, the `jupyter_executors` entry group has been changed to `jcache.executors`,
and `importlib-metadata` is used to load entry points.
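
For readers unfamiliar with entry-point loading, here is a minimal sketch of how a reader key could be resolved to a callable, assuming only the group name `jcache.readers` given above. The helper names `find_entry_point` and `load_reader` are hypothetical and not taken from the package, and the standard-library `importlib.metadata` is used here in place of the `importlib-metadata` backport.

```python
from importlib.metadata import entry_points

READERS_GROUP = "jcache.readers"  # group name taken from the PR description


def find_entry_point(name, group=READERS_GROUP):
    """Return the entry point registered under `name` in `group`.

    Hypothetical helper: raises KeyError if nothing is registered.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection
        matches = list(eps.select(group=group, name=name))
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group
        matches = [ep for ep in eps.get(group, []) if ep.name == name]
    if not matches:
        raise KeyError(f"no entry point {name!r} in group {group!r}")
    return matches[0]


def load_reader(name):
    """Import and return the object registered under the reader key."""
    return find_entry_point(name).load()
```

Because the lookup happens at call time, a third-party package could register its own reader under the same group without any change to the calling code.
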
@codecov

codecov bot commented Aug 2, 2021

Codecov Report

Merging #74 (1d3fd4b) into master (7917c68) will increase coverage by 1.51%.
The diff coverage is 77.63%.

@@            Coverage Diff             @@
##           master      #74      +/-   ##
==========================================
+ Coverage   81.33%   82.85%   +1.51%     
==========================================
  Files          17       20       +3     
  Lines        1045     1318     +273     
==========================================
+ Hits          850     1092     +242     
- Misses        195      226      +31     
| Flag | Coverage Δ |
|---|---|
| pytests | 82.85% <77.63%> (+1.51%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| jupyter_cache/cli/commands/cmd_main.py | 100.00% <ø> (ø) |
| jupyter_cache/cli/utils.py | 56.25% <56.25%> (ø) |
| jupyter_cache/cli/commands/cmd_project.py | 64.91% <64.91%> (ø) |
| jupyter_cache/cli/commands/cmd_cache.py | 72.66% <65.78%> (-2.73%) ⬇️ |
| jupyter_cache/utils.py | 87.93% <66.66%> (-6.07%) ⬇️ |
| jupyter_cache/cli/__init__.py | 72.00% <72.00%> (ø) |
| jupyter_cache/entry_points.py | 75.00% <75.00%> (ø) |
| jupyter_cache/cli/commands/cmd_notebook.py | 76.19% <76.19%> (ø) |
| jupyter_cache/cache/main.py | 88.10% <76.78%> (+1.43%) ⬆️ |
| jupyter_cache/cache/db.py | 85.33% <78.51%> (-1.59%) ⬇️ |

... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

It was felt that this is conceptually easier to understand,
i.e. it is a list of records for each notebook (& associated data) in the project,
rather than just a staging area for pre-executed notebooks.
For `jcache project` commands, and remove `--all` option,
in favour of separate `jcache project clear`
To write out a notebook merging the project file with its cached outputs.
The execution logic was also refactored, to reduce code duplication.

Note, artefact retrieval has been removed for now,
until the logic can be improved.
@chrisjsewell chrisjsewell changed the title from "🔀 MERGE: Improve notebook execution" to "🔀 MERGE: Refactor package" Aug 4, 2021
@chrisjsewell chrisjsewell requested a review from choldgraf August 4, 2021 06:31
@chrisjsewell
Member Author

@choldgraf there is some more I probably want to do here, but it would be good if you could have a skim of the new documentation and give some feedback, ta

@choldgraf choldgraf self-assigned this Aug 4, 2021
@choldgraf
Member

Cool, I'll assign myself so I remember

@chrisjsewell
Member Author

chrisjsewell commented Aug 4, 2021

Some notes on possible TODOs

jbcache project:

  • allow for different kernel to one specified in notebook (see Expose kernel_name of nbclient.nbexecute() #63)
  • retain record of last executed cache record and allow to diff against it
  • remove related cache record (e.g. jbcache project invalidate <pk/uri>)
  • allow to infer reader from extension when adding files
  • disable specific files (so they are not re-executed, e.g. jbcache project enable/disable <pk/uri>)
  • allow to add assets to existing file without having to remove then re-add it
  • include assets in hash?
  • storing full failed notebooks somewhere to look at, on top of just the traceback

docs:

  • Add API/click autodoc

myst-nb / jupyter-book integration

  • how to make default path to cache the same one that they use
  • how to automatically load custom notebook readers (or at least allow to skip files with unknown readers when calling jbcache execute rather than excepting)

other:

  • add mypy type checking
  • more jupytext integration

@chrisjsewell chrisjsewell requested a review from mmcky August 4, 2021 22:16
@chrisjsewell
Member Author

Related to this, I also just opened jupyter/nbclient#151

@mmcky
Member

mmcky commented Aug 5, 2021

@chrisjsewell these improvements look great. I particularly like the new terminology with project vs staged etc.

One question I had in relation to the executor: do you think it might be a good time to add optional dependencies for execution? In jupinx we built in the ability to say that a lecture or page needed to be executed only after another one has already been executed. This enabled us to reuse files or outputs from one lecture in another.

I don't think this is a high-priority issue, but it is a nice-to-have feature.
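
As a rough illustration of the ordering problem described above (not something implemented in this PR), dependency-aware execution amounts to a topological sort of the notebooks. The sketch below assumes Python 3.9+ for `graphlib`; the `dependencies` mapping, the file names and the `execution_order` helper are all hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return notebook names in an order that respects their dependencies.

    `dependencies` maps each notebook to the set of notebooks that must
    be executed before it. A cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical project: lecture2 reuses outputs written by lecture1
deps = {
    "lecture1.ipynb": set(),
    "lecture2.ipynb": {"lecture1.ipynb"},
}
order = execution_order(deps)
```

A parallel executor would need a little more than this, since it can only start a notebook once all of its predecessors have finished.
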

Contributor

@akhmerov akhmerov left a comment

> Related to this, I also just opened jupyter/nbclient#151

At a glance, that approach seems to fail with the way hash keys are implemented now. Imagine the user modifies a cell with skip-execution tag applied. This should definitely result in the same key. Now, however, all code cells are used to compute the hash key.
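
To make the concern concrete, here is a sketch of a hash key that ignores tagged cells. It is not the current jupyter-cache hashing code: the function name `hash_key`, the use of SHA-256 over a JSON dump, and the plain-dict notebook representation are all assumptions for illustration. Only the `skip-execution` tag name comes from the discussion.

```python
import hashlib
import json

SKIP_TAG = "skip-execution"  # tag name as mentioned in the comment above


def hash_key(notebook, skip_tag=SKIP_TAG):
    """Hash the source of all code cells that would actually be executed.

    `notebook` is a plain dict in the nbformat v4 layout. Code cells
    carrying `skip_tag` are left out, so editing them keeps the key stable.
    """
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if skip_tag in cell.get("metadata", {}).get("tags", []):
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat allows the source to be stored as a list of lines
            source = "".join(source)
        sources.append(source)
    payload = json.dumps(sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With a key like this, a change to a skipped cell leaves the cache record valid, while any change to an executed code cell still invalidates it.
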


Also parallel execution is amazing! I'd love to cut down on those 1 hour build times.

Member

@choldgraf choldgraf left a comment

This is a nice re-work! I focused on the documentation in this review - in general I think it's good and does a nice job of explaining the end-to-end functionality of the Python API and the CLI. My main questions and comments were around nomenclature and making sure that some of the ideas are explained in a clear way. I tried to note where I was a bit confused, as presumably this is where others will be confused as well! Happy to take another look if you make some changes!

shutil.rmtree(path)
return []

class JcacheCli(SphinxDirective):
Member

Can you provide comments for what these classes do so that others have extra context? I guess it's a developer-friendly tool for these docs?

Member Author

added docstring

docs/index.md Outdated

```{jcache-cli} jupyter_cache.cli.commands.cmd_main:jcache
:command: execute
:args: --executor local-serial
```
Member

Since this is the first time people have seen the execute command, I'd recommend leaving out any extra arguments like --executor until you can explain what they mean in a subsequent step.

docs/index.md Outdated

```{jcache-cli} jupyter_cache.cli.commands.cmd_project:cmnd_project
:command: merge
:args: 1 _executed_notebook.ipynb
```
Member

It's unclear where this _executed_notebook.ipynb file came from. Did you create it somewhere?

Member Author

renamed

Member Author

and improved wording

You can diff any of the cached notebooks with any (external) notebook:

```{jcache-cli} jupyter_cache.cli.commands.cmd_cache:cmnd_cache
:command: diff-nb
```
Member

given that all of the things in the cache are notebooks, why not just call it diff instead of diff-nb?

Member Author

changed

"source": [
"(use/api)=\n",
"\n",
"# Python API"
]
Member

meta question: could we make api.ipynb a MyST-NB notebook? That way it would be much easier to review and diff

},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(use/api/cache)=\n",
"\n",
"## Cacheing Notebooks"
Member

I think Cacheing Notebooks should come after the staging/execution examples, since that would mirror the same structure that the Command-Line page uses. And since staging/execution is more common than Cacheing, presumably?

},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notebooks can be staged, by adding the path as a stage record.\n",
Member

Should this be renamed something like ## Add notebooks to a project for execution?

And in general this section uses "staging", "the staged notebook" etc, rather than "project" terminology, which is a bit confusing as I'm not sure how "staging" and "project" relate to one another

},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"source": [
"cache.merge_match_into_file(\n",
Member

does this update the cache itself, or simply return a notebook that is merged? We should make this clear via a note or something
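
One way to read the distinction being asked about: a merge can return a new notebook and leave both inputs untouched. The sketch below shows that non-mutating variant on plain dicts. It is not the implementation of `merge_match_into_file`, and `merge_outputs` is a hypothetical helper that pairs code cells purely by position.

```python
import copy


def merge_outputs(notebook, cached):
    """Return a copy of `notebook` with outputs taken from `cached`.

    Both arguments are plain dicts in the nbformat v4 layout. Neither
    input is modified; code cells are matched by their order.
    """
    merged = copy.deepcopy(notebook)
    target_cells = [c for c in merged["cells"] if c["cell_type"] == "code"]
    source_cells = [c for c in cached["cells"] if c["cell_type"] == "code"]
    if len(target_cells) != len(source_cells):
        raise ValueError("notebooks have a different number of code cells")
    for target, source in zip(target_cells, source_cells):
        target["outputs"] = copy.deepcopy(source.get("outputs", []))
        target["execution_count"] = source.get("execution_count")
    return merged
```

Markdown cells and cell sources come from the project notebook, so only outputs and execution counts are carried over from the cached copy.
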

@chrisjsewell
Member Author

> Just wondering what your thoughts are on lecture dependency?

It could be possible, but certainly not in this PR

@chrisjsewell chrisjsewell marked this pull request as ready for review January 13, 2022 07:21
@chrisjsewell
Member Author

Ok @choldgraf and @mmcky, all changes applied from our discussion, so good to go:

```
$ jcache
Usage: jcache [OPTIONS] COMMAND [ARGS]...

  The command line interface of jupyter-cache.

Options:
  -v, --version       Show the version and exit.
  -p, --print-path    Print the current cache path and exit.
  -a, --autocomplete  Print the autocompletion command and exit.
  -h, --help          Show this message and exit.

Commands:
  cache     Work with cached execution(s) in a project.
  notebook  Work with notebook(s) in a project.
  project   Work with a project.

$ jcache project
Usage: jcache project [OPTIONS] COMMAND [ARGS]...

  Work with a project.

Options:
  -p, --cache-path TEXT  Path to project cache.  [default: (.jupyter_cache)]
  -h, --help             Show this message and exit.

Commands:
  cache-limit  Get/set maximum number of notebooks stored in the cache.
  clear        Clear the project cache completely.
  execute      Execute all outdated notebooks in the project.
  version      Print the version of the cache.

$ jcache notebook
Usage: jcache notebook [OPTIONS] COMMAND [ARGS]...

  Work with notebook(s) in a project.

Options:
  -p, --cache-path TEXT  Path to project cache.  [default: (.jupyter_cache)]
  -h, --help             Show this message and exit.

Commands:
  add              Add notebook(s) to the project.
  add-with-assets  Add notebook(s) to the project, with possible asset...
  clear            Remove all notebooks from the project.
  execute          Execute specific notebooks in the project.
  info             Show details of a notebook (by ID).
  invalidate       Remove any matching cache of the notebook(s) (by ID/URI).
  list             List notebooks in the project.
  merge            Create notebook merged with cached outputs (by ID/URI).
  remove           Remove notebook(s) from the project (by ID/URI).

$ jcache cache
Usage: jcache cache [OPTIONS] COMMAND [ARGS]...

  Work with cached execution(s) in a project.

Options:
  -p, --cache-path TEXT  Path to project cache.  [default: (.jupyter_cache)]
  -h, --help             Show this message and exit.

Commands:
  add                 Cache notebook(s) that have already been executed.
  add-with-artefacts  Cache a notebook, with possible artefact files.
  cat-artefact        Print the contents of a cached artefact.
  clear               Remove all executed notebooks from the cache.
  diff                Print a diff of a notebook to one stored in the cache.
  info                Show details of a cached notebook.
  list                List cached notebook records.
  remove              Remove notebooks stored in the cache.
```

@chrisjsewell
Member Author

Ok, I will take the silence as implicit consent 😅 and merge.
This will be in a 0.5 release, so obviously won't impact myst-nb/jupyter-book straight away

@chrisjsewell chrisjsewell changed the title from "🔀 MERGE: Refactor package" to "♻️ REFACTOR: package API/CLI/documentation" Jan 25, 2022
@chrisjsewell chrisjsewell merged commit 065dcaf into master Jan 25, 2022
@chrisjsewell chrisjsewell deleted the improve-exec branch January 25, 2022 11:12
@mmcky
Member

mmcky commented Jan 26, 2022

thanks @chrisjsewell -- I look forward to the new cli. 👍

@choldgraf
Member

Cc @jjalaire - who is using this inside of quarto (I believe?). This is changing up the API a little bit so you might want to pin versions and double check your usage to make sure it still works!
