-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
♻️ REFACTOR: package API/CLI/documentation #74
Conversation
Rather than passing an optional `converter` to methods, we now store staged files with a specific reader key. The key relates to an entry-point (in group `jcache.readers`) of dynamically loaded reader. Also, the `jupyter_executors` entry group has been changed to `jcache.executors`, and `importlib-metadata` is used to load entry points.
Codecov Report
@@ Coverage Diff @@
## master #74 +/- ##
==========================================
+ Coverage 81.33% 82.85% +1.51%
==========================================
Files 17 20 +3
Lines 1045 1318 +273
==========================================
+ Hits 850 1092 +242
- Misses 195 226 +31
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
It was felt that this is conceptually easier to understand, i.e. it is a list of records for each notebook (& associated data) in the project, rather than just a staging area for pre-executed notebooks.
For `jcache project` commands, and remove `--all` option, in favour of separate `jcache project clear`
To write out a notebook merging the project file with its cached outputs.
The execution logic was also refactored, to reduce code duplication. Note, artefact retrieval has been removed for now, until the logic can be improved.
@choldgraf there is some more I probably want to do here, but it would be good if you could have a skim of the new documentation and give some feedback ta |
Cool, I'll assign myself so I remember |
Some notes on possible TODOs
docs:
myst-nb / jupyter-book integration
other:
|
Related to this, I also just opened jupyter/nbclient#151 |
@chrisjsewell these improvements look great. I particularly like the new terminology with One question I had in relation to the I don't think is a high priority issue but it is a nice to have feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to this, I also just opened jupyter/nbclient#151
At a glance, that approach seems to fail with the way hash keys are implemented now. Imagine the user modifies a cell with skip-execution tag applied. This should definitely result in the same key. Now, however, all code cells are used to compute the hash key.
Also parallel execution is amazing! I'd love to cut down on those 1 hour build times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice re-work! I focused on the documentation in this review - in general I think it's good and does a nice job of explaining the end-to-end functionality of the Python API and the CLI. My main questions and comments were around nomenclature and making sure that some of the ideas are explained in a clear way. I tried to note where I was a bit confused, as presumably this is where others will be confused as well! Happy to take another look if you make some changes!
shutil.rmtree(path) | ||
return [] | ||
|
||
class JcacheCli(SphinxDirective): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide comments for what these classes do so that others have extra context? I guess it's a developer-friendly tool for these docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added docstring
docs/index.md
Outdated
|
||
```{jcache-cli} jupyter_cache.cli.commands.cmd_main:jcache | ||
:command: execute | ||
:args: --executor local-serial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is the first time people have seen the execute
command, I'd recommend leaving out any extra arguments like --executor
until you can explain what they mean in a subsequent step.
docs/index.md
Outdated
|
||
```{jcache-cli} jupyter_cache.cli.commands.cmd_project:cmnd_project | ||
:command: merge | ||
:args: 1 _executed_notebook.ipynb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear where this _executed_notebook.ipynb
file came from. Did you create it somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and improved wording
docs/using/cli.md
Outdated
You can diff any of the cached notebooks with any (external) notebook: | ||
|
||
```{jcache-cli} jupyter_cache.cli.commands.cmd_cache:cmnd_cache | ||
:command: diff-nb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that all of the things in the cache are notebooks, why not just call it diff
instead of diff-nb
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
docs/using/api.ipynb
Outdated
"source": [ | ||
"(use/api)=\n", | ||
"\n", | ||
"# Python API" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta question: could we make api.ipynb
a MyST-NB notebook? That way it would be much easier to review and diff
docs/using/api.ipynb
Outdated
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"(use/api/cache)=\n", | ||
"\n", | ||
"## Cacheing Notebooks" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Cacheing Notebooks should come after the staging/execution examples, since that would mirror the same structure that the Command-Line
page uses. And since staging/execution is more common than Cacheing, presumably?
docs/using/api.ipynb
Outdated
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Notebooks can be staged, by adding the path as a stage record.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be renamed something like ## Add notebooks to a project for execution
?
And in general this section uses "staging", "the staged notebook" etc, rather than "project" terminology, which is a bit confusing as I'm not sure how "staging" and "project" relate to one another
docs/using/api.ipynb
Outdated
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 26, | ||
"metadata": {}, | ||
"source": [ | ||
"cache.merge_match_into_file(\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this update the cache itself, or simply return a notebook that is merged? We should make this clear via a note
or something
click v8 has improved completion handling
So that we can make it more flexible, and allow for the myst-nb custom formats (with kwargs)
…to caches, move `--cache-path` to sub-levels
It could be possible, but certainly not in this PR |
Ok @choldgraf and @mmcky, all changes applied from our discussion, so good to go: $ jcache
Usage: jcache [OPTIONS] COMMAND [ARGS]...
The command line interface of jupyter-cache.
Options:
-v, --version Show the version and exit.
-p, --print-path Print the current cache path and exit.
-a, --autocomplete Print the autocompletion command and exit.
-h, --help Show this message and exit.
Commands:
cache Work with cached execution(s) in a project.
notebook Work with notebook(s) in a project.
project Work with a project.
$ jcache project
Usage: jcache project [OPTIONS] COMMAND [ARGS]...
Work with a project.
Options:
-p, --cache-path TEXT Path to project cache. [default: (.jupyter_cache)]
-h, --help Show this message and exit.
Commands:
cache-limit Get/set maximum number of notebooks stored in the cache.
clear Clear the project cache completely.
execute Execute all outdated notebooks in the project.
version Print the version of the cache.
$ jcache notebook
Usage: jcache notebook [OPTIONS] COMMAND [ARGS]...
Work with notebook(s) in a project.
Options:
-p, --cache-path TEXT Path to project cache. [default: (.jupyter_cache)]
-h, --help Show this message and exit.
Commands:
add Add notebook(s) to the project.
add-with-assets Add notebook(s) to the project, with possible asset...
clear Remove all notebooks from the project.
execute Execute specific notebooks in the project.
info Show details of a notebook (by ID).
invalidate Remove any matching cache of the notebook(s) (by ID/URI).
list List notebooks in the project.
merge Create notebook merged with cached outputs (by ID/URI).
remove Remove notebook(s) from the project (by ID/URI).
$ jcache cache
Usage: jcache cache [OPTIONS] COMMAND [ARGS]...
Work with cached execution(s) in a project.
Options:
-p, --cache-path TEXT Path to project cache. [default: (.jupyter_cache)]
-h, --help Show this message and exit.
Commands:
add Cache notebook(s) that have already been executed.
add-with-artefacts Cache a notebook, with possible artefact files.
cat-artefact Print the contents of a cached artefact.
clear Remove all executed notebooks from the cache.
diff Print a diff of a notebook to one stored in the cache.
info Show details of a cached notebook.
list List cached notebook records.
remove Remove notebooks stored in the cache. |
Ok, I will take the silence as implicit consent 😅 and merge. |
thanks @chrisjsewell -- I look forward to the new |
Cc @jjalaire - who is using this inside of quarto (I believe?). This is changing up the API a little bit so you might want to pin versions and double check your usage to make sure it still works! |
This PR re-writes key parts of the package (a) to add additional functionality, and (b) with a view to eventually exposing this CLI in https://jupyterbook.org/.
Key changes:
stage
/staging
is now rephrased tonotebook
, plus the addition ofproject
, i.e. you add notebooks to a project, then execute themread_data
is specified per notebook in the project, allowing for multiple types of file to be read/executed via the CLI (e.g. MyST Markdown files via jupytext). Before, the read functions were passed directly to the API methods.jbcache execute --executor
, and a parallel notebook executor has been added.jbcache project list
and othe CLI improvements