FEAT: Jupyter-cache integration #55
Merged: choldgraf merged 40 commits into executablebooks:master from AakashGfude:jupyter-cahe-integration on Mar 28, 2020
Commits
536950c added functionality of jupyter-cache (AakashGfude)
6fb9daf documenting the execution functions (AakashGfude)
08a9a63 handling errored and excepted notebooks (AakashGfude)
10115b0 updating cacheing code (choldgraf)
35e36b0 Merge pull request #1 from ExecutableBookProject/holdgraf_updates (AakashGfude)
6d2fb27 first draft to run execution during env-get-outdated event (AakashGfude)
63a3237 keeping pk_list None by default to handle empty case in executor (AakashGfude)
2e6b3e1 Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
04b6e1f Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
7f2d6ca chris S's review commit (AakashGfude)
d250cd4 adding additional checks and refactoring (AakashGfude)
8a9e420 Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
c6a9d13 import execute missing (AakashGfude)
ac27abf added a variable for execution (AakashGfude)
f5de8d2 adding test for notebooks and refactoring code to adapt to that (AakashGfude)
95eb0a2 Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
7756f10 testing basic execution, and reverting jupyter-cache path commit (AakashGfude)
bcf63f9 Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
03c4199 writing tests to cover the all/most cases (AakashGfude)
2b2f8b8 added test for complex outputs (AakashGfude)
680b31b refactoring of code and testing different scenarios (AakashGfude)
b8ae356 Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
85e2a43 testing dependecies (AakashGfude)
7712fc3 removed non-deterministic outputs (AakashGfude)
9462f99 transferring cache folder to outdir (AakashGfude)
577134f emptying non-deterministic outputs (AakashGfude)
1a8c187 adding execution and cacheing documentation (choldgraf)
ed2f010 edits to code - suggestions by Chris.H (AakashGfude)
3879986 Update myst_nb/cache.py (AakashGfude)
0cd4602 adding force_run to pass html-strict (AakashGfude)
93bef65 Merge branch 'jupyter-cahe-integration' of https://github.com/AakashG… (AakashGfude)
2c03f78 Merge branch 'master' of https://github.com/ExecutableBookProject/MyS… (AakashGfude)
f39634b the pre-commit check was missing (AakashGfude)
0422cec jupyter_execute_notebooks change and path for jupyter_cache allowed (AakashGfude)
3544ba9 added a test for jupyter_cache path (AakashGfude)
5cb5791 refactored code and tests to make nbclient more primary, and adjust c… (AakashGfude)
ff11ec1 updating documentation (AakashGfude)
7f588c6 small docs improvements and clearing notebook outputs (choldgraf)
db6ffff moved fixtures to conftest (AakashGfude)
5538173 Merge branch 'jupyter-cahe-integration' of https://github.com/AakashG… (AakashGfude)
@@ -0,0 +1,90 @@
# Executing and caching your content

MyST-NB can automatically run and cache any notebook pages. Notebooks can either
be run each time the documentation is built, or cached locally so that notebooks
will only be re-run when the code cells in a notebook have changed.

Caching behavior is controlled with configuration in your `conf.py` file. See
the sections below for each configuration option and its effect.

## Triggering notebook execution

To trigger the execution of notebook pages, use the following configuration in `conf.py`:

```
jupyter_execute_notebooks = "auto"
```

With this setting, only notebooks that are missing at least one output (that is, notebooks
with at least one code cell that has no outputs) will be executed. If
the notebook has *all* of its outputs populated, then it will not be executed.
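
The `auto` rule can be pictured with a small sketch. This is only an illustration of the check described above, working on a notebook loaded as a plain dictionary; the function name `needs_execution` is hypothetical and is not part of MyST-NB.

```python
def needs_execution(notebook: dict) -> bool:
    """Return True if at least one code cell has an empty ``outputs`` list.

    ``notebook`` is assumed to be a notebook parsed from JSON (nbformat v4
    layout), i.e. a dict with a "cells" list.
    """
    code_cells = [
        cell for cell in notebook.get("cells", []) if cell.get("cell_type") == "code"
    ]
    # A notebook without code cells has nothing to run.
    return any(len(cell.get("outputs", [])) == 0 for cell in code_cells)


# One code cell has no outputs yet, so the notebook would be executed.
partly_run = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "1 + 1", "outputs": []},
    ]
}
print(needs_execution(partly_run))
```

Markdown cells are ignored, so editing prose alone never makes a notebook look "unexecuted" under this rule.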

**To force the execution of all notebooks, regardless of their outputs**, change the
above configuration value to:

```
jupyter_execute_notebooks = "force"
```

**To cache execution outputs with [jupyter-cache]**, change the above configuration
value to:

```
jupyter_execute_notebooks = "cache"
```

See {ref}`execute/cache` for more information.

**To turn off notebook execution**, change the
above configuration value to:

```
jupyter_execute_notebooks = "off"
```

**To exclude certain file patterns from execution**, use the following
configuration:

```
execution_excludepatterns = ['list', 'of', '*patterns']
```

Any file that matches one of the items in `execution_excludepatterns` will not be
executed.
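
As a rough illustration of pattern-based exclusion, the sketch below filters a list of paths with the standard-library `fnmatch` module. This is a simplification chosen for the example: the code in this pull request collects matches with `Path.rglob` instead, and the helper name `filter_excluded` is hypothetical.

```python
from fnmatch import fnmatch


def filter_excluded(paths, exclude_patterns):
    """Keep only the paths that match none of the glob-style patterns."""
    return [
        path
        for path in paths
        if not any(fnmatch(path, pattern) for pattern in exclude_patterns)
    ]


notebooks = ["index.md", "intro.ipynb", "slow/heavy.ipynb"]
# "index.*" drops index.md and "slow/*" drops everything under slow/.
print(filter_excluded(notebooks, ["index.*", "slow/*"]))
```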

(execute/cache)=
## Caching the notebook execution

You may also **cache the results of executing a notebook page** using [jupyter-cache]. In
this case, when a page is executed, its outputs will be stored in a local database.
This allows you to be sure that the outputs in your documentation are up-to-date,
while saving time by avoiding unnecessary re-execution. It also allows you to store your
`.ipynb` files in your `git` repository *without their outputs*, but still leverage
a cache to save time when building your site.

When you re-build your site, the following will happen:

* Notebooks that have not seen changes to their **code cells** since the last build
  will not be re-executed. Instead, their outputs will be pulled from the cache
  and inserted into your site.
* Notebooks that **have any change to their code cells** will be re-executed, and the
  cache will be updated with the new outputs.
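
One way to picture "re-run only when code cells change" is to fingerprint the code cells and compare fingerprints between builds. The sketch below is an assumption made for illustration only: `code_cell_digest` is a hypothetical helper, and jupyter-cache's real matching logic may consider more than the cell sources.

```python
import hashlib
import json


def code_cell_digest(notebook: dict) -> str:
    """Hash only the source of code cells, ignoring markdown and outputs."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows the source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    payload = json.dumps(sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


before = {"cells": [{"cell_type": "markdown", "source": "Intro"},
                    {"cell_type": "code", "source": "x = 1", "outputs": []}]}
prose_edit = {"cells": [{"cell_type": "markdown", "source": "Better intro"},
                        {"cell_type": "code", "source": "x = 1", "outputs": []}]}
code_edit = {"cells": [{"cell_type": "markdown", "source": "Intro"},
                       {"cell_type": "code", "source": "x = 2", "outputs": []}]}

# Same digest: the cached outputs could be reused.
print(code_cell_digest(before) == code_cell_digest(prose_edit))
# Different digest: the notebook would need to be executed again.
print(code_cell_digest(before) == code_cell_digest(code_edit))
```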

To enable caching of notebook outputs, use the following configuration:

```
jupyter_execute_notebooks = "cache"
```

By default, the cache will be placed in the parent of your build output folder. Generally,
this is in `_build/.jupyter_cache`.

You may also specify a path to the location of a jupyter cache you'd like to use:

```
jupyter_cache = "path/to/mycache"
```

The path should point to an **empty folder**, or a folder where a
**jupyter cache already exists**.
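
The choice of cache location can be sketched as follows. This mirrors the behaviour described above (an explicit `jupyter_cache` path must be an existing directory, otherwise `.jupyter_cache` next to the output folder is used), but `resolve_cache_path` is a hypothetical helper written for this example, and raising an exception is an assumption; the code in this pull request logs an error and exits instead.

```python
from pathlib import Path


def resolve_cache_path(outdir, jupyter_cache=""):
    """Return the folder that would hold the jupyter cache."""
    if jupyter_cache:
        path = Path(jupyter_cache)
        if not path.is_dir():
            # An explicit path is only accepted if the folder already exists.
            raise NotADirectoryError(f"{jupyter_cache} is not a directory")
        return path
    # Default: a hidden folder in the parent of the build output folder.
    return Path(outdir).parent / ".jupyter_cache"


print(resolve_cache_path("_build/html"))
```

With an output folder of `_build/html`, the default resolves to `_build/.jupyter_cache`.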

[jupyter-cache]: https://github.com/executablebookproject/jupyter-cache "the Jupyter Cache Project"
@@ -7,6 +7,7 @@ in Sphinx with MyST-NB.
basic
interactive
hiding
execute
markdown
glue
```
@@ -0,0 +1,219 @@
"""
Implements integration of jupyter-cache
"""
import os
import nbformat as nbf
from nbclient import execute
from pathlib import Path

from sphinx.util import logging
from sphinx.util.osutil import ensuredir

from jupyter_cache.cache.db import NbCacheRecord
from jupyter_cache import get_cache
from jupyter_cache.executors import load_executor

logger = logging.getLogger(__name__)

filtered_nb_list = set()


def execution_cache(app, env, added, changed, removed, path_cache=None):
    """
    If caching is required, stages and executes the added or modified notebooks,
    and caches them for further use.
    """
    jupyter_cache = False
    exclude_files = []
    nb_list = added.union(
        changed
    )  # all the added and changed notebooks should be operated on.

    if env.config["jupyter_execute_notebooks"] not in ["force", "auto", "cache", "off"]:
        logger.error(
            "Conf jupyter_execute_notebooks can either be `force`, `auto`, `cache` or `off`"  # noqa: E501
        )
        exit(1)

    jupyter_cache = env.config["jupyter_cache"]

    # excludes the file with patterns given in execution_excludepatterns
    # conf variable from executing, like index.rst
    for path in env.config["execution_excludepatterns"]:
        exclude_files.extend(Path.cwd().rglob(path))

    for nb in nb_list:
        exclude = False
        for files in exclude_files:
            if nb in str(files):
                exclude = True
        if not exclude:
            filtered_nb_list.add(nb)

    if "cache" in env.config["jupyter_execute_notebooks"]:
        if jupyter_cache:
            if os.path.isdir(jupyter_cache):
                path_cache = jupyter_cache
            else:
                logger.error("Path to jupyter_cache is not a directory")
                exit(1)
        else:
            path_cache = path_cache or Path(env.outdir).parent.joinpath(
                ".jupyter_cache"
            )

        app.env.path_cache = str(
            path_cache
        )  # TODO: is there a better way to make it accessible?

        _stage_and_execute(env, filtered_nb_list, path_cache)

    elif jupyter_cache:
        logger.error(
            "If using conf jupyter_cache, please set jupyter_execute_notebooks"  # noqa: E501
            " to `cache`"
        )
        exit(1)

    return nb_list  # TODO: can also compare timestamps for inputs outputs


def _stage_and_execute(env, nb_list, path_cache):
    pk_list = None

    try:
        from jupyter_cache.cache.main import JupyterCacheBase  # noqa: F401
    except ImportError:
        logger.error(
            "Using caching requires that jupyter_cache is installed."  # noqa: E501
        )

    cache_base = get_cache(path_cache)

    for nb in nb_list:
        if "." in nb:  # nb includes the path to notebook
            source_path = nb
        else:
            source_path = env.doc2path(nb)

        # prevents execution of other formats like .md
        if ".ipynb" not in source_path:
            continue

        if pk_list is None:
            pk_list = []
        stage_record = cache_base.stage_notebook_file(source_path)
        pk_list.append(stage_record.pk)

    execute_staged_nb(
        cache_base, pk_list
    )  # can leverage parallel execution implemented in jupyter-cache here


def add_notebook_outputs(env, ntbk, file_path=None):
    """
    Add outputs to a NotebookNode by pulling from cache.
    Function to get the database instance. Get the cached output of the notebook
    and merge it with the original notebook. If there is no cached output,
    checks if there was error during execution, then saves the traceback to a log file.
    """
    # If we have a jupyter_cache, see if there's a cache for this notebook
    file_path = file_path or env.doc2path(env.docname)
    dest_path = Path(env.app.outdir)
    reports_dir = str(dest_path) + "/reports"
    path_cache = False

    # checking if filename in execution_excludepatterns
    file_present = [env.docname in nb for nb in filtered_nb_list]
    if True not in file_present:
        return ntbk

    if "cache" in env.config["jupyter_execute_notebooks"]:
        path_cache = env.path_cache

    if not path_cache:
        if "off" not in env.config["jupyter_execute_notebooks"]:
            has_outputs = _read_nb_output_cells(
                file_path, env.config["jupyter_execute_notebooks"]
            )
            if not has_outputs:
                logger.info("Executing: {}".format(env.docname))
                ntbk = execute(ntbk)
            else:
                logger.info(
                    "Did not execute {}. "
                    "Set jupyter_execute_notebooks to `force` to execute".format(
                        env.docname
                    )
                )
        return ntbk

    cache_base = get_cache(path_cache)
    db = cache_base.db
    cache_record = None
    r_file_path = Path(file_path).relative_to(Path.cwd())

    try:
        cache_list = NbCacheRecord.records_from_uri(file_path, db)
        if len(cache_list):
            latest = None
            for item in cache_list:
                if latest is None or (latest < item.created):
                    latest = item.created
                    latest_record = item
            cache_record = latest_record
    except KeyError:
        cache_record = None
        logger.error(
            (
                f"Couldn't find cache key for notebook file {str(r_file_path)}. "
                "Outputs will not be inserted"
            )
        )

    if cache_record:
        _, ntbk = cache_base.merge_match_into_notebook(ntbk)
    else:
        try:
            stage_record = cache_base.get_staged_record(file_path)
        except KeyError:
            stage_record = None
        if stage_record and stage_record.traceback:
            # save the traceback to a log file
            ensuredir(reports_dir)
            file_name = os.path.splitext(r_file_path.name)[0]
            full_path = reports_dir + "/{}.log".format(file_name)
            with open(full_path, "w") as log_file:
                log_file.write(stage_record.traceback)
            logger.info(
                "Execution traceback for {} is saved in {}".format(file_name, full_path)
            )

    return ntbk


def execute_staged_nb(cache_base, pk_list):
    """
    executing the staged notebook
    """
    try:
        executor = load_executor("basic", cache_base, logger=logger)
    except ImportError as error:
        logger.error(str(error))
        return 1
    result = executor.run_and_cache(filter_pks=pk_list or None)
    return result


def _read_nb_output_cells(source_path, jupyter_execute_notebooks):
    has_outputs = False
    if jupyter_execute_notebooks and jupyter_execute_notebooks == "auto":
        with open(source_path, "r", encoding="utf8") as f:
            ntbk = nbf.read(f, as_version=4)
            has_outputs = all(
                len(cell.outputs) != 0
                for cell in ntbk.cells
                if cell["cell_type"] == "code"
            )
    return has_outputs
this could just be
at the beginning, and then you could de-dent all the execution logic
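
The snippet the reviewer proposed is not preserved in this capture. As a general illustration of the idea (returning early so that the remaining logic needs less nesting), here is a minimal sketch; the function `add_outputs`, its parameters, and its return strings are hypothetical and are not taken from the pull request.

```python
def add_outputs(mode, has_outputs, execute):
    """Decide whether to run a notebook, using early returns (guard clauses)."""
    # Handle the "nothing to do" cases first and leave immediately.
    if mode == "off":
        return "skipped"
    if mode == "auto" and has_outputs:
        return "kept existing outputs"
    # The main path is no longer nested inside if/else blocks.
    execute()
    return "executed"


print(add_outputs("off", False, lambda: None))
```

Compared with wrapping the execution logic in `if mode != "off": ...`, the guard clause keeps the main path at a single indentation level.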