FEAT: Jupyter-cache integration #55

Merged
merged 40 commits into from
Mar 28, 2020
536950c
added functionality of jupyter-cache
AakashGfude Mar 9, 2020
6fb9daf
documenting the execution functions
AakashGfude Mar 9, 2020
08a9a63
handling errored and excepted notebooks
AakashGfude Mar 9, 2020
10115b0
updating cacheing code
choldgraf Mar 9, 2020
35e36b0
Merge pull request #1 from ExecutableBookProject/holdgraf_updates
AakashGfude Mar 10, 2020
6d2fb27
first draft to run execution during env-get-outdated event
AakashGfude Mar 12, 2020
63a3237
keeping pk_list None by default to handle empty case in executor
AakashGfude Mar 12, 2020
2e6b3e1
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 12, 2020
04b6e1f
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 15, 2020
7f2d6ca
chris S's review commit
AakashGfude Mar 15, 2020
d250cd4
adding additional checks and refactoring
AakashGfude Mar 16, 2020
8a9e420
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 16, 2020
c6a9d13
import execute missing
AakashGfude Mar 16, 2020
ac27abf
added a variable for execution
AakashGfude Mar 17, 2020
f5de8d2
adding test for notebooks and refactoring code to adapt to that
AakashGfude Mar 18, 2020
95eb0a2
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 18, 2020
7756f10
testing basic execution, and reverting jupyter-cache path commit
AakashGfude Mar 19, 2020
bcf63f9
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 21, 2020
03c4199
writing tests to cover the all/most cases
AakashGfude Mar 21, 2020
2b2f8b8
added test for complex outputs
AakashGfude Mar 21, 2020
680b31b
refactoring of code and testing different scenarios
AakashGfude Mar 23, 2020
b8ae356
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 23, 2020
85e2a43
testing dependecies
AakashGfude Mar 23, 2020
7712fc3
removed non-deterministic outputs
AakashGfude Mar 24, 2020
9462f99
transferring cache folder to outdir
AakashGfude Mar 24, 2020
577134f
emptying non-deterministic outputs
AakashGfude Mar 24, 2020
1a8c187
adding execution and cacheing documentation
choldgraf Mar 24, 2020
ed2f010
edits to code - suggestions by Chris.H
AakashGfude Mar 25, 2020
3879986
Update myst_nb/cache.py
AakashGfude Mar 25, 2020
0cd4602
adding force_run to pass html-strict
AakashGfude Mar 25, 2020
93bef65
Merge branch 'jupyter-cahe-integration' of https://github.com/AakashG…
AakashGfude Mar 25, 2020
2c03f78
Merge branch 'master' of https://github.com/ExecutableBookProject/MyS…
AakashGfude Mar 25, 2020
f39634b
the pre-commit check was missing
AakashGfude Mar 25, 2020
0422cec
jupyter_execute_notebooks change and path for jupyter_cache allowed
AakashGfude Mar 26, 2020
3544ba9
added a test for jupyter_cache path
AakashGfude Mar 26, 2020
5cb5791
refactored code and tests to make nbclient more primary, and adjust c…
AakashGfude Mar 27, 2020
ff11ec1
updating documentation
AakashGfude Mar 27, 2020
7f588c6
small docs improvements and clearing notebook outputs
choldgraf Mar 27, 2020
db6ffff
moved fixtures to conftest
AakashGfude Mar 27, 2020
5538173
Merge branch 'jupyter-cahe-integration' of https://github.com/AakashG…
AakashGfude Mar 27, 2020
4 changes: 4 additions & 0 deletions .gitignore
@@ -50,6 +50,7 @@ coverage.xml
*.py,cover
.hypothesis/
.pytest_cache/
(PosixPath('/*

# Translations
*.mo
@@ -127,3 +128,6 @@ dmypy.json

# Pyre type checker
.pyre/

# Jupyter Cache
docs/.jupyter_cache
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -53,4 +53,7 @@
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
jupyter_sphinx_require_url = ""

copybutton_selector = "div:not(.output) > div.highlight pre"

jupyter_execute_notebooks = "cache"
162 changes: 20 additions & 142 deletions docs/use/basic.ipynb

Large diffs are not rendered by default.

90 changes: 90 additions & 0 deletions docs/use/execute.md
@@ -0,0 +1,90 @@
# Executing and caching your content

MyST-NB can automatically run and cache notebook pages. Notebooks can either
be run each time the documentation is built, or cached locally so that a notebook
is only re-run when its code cells have changed.

Execution and caching behavior is controlled by configuration values in your
`conf.py` file. See the sections below for each option and its effect.

## Triggering notebook execution

To trigger the execution of notebook pages, use the following configuration in `conf.py`:

```
jupyter_execute_notebooks = "auto"
```

`"auto"` is the default value. It executes only notebooks that are missing at least
one code-cell output; a notebook whose code cells *all* have outputs is not executed.
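As a rough illustration of the `auto` rule, the check below treats a notebook as complete only if every code cell has at least one output. It reads the raw `.ipynb` JSON with the standard library; `has_all_outputs` is a hypothetical name, and this is a sketch of the rule described above, not MyST-NB's own function.

```python
import json


def has_all_outputs(nb_json: str) -> bool:
    """Return True if every code cell in the notebook JSON has outputs.

    Hypothetical helper illustrating the `auto` rule; markdown and raw
    cells are skipped because they never carry outputs.
    """
    notebook = json.loads(nb_json)
    return all(
        len(cell.get("outputs", [])) > 0
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    )


# A notebook whose second code cell was never run.
example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": "# Title"},
            {
                "cell_type": "code",
                "source": "1 + 1",
                "outputs": [{"output_type": "execute_result"}],
            },
            {"cell_type": "code", "source": "print('hi')", "outputs": []},
        ]
    }
)

# In "auto" mode this notebook would be executed: one output is missing.
print(has_all_outputs(example))  # False
```

Note that a notebook with no code cells counts as complete in this sketch, because `all()` of an empty sequence is `True`.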

**To force the execution of all notebooks, regardless of their outputs**, change the
above configuration value to:

```
jupyter_execute_notebooks = "force"
```

**To cache execution outputs with [jupyter-cache]**, change the above configuration
value to:

```
jupyter_execute_notebooks = "cache"
```

See {ref}`execute/cache` for more information.

**To turn off notebook execution**, change the
above configuration value to:

```
jupyter_execute_notebooks = "off"
```
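Taken together, the four values amount to a simple per-notebook decision. The sketch below restates them as a function. `should_execute` and its arguments are hypothetical, the `cache` branch is simplified to "is there an up-to-date cached result", and the rejection of unknown values mirrors the check in `myst_nb/cache.py`, which logs an error and exits rather than raising.

```python
VALID_MODES = ("force", "auto", "cache", "off")


def should_execute(mode: str, has_all_outputs: bool, cache_hit: bool = False) -> bool:
    """Decide whether a notebook page is executed during the build.

    Hypothetical restatement of the documented modes:
    - "force": always execute
    - "auto":  execute only if at least one code cell lacks outputs
    - "cache": execute only if the cache has no up-to-date outputs
    - "off":   never execute
    """
    if mode not in VALID_MODES:
        raise ValueError(
            f"jupyter_execute_notebooks must be one of {VALID_MODES}, got {mode!r}"
        )
    if mode == "force":
        return True
    if mode == "auto":
        return not has_all_outputs
    if mode == "cache":
        return not cache_hit
    return False  # mode == "off"


print(should_execute("auto", has_all_outputs=True))  # False
print(should_execute("force", has_all_outputs=True))  # True
```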

**To exclude certain file patterns from execution**, use the following
configuration:

```
execution_excludepatterns = ['list', 'of', '*patterns']
```

Any file that matches one of the items in `execution_excludepatterns` will not be
executed.
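To get a feel for which files a pattern list excludes, the sketch below matches relative paths against glob patterns with the standard-library `fnmatch` module. It is an approximation: the code in this PR globs each pattern recursively from the current working directory (`Path().cwd().rglob(path)`) and then checks whether a document name appears in any matched path, so results can differ at the edges. `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Sequence


def is_excluded(path: str, patterns: Sequence[str]) -> bool:
    """Return True if the file name or the full relative path matches a pattern."""
    p = PurePosixPath(path)
    return any(fnmatch(p.name, pat) or fnmatch(str(p), pat) for pat in patterns)


execution_excludepatterns = ["*_slow.ipynb", "drafts/*"]
files = ["use/basic.ipynb", "use/big_slow.ipynb", "drafts/idea.ipynb"]
print([f for f in files if not is_excluded(f, execution_excludepatterns)])
# ['use/basic.ipynb']
```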

(execute/cache)=
## Cacheing the notebook execution

You may also **cache the results of executing a notebook page** using [jupyter-cache]. In
this case, when a page is executed, its outputs are stored in a local database.
This lets you be sure that the outputs in your documentation are up to date,
while avoiding unnecessary re-execution. It also allows you to store your
`.ipynb` files in your `git` repository *without their outputs*, while still using
the cache to save time when building your site.

When you re-build your site, the following will happen:

* Notebooks that have not seen changes to their **code cells** since the last build
will not be re-executed. Instead, their outputs will be pulled from the cache
and inserted into your site.
* Notebooks that **have any change to their code cells** will be re-executed, and the
cache will be updated with the new outputs.
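The rule above hinges on a notebook's code cells. As an illustration of the idea only, the sketch below derives a cache key by hashing the sources of the code cells alone, so editing a markdown cell leaves the key unchanged while editing code produces a new one. jupyter-cache computes its own hash and may take more of the notebook into account; `code_cell_key` is a hypothetical function.

```python
import hashlib
import json


def code_cell_key(notebook: dict) -> str:
    """Hash only the sources of code cells (hypothetical cache key)."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # .ipynb files may store source as a list of lines or as one string
            sources.append("".join(src) if isinstance(src, list) else src)
    payload = json.dumps(sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


nb = {
    "cells": [
        {"cell_type": "markdown", "source": "Intro"},
        {"cell_type": "code", "source": ["x = 1\n", "x + 1"]},
    ]
}
key_before = code_cell_key(nb)

nb["cells"][0]["source"] = "A reworded intro"  # markdown-only edit
print(code_cell_key(nb) == key_before)  # True: outputs can come from the cache

nb["cells"][1]["source"] = "x = 2"  # code edit
print(code_cell_key(nb) == key_before)  # False: the notebook is re-executed
```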

To enable caching of notebook outputs, use the following configuration:

```
jupyter_execute_notebooks = "cache"
```

By default, the cache is placed in a `.jupyter_cache` folder in the parent of the
builder's output folder. For example, if your HTML output goes to `_build/html`, the
cache is `_build/.jupyter_cache`.

You may also specify a path to the location of a jupyter cache you'd like to use:

```
jupyter_cache = "path/to/mycache"
```

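The path should point to an **empty folder**, or a folder where a
**jupyter cache already exists**. The folder must already exist, and setting
`jupyter_cache` while `jupyter_execute_notebooks` is not `"cache"` results in an error.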
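The rules for where the cache lives fit in one function. This sketch mirrors the checks in `execution_cache` in `myst_nb/cache.py`: a custom `jupyter_cache` must be an existing directory and requires `jupyter_execute_notebooks = "cache"`; otherwise the default is `.jupyter_cache` next to the output folder. It raises `ValueError` where the extension logs an error and exits, and `resolve_cache_path` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cache_path(mode: str, jupyter_cache: str, outdir: str) -> Optional[Path]:
    """Return the cache folder to use, or None when caching is not enabled."""
    if mode != "cache":
        if jupyter_cache:
            raise ValueError("jupyter_cache requires jupyter_execute_notebooks = 'cache'")
        return None
    if jupyter_cache:
        if not os.path.isdir(jupyter_cache):
            raise ValueError("Path to jupyter_cache is not a directory")
        return Path(jupyter_cache)
    # Default: next to the output folder, e.g. _build/html -> _build/.jupyter_cache
    return Path(outdir).parent / ".jupyter_cache"


print(resolve_cache_path("cache", "", "_build/html").as_posix())  # _build/.jupyter_cache
print(resolve_cache_path("auto", "", "_build/html"))  # None
```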

[jupyter-cache]: https://github.com/executablebookproject/jupyter-cache "the Jupyter Cache Project"
231 changes: 6 additions & 225 deletions docs/use/glue.ipynb

Large diffs are not rendered by default.

164 changes: 13 additions & 151 deletions docs/use/hiding.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/use/index.md
@@ -7,6 +7,7 @@ in Sphinx with MyST-NB.
basic
interactive
hiding
execute
markdown
glue
```
2,150 changes: 23 additions & 2,127 deletions docs/use/interactive.ipynb

Large diffs are not rendered by default.

17 changes: 1 addition & 16 deletions docs/use/orphaned_nb.ipynb
@@ -15,22 +15,7 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'My orphaned variable!'"
]
},
"metadata": {
"scrapbook": {
"mime_prefix": "",
"name": "orphaned_var"
}
},
"output_type": "display_data"
}
],
"outputs": [],
"source": [
"from myst_nb import glue\n",
"my_var = \"My orphaned variable!\"\n",
7 changes: 7 additions & 0 deletions myst_nb/__init__.py
@@ -1,6 +1,7 @@
__version__ = "0.4.0"

from docutils import nodes
from myst_nb.cache import execution_cache
from jupyter_sphinx.ast import (  # noqa: F401
    JupyterWidgetStateNode,
    JupyterWidgetViewNode,
@@ -87,11 +88,17 @@ def visit_element_html(self, node):
        man=(skip, None),
    )

    # Add configuration for the cache
    app.add_config_value("jupyter_cache", "", "env")
    app.add_config_value("execution_excludepatterns", [], "env")
    app.add_config_value("jupyter_execute_notebooks", "auto", "env")

    # Register our post-transform which will convert output bundles to nodes
    app.add_post_transform(PasteNodesToDocutils)
    app.add_post_transform(CellOutputsToNodes)

    app.connect("builder-inited", static_path)
    app.connect("env-get-outdated", execution_cache)
    app.connect("config-inited", update_togglebutton_classes)
    app.connect("env-updated", save_glue_cache)
    app.add_css_file("mystnb.css")
219 changes: 219 additions & 0 deletions myst_nb/cache.py
@@ -0,0 +1,219 @@
"""
Implements integration of jupyter-cache
"""
import os
import nbformat as nbf
from nbclient import execute
from pathlib import Path

from sphinx.util import logging
from sphinx.util.osutil import ensuredir

from jupyter_cache.cache.db import NbCacheRecord
from jupyter_cache import get_cache
from jupyter_cache.executors import load_executor

logger = logging.getLogger(__name__)

filtered_nb_list = set()


def execution_cache(app, env, added, changed, removed, path_cache=None):
    """
    If cacheing is required, stages and executes the added or modified notebooks,
    and caches them for further use.
    """
    jupyter_cache = False
    exclude_files = []
    nb_list = added.union(
        changed
    )  # all the added and changed notebooks should be operated on.

    if env.config["jupyter_execute_notebooks"] not in ["force", "auto", "cache", "off"]:
        logger.error(
            "Conf jupyter_execute_notebooks can either be `force`, `auto`, `cache` or `off`"  # noqa: E501
        )
        exit(1)

    jupyter_cache = env.config["jupyter_cache"]

    # excludes the file with patterns given in execution_excludepatterns
    # conf variable from executing, like index.rst
    for path in env.config["execution_excludepatterns"]:
        exclude_files.extend(Path().cwd().rglob(path))

    for nb in nb_list:
        exclude = False
        for files in exclude_files:
            if nb in str(files):
                exclude = True
        if not exclude:
            filtered_nb_list.add(nb)

    if "cache" in env.config["jupyter_execute_notebooks"]:
        if jupyter_cache:
            if os.path.isdir(jupyter_cache):
                path_cache = jupyter_cache
            else:
                logger.error("Path to jupyter_cache is not a directory")
                exit(1)
        else:
            path_cache = path_cache or Path(env.outdir).parent.joinpath(
                ".jupyter_cache"
            )

        app.env.path_cache = str(
            path_cache
        )  # TODO: is there a better way to make it accessible?

        _stage_and_execute(env, filtered_nb_list, path_cache)

    elif jupyter_cache:
        logger.error(
            "If using conf jupyter_cache, please set jupyter_execute_notebooks"  # noqa: E501
            " to `cache`"
        )
        exit(1)

    return nb_list  # TODO: can also compare timestamps for inputs outputs


def _stage_and_execute(env, nb_list, path_cache):
    pk_list = None

    try:
        from jupyter_cache.cache.main import JupyterCacheBase  # noqa: F401
    except ImportError:
        logger.error(
            "Using caching requires that jupyter_cache is installed."  # noqa: E501
        )

    cache_base = get_cache(path_cache)

    for nb in nb_list:
        if "." in nb:  # nb includes the path to notebook
            source_path = nb
        else:
            source_path = env.env.doc2path(nb)

        # prevents execution of other formats like .md
        if ".ipynb" not in source_path:
            continue

        if pk_list is None:
            pk_list = []
        stage_record = cache_base.stage_notebook_file(source_path)
        pk_list.append(stage_record.pk)

    execute_staged_nb(
        cache_base, pk_list
    )  # can leverage parallel execution implemented in jupyter-cache here
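`_stage_and_execute` skips non-notebook sources with the substring test `".ipynb" not in source_path`, which would also let through paths such as `notes.ipynb.md` or `.ipynb_checkpoints/a.md`. A stricter check based on `pathlib.PurePath.suffix` is sketched below; `notebook_sources` is a hypothetical helper, not part of this PR.

```python
from pathlib import PurePath
from typing import Iterable, List


def notebook_sources(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose final suffix is .ipynb (hypothetical helper)."""
    return [p for p in paths if PurePath(p).suffix == ".ipynb"]


candidates = [
    "use/basic.ipynb",
    "use/execute.md",
    "notes.ipynb.md",
    ".ipynb_checkpoints/a.md",
]
print(notebook_sources(candidates))  # ['use/basic.ipynb']
```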


def add_notebook_outputs(env, ntbk, file_path=None):
    """
    Add outputs to a NotebookNode by pulling from cache.
    Function to get the database instance. Get the cached output of the notebook
    and merge it with the original notebook. If there is no cached output,
    checks if there was error during execution, then saves the traceback to a log file.
    """
    # If we have a jupyter_cache, see if there's a cache for this notebook
    file_path = file_path or env.doc2path(env.docname)
    dest_path = Path(env.app.outdir)
    reports_dir = str(dest_path) + "/reports"
    path_cache = False

    # checking if filename in execute_excludepattern
    file_present = [env.docname in nb for nb in filtered_nb_list]
    if True not in file_present:
        return ntbk

    if "cache" in env.config["jupyter_execute_notebooks"]:
        path_cache = env.path_cache

    if not path_cache:
        if "off" not in env.config["jupyter_execute_notebooks"]:
            has_outputs = _read_nb_output_cells(
                file_path, env.config["jupyter_execute_notebooks"]
            )
            if not has_outputs:
                logger.info("Executing: {}".format(env.docname))
                ntbk = execute(ntbk)
            else:
                logger.info(
                    "Did not execute {}. "
                    "Set jupyter_execute_notebooks to `force` to execute".format(
                        env.docname
                    )
                )
        return ntbk

    cache_base = get_cache(path_cache)
    db = cache_base.db
    cache_record = None
    r_file_path = Path(file_path).relative_to(Path(file_path).cwd())

    try:
        cache_list = NbCacheRecord.records_from_uri(file_path, db)
        if len(cache_list):
            latest = None
            for item in cache_list:
                if latest is None or (latest < item.created):
                    latest = item.created
                    latest_record = item
            cache_record = latest_record
    except KeyError:
        cache_record = None
        logger.error(
            (
                f"Couldn't find cache key for notebook file {str(r_file_path)}. "
                "Outputs will not be inserted"
            )
        )

    if cache_record:
        _, ntbk = cache_base.merge_match_into_notebook(ntbk)
    else:
Review comment (Member):

this could just be

if not path_cache:
    # If we explicitly did not wish to cache, then just execute the notebook
    ntbk = execute(ntbk)
    return ntbk

at the beginning, and then you could de-dent all the execution logic

        try:
            stage_record = cache_base.get_staged_record(file_path)
        except KeyError:
            stage_record = None
        if stage_record and stage_record.traceback:
            # save the traceback to a log file
            ensuredir(reports_dir)
            file_name = os.path.splitext(r_file_path.name)[0]
            full_path = reports_dir + "/{}.log".format(file_name)
            with open(full_path, "w") as log_file:
                log_file.write(stage_record.traceback)
            logger.info(
                "Execution traceback for {} is saved in {}".format(file_name, full_path)
            )

    return ntbk
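The loop in `add_notebook_outputs` that picks the most recently created cache record can be written with `max()` and a key function. The sketch below uses a stand-in dataclass that only models a `pk` and a `created` timestamp; the real records are jupyter-cache `NbCacheRecord` objects, whose other fields are not represented here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class Record:
    """Stand-in for a cache record; only the creation time matters here."""

    pk: int
    created: datetime


def latest_record(records: Sequence[Record]) -> Optional[Record]:
    """Return the most recently created record, or None if there are none."""
    return max(records, key=lambda r: r.created, default=None)


records = [
    Record(pk=1, created=datetime(2020, 3, 20)),
    Record(pk=2, created=datetime(2020, 3, 27)),
    Record(pk=3, created=datetime(2020, 3, 24)),
]
print(latest_record(records).pk)  # 2
print(latest_record([]))  # None
```

On ties, both the original loop (which uses a strict `<`) and `max()` keep the first record encountered, so the two behave the same.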


def execute_staged_nb(cache_base, pk_list):
    """
    executing the staged notebook
    """
    try:
        executor = load_executor("basic", cache_base, logger=logger)
    except ImportError as error:
        logger.error(str(error))
        return 1
    result = executor.run_and_cache(filter_pks=pk_list or None)
    return result


def _read_nb_output_cells(source_path, jupyter_execute_notebooks):
    has_outputs = False
    if jupyter_execute_notebooks and jupyter_execute_notebooks == "auto":
        with open(source_path, "r") as f:
            ntbk = nbf.read(f, as_version=4)
        has_outputs = all(
            len(cell.outputs) != 0
            for cell in ntbk.cells
            if cell["cell_type"] == "code"
        )
    return has_outputs