Clean up notebooks build script #108

Merged: 18 commits, Nov 23, 2021
24 changes: 0 additions & 24 deletions examples/Makefile

This file was deleted.

14 changes: 4 additions & 10 deletions examples/README.rst
@@ -16,14 +16,14 @@ The procedure for adding a adding a new notebook to the documentation is:

2. Run ``makeindex.py`` to update the example scripts README file, the notebook index file, and the examples index in the docs.

-3. Run ``makejnb.py`` to build the new notebook, as well as any other notebooks that are out of date with respect to their source scripts, as determined by the respective file timestamps.
+3. Run ``makenotebooks.py`` to build the new notebook, as well as any other notebooks that are out of date with respect to their source scripts, as determined by the respective file timestamps.

4. Add and commit the new script, the ``scripts/index.rst`` script index file, the auto-generated ``scripts/README.rst`` file and ``docs/source/examples.rst`` index file, and the new or updated notebooks and the auto-generated ``notebooks/index.ipynb`` file in the notebooks directory (following the submodule handling procedure as described in the developer docs).


The procedure for rebuilding notebook(s) after the source file(s) have been modified is:

-1. Run ``makejnb.py`` to build the new notebook, as well as any other notebooks that are out of date with respect to their source scripts, as determined by the respective file timestamps. Note that timestamps for files retrieved from version control may not be meaningful for this purpose. In such cases, ``touch`` the relevant source scripts to force updating on the next run of ``makejnb.py``.
+1. Run ``makenotebooks.py`` to build the new notebook, as well as any other notebooks that are out of date with respect to their source scripts, as determined by the respective file timestamps. Note that timestamps for files retrieved from version control may not be meaningful for this purpose. In such cases, ``touch`` the relevant source scripts to force updating on the next run of ``makenotebooks.py``.

2. Add and commit the modified script(s), and the updated notebooks (following the submodule handling procedure as described in the developer docs).

@@ -40,17 +40,11 @@ A number of files in this directory assist in the mangement of the usage example
`notebooks_requirements.txt <examples_requirements.txt>`_
Requirements file (as used by ``pip``) listing additional dependencies for building the Jupyter notebooks from the usage example scripts.

-`makejnb.py <makejnb.py>`_
-An alternative to the makefile for updating the auto-generated Jupyter notebooks. Notebooks are executed in parallel using the ``ray`` package.
+`makenotebooks.py <makenotebooks.py>`_
+Auto-generate Jupyter notebooks from the example scripts.

`makeindex.py <makeindex.py>`_
Auto-generate the docs example index ``docs/source/examples.rst`` from the example scripts index ``scripts/index.rst``.

-`Makefile <Makefile>`_
-A makefile allowing use of the command ``make`` to update auto-generated Jupyter notebooks. Run as ``make no-execute=true`` to update the notebooks without executing them. Use of `makejnb.py` rather than this makefile is recommended.
-
-`pytojnb.sh <pytojnb.sh>`_
-Low-level python to Jupyter notebook conversion script. Used by both the makefile and `makejnb.py <makejnb.py>`_.
-
`scriptcheck.sh <scriptcheck.sh>`_
Run all example scripts with a reduced number of iterations as a rapid check that they are functioning correctly.
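The timestamp rule in the procedures above, and the advice to `touch` scripts retrieved from version control, follow from how makenotebooks.py selects its work: it collects script names from the indented `- name.py` entries of `scripts/index.rst`, removes duplicates, and (unless `--all` is given or scripts are named on the command line) converts only scripts whose modification time is newer than that of the corresponding notebook, or whose notebook does not exist yet. The sketch below reproduces that selection logic in isolation. The index excerpt, file names and helper names are hypothetical, `os.utime(path)` stands in for the shell `touch`, and `dict.fromkeys` is used for de-duplication because, unlike `set`, it keeps the index order.

```python
import os
import re
import tempfile
import time
from pathlib import Path

# Same pattern makenotebooks.py applies to each line of scripts/index.rst,
# with the "." before "py" escaped so that it only matches a literal dot.
ENTRY_RE = re.compile(r"(\s+)- ([^\s]+\.py)")


def index_script_names(index_text):
    """Return the unique script names listed in index text, in first-seen order."""
    names = [m.group(2) for m in map(ENTRY_RE.match, index_text.splitlines()) if m]
    return list(dict.fromkeys(names))  # drop duplicates, keep order


def is_stale(script, notebook):
    """True if the notebook is missing or older than its source script."""
    return not notebook.is_file() or script.stat().st_mtime > notebook.stat().st_mtime


# Hypothetical index excerpt; the script names are made up for illustration.
index_text = "Examples\n--------\n\n   - demo_a.py\n   - demo_b.py\n   - demo_a.py\n"
print(index_script_names(index_text))  # ['demo_a.py', 'demo_b.py']

with tempfile.TemporaryDirectory() as tmp:
    script, notebook = Path(tmp, "demo_a.py"), Path(tmp, "demo_a.ipynb")
    script.write_text("print('hello')\n")
    print(is_stale(script, notebook))  # True: notebook does not exist yet

    notebook.write_text("{}")
    now = time.time()
    os.utime(script, (now - 60, now - 60))  # script last modified a minute ago
    os.utime(notebook, (now - 30, now - 30))  # notebook built after that
    print(is_stale(script, notebook))  # False: notebook is up to date

    os.utime(script)  # equivalent of `touch demo_a.py`: mtime set to now
    print(is_stale(script, notebook))  # True: rebuild forced
```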
87 changes: 0 additions & 87 deletions examples/makejnb.py

This file was deleted.

202 changes: 202 additions & 0 deletions examples/makenotebooks.py
@@ -0,0 +1,202 @@
#!/usr/bin/env python

# Extract a list of Python scripts from "scripts/index.rst" and
# create/update and execute any Jupyter notebooks that are out
# of date with respect to their source Python scripts. If script
# names are specified on the command line, process them instead.
# Run
#   python makenotebooks.py -h
# for usage details.

import argparse
import os
import re
from pathlib import Path
from timeit import default_timer as timer

import nbformat
from nbconvert.preprocessors import CellExecutionError, ExecutePreprocessor
from py2jn.tools import py_string_to_notebook, write_notebook

have_ray = True
try:
    import ray
except ImportError:
    have_ray = False


def py_file_to_string(src):
    """Preprocess example script file and return result as a string."""

    with open(src, "r") as srcfile:
        # Drop header comment
        for line in srcfile:
            if line[0] != "#":
                break  # assume first non-comment line is a newline that can be dropped
        # Insert notebook plot config after last import
        lines = []
        import_seen = False
        for line in srcfile:
            line = re.sub('^r"""', '"""', line)  # remove r from r"""
            line = re.sub(":cite:`([^`]+)`", r'<cite data-cite="\1"/>', line)  # fix cite format
            if import_seen:
                # Once an import statement has been seen, break on encountering a line that
                # is neither an import statement nor a newline, nor a component of an import
                # statement extended over multiple lines, nor components of a try/except
                # construction (note that handling of these final two cases is probably not
                # very robust).
                if not re.match(r"(^import|^from|^\n$|^\W+[^\W]|^\)$|^try:$|^except)", line):
                    lines.append(line)
                    break
            else:
                # Set flag indicating that an import statement has been seen once one has
                # been encountered
                if re.match("^(import|from)", line):
                    import_seen = True
            lines.append(line)
        # Backtrack through list of lines to find last import statement
        n = 1
        for line in lines[-2::-1]:
            if re.match("^(import|from)", line):
                break
            else:
                n += 1
        # Insert notebook plotting config directly after last import statement
        lines.insert(-n, "plot.config_notebook_plotting()\n")

        # Process remainder of source file
        for line in srcfile:
            if re.match("^input", line):  # end processing when input statement encountered
                break
            line = re.sub('^r"""', '"""', line)  # remove r from r"""
            line = re.sub(":cite:`([^`]+)`", r'<cite data-cite="\1"/>', line)  # fix cite format
            lines.append(line)

    # Backtrack through list of lines to remove trailing newlines
    n = 0
    for line in lines[::-1]:
        if re.match("^\n$", line):
            n += 1
        else:
            break
    # Only slice if there is something to remove: lines[0:-0] would be empty
    if n > 0:
        lines = lines[:-n]

    return "".join(lines)


def script_to_notebook(src, dst):
    """Convert a Python example script into a Jupyter notebook."""

    source = py_file_to_string(src)  # avoid shadowing the built-in str
    nb = py_string_to_notebook(source)
    write_notebook(nb, dst)


def execute_notebook(fname):
    """Execute the specified notebook file, saving the outputs in place."""

    with open(fname, encoding="utf-8") as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor()
    try:
        t0 = timer()
        ep.preprocess(nb)
        t1 = timer()
    except CellExecutionError as e:
        raise RuntimeError(f"Error executing the notebook {fname}") from e
    with open(fname, "w", encoding="utf-8") as f:
        nbformat.write(nb, f)
    print(f"{fname} done in {(t1 - t0):.1e} s")


argparser = argparse.ArgumentParser(
    description="Convert Python example scripts to Jupyter notebooks."
)
argparser.add_argument(
    "--all",
    action="store_true",
    help="Process all notebooks, without checking timestamps. "
    "Has no effect when files to process are explicitly specified.",
)
argparser.add_argument(
    "--no-exec", action="store_true", help="Create/update notebooks but don't execute them"
)
argparser.add_argument(
    "--no-ray",
    action="store_true",
    help="Execute notebooks serially, without the use of ray parallelization",
)
argparser.add_argument("filename", nargs="*", help="Optional Python example script filenames")
args = argparser.parse_args()

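# Raise an error if ray is needed (notebooks to be executed in parallel) but not present
if not have_ray and not args.no_ray and not args.no_exec:
    raise RuntimeError(
        "The ray package is required to run this script; use --no-ray to execute "
        "notebooks serially"
    )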


if args.filename:
    # Script names specified on command line
    scriptnames = [os.path.basename(s) for s in args.filename]
else:
    # Read script names from index file
    scriptnames = []
    srcidx = "scripts/index.rst"
    with open(srcidx, "r") as idxfile:
        for line in idxfile:
            m = re.match(r"(\s+)- ([^\s]+\.py)", line)
            if m:
                scriptnames.append(m.group(2))

# Ensure list entries are unique, preserving their original order
scriptnames = list(dict.fromkeys(scriptnames))

# Construct script paths
scripts = [Path("scripts") / Path(s) for s in scriptnames]

# Display status information
print(f"Processing scripts {', '.join(scriptnames)}")

# Convert scripts to corresponding notebooks and create list of new/modified notebooks.
notebooks = []
for s in scripts:
    nb = Path("notebooks") / (s.stem + ".ipynb")
    # If scripts specified on command line or --all flag specified, convert all scripts.
    # Otherwise, only convert scripts that have a newer timestamp than their corresponding
    # notebooks, or that have not previously been converted (i.e. corresponding notebook
    # file does not exist).
    if args.all or args.filename or not nb.is_file() or s.stat().st_mtime > nb.stat().st_mtime:
        # Make notebook file
        script_to_notebook(s, nb)
        # Add it to the list for execution
        notebooks.append(nb)


# Execute the new/modified notebooks unless the --no-exec flag was specified
# (skip entirely if there is nothing to execute, avoiding a division by zero below)
if not args.no_exec and notebooks:

    # Execute notebooks serially if the --no-ray flag was specified
    if args.no_ray:

        for nbfile in notebooks:
            execute_notebook(nbfile)

    # Execute notebooks in parallel using ray
    else:
        ray.init()

        nproc = len(notebooks)
        ngpu = 0
        ar = ray.available_resources()
        ncpu = max(int(ar["CPU"]) // nproc, 1)
        if "GPU" in ar:
            ngpu = max(int(ar["GPU"]) // nproc, 1)
        print(f"Running on {ncpu} CPUs and {ngpu} GPUs per process")

        # Function to execute each notebook with available resources suitably divided
        @ray.remote(num_cpus=ncpu, num_gpus=ngpu)
        def ray_run_nb(fname):
            execute_notebook(fname)

        # Execute relevant notebooks in parallel
        ray.get([ray_run_nb.remote(nbfile) for nbfile in notebooks])
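When ray is used, makenotebooks.py gives each notebook task an equal share of the resources reported by `ray.available_resources()`: integer division of the CPU count (and the GPU count, if present) by the number of notebooks, with a floor of one. The arithmetic can be checked without Ray installed. In this sketch the function name `per_task_resources` and the resource figures are hypothetical, and an empty task list raises `ValueError`, since the per-task share is undefined when no notebook needs executing.

```python
def per_task_resources(available, n_tasks):
    """Divide available CPUs/GPUs evenly among n_tasks notebook tasks.

    ``available`` is a mapping like the one returned by
    ``ray.available_resources()`` (resource name -> float quantity).
    """
    if n_tasks < 1:
        # The division by the number of notebooks is undefined for an empty list
        raise ValueError("no notebooks to execute")
    ncpu = max(int(available["CPU"]) // n_tasks, 1)
    ngpu = max(int(available["GPU"]) // n_tasks, 1) if "GPU" in available else 0
    return ncpu, ngpu


# Hypothetical machines; no Ray installation is needed for this arithmetic.
print(per_task_resources({"CPU": 16.0, "GPU": 2.0}, 4))  # (4, 1)
print(per_task_resources({"CPU": 8.0}, 3))  # (2, 0)
print(per_task_resources({"CPU": 4.0}, 10))  # (1, 0)
```

The floor of one means the tasks can jointly request more than the machine has: with 2 GPUs and 4 notebooks each task still asks for one GPU, so Ray runs at most two notebooks at a time and queues the rest.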
30 changes: 0 additions & 30 deletions examples/pytojnb.sh

This file was deleted.
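Two steps of `py_file_to_string` in makenotebooks.py are easy to check on their own: the per-line rewrites, which strip the `r` from raw docstring openers and turn Sphinx `:cite:` roles into `<cite data-cite="..."/>` elements (presumably so that citations can be resolved when the notebooks are rendered in the docs), and the removal of trailing blank lines. A minimal sketch follows; the function names and the citation key are hypothetical. A slice of the form `lines[0:-n]` returns an empty list when `n` is 0, so the sketch slices by length instead.

```python
import re


def convert_line(line):
    """Per-line rewrites applied while turning an example script into a notebook."""
    line = re.sub('^r"""', '"""', line)  # r""" docstring opener -> plain """
    # Sphinx :cite:`key` role -> HTML cite element with a data-cite attribute
    return re.sub(r":cite:`([^`]+)`", r'<cite data-cite="\1"/>', line)


def strip_trailing_blank_lines(lines):
    """Drop trailing blank lines; a no-op when there are none."""
    n = 0
    for line in reversed(lines):
        if line != "\n":
            break
        n += 1
    # lines[:-n] would return [] when n == 0, so slice by length instead
    return lines[: len(lines) - n]


print(convert_line('r"""Solve the problem of :cite:`doe-2020-example`.\n'))
# """Solve the problem of <cite data-cite="doe-2020-example"/>.
print(strip_trailing_blank_lines(["x = 1\n", "\n", "\n"]))  # ['x = 1\n']
print(strip_trailing_blank_lines(["x = 1\n"]))  # ['x = 1\n']
```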