- Replace the internal Nublado client with the version from rubin-nublado-client.
- Notebook execution jobs can now set timeouts. In requests, set a timeout in the `timeout` request field. This can be a number of seconds, or a human-readable duration string (e.g. "1h30m"). The specified timeout is also repeated in the response body. This timeout applies to the notebook execution, not any time in the queue.
- Errors that prevented a notebook from being executed are now reported in the notebook job response body in the `error` field. The field is an object with a `code` field and a `message` field. The `code` field is a string that can be used to identify the error. Currently the codes are `timeout`, `jupyter_error`, and `unknown`. Note that exceptions raised in the Jupyter notebook aren't considered errors, but are instead reported in the `ipynb_error` field.
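As an illustration of the `timeout`, `error`, and `ipynb_error` fields described above, here is a minimal client-side sketch. It is not taken from the Noteburst code base: the `ipynb` request field and the helper names `build_request` and `summarize` are assumptions made for the example, and only the `timeout`, `error.code`, `error.message`, and `ipynb_error` fields come from these notes.

```python
from __future__ import annotations

# Error codes listed in the changelog entry above.
KNOWN_ERROR_CODES = {"timeout", "jupyter_error", "unknown"}


def build_request(ipynb: str, timeout: int | str) -> dict:
    """Build a request body with a timeout.

    "ipynb" is an assumed field name for the notebook content; "timeout" may
    be a number of seconds or a duration string such as "1h30m".
    """
    return {"ipynb": ipynb, "timeout": timeout}


def summarize(job: dict) -> str:
    """Describe the outcome of a notebook job response body (hypothetical helper)."""
    error = job.get("error")
    if error:
        # The notebook could not be executed at all.
        code = error.get("code", "unknown")
        if code not in KNOWN_ERROR_CODES:
            code = "unknown"
        return f"not executed ({code}): {error.get('message', '')}"
    if job.get("ipynb_error"):
        # The notebook ran, but a cell raised an exception.
        return "executed, but the notebook raised an exception"
    return "executed"
```

A caller would typically check `error` first, because an exception inside the notebook is reported separately through `ipynb_error` and does not count as an execution error.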
- When logging into JupyterHub, Noteburst now looks for XSRF tokens in each redirect.
- Adopt `ruff-shared.toml` from https://github.com/lsst/templates
- Adopt uv for dependency management and resolution.
- Adopt explicit ASGITransport for setting up test HTTPX client.
- Create Gafaelfawr service tokens instead of user tokens for authenticated calls to JupyterHub and JupyterLab. Gafaelfawr is standardizing on the new service token type for all service-to-service authentication.
- Reduced the frequency of keep alive tasks for the Noteburst workers to once every 15 minutes, from once every 5 minutes. This is intended to clean up the logging output.
- Correctly extract cookies from the middle of the redirect chain caused by initial authentication to a Nublado lab. This fixes failures seen with labs containing JupyterHub 4.1.3.
- Add support for `gid` as well as `uid` fields in the worker identity configuration. Both `uid` and `gid` are now validated as integers.
- Add a `NOTEBURST_WORKER_MAX_CONCURRENT_JOBS` environment variable configuration to limit the number of concurrent jobs a worker can run. The default is 3. Previously this was 10. This should be set equal to or less than the number of CPUs available to the JupyterLab pod.
- The notebook execution client now waits as long as possible for the `/execution` endpoint in the JupyterLab pod to return the executed notebook. Previously the client would wait for a fixed amount of time, which could be too short for long-running notebooks. The JupyterLab server may still time out the request, though.
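For the `NOTEBURST_WORKER_MAX_CONCURRENT_JOBS` setting above, a deployment check along the following lines could confirm that the configured value does not exceed the CPUs available to the JupyterLab pod. This is only a sketch: the `check_max_jobs` helper and its `lab_cpus` argument are hypothetical, and the CPU count has to come from your own deployment configuration, since Noteburst itself is not described as performing this check.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

ENV_VAR = "NOTEBURST_WORKER_MAX_CONCURRENT_JOBS"
DEFAULT_MAX_JOBS = 3  # default stated in the changelog entry above


def check_max_jobs(
    lab_cpus: int, env: Mapping[str, str] | None = None
) -> tuple[int, bool]:
    """Return (max_jobs, fits), where fits is True if max_jobs <= lab_cpus.

    lab_cpus is the number of CPUs available to the JupyterLab pod
    (an assumption supplied by the caller).
    """
    env = os.environ if env is None else env
    max_jobs = int(env.get(ENV_VAR, DEFAULT_MAX_JOBS))
    if max_jobs < 1:
        raise ValueError(f"{ENV_VAR} must be a positive integer")
    return max_jobs, max_jobs <= lab_cpus
```

For example, with the variable unset and a 2-CPU lab pod, the check reports that the default of 3 is larger than recommended.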
- Improved handling of the XSRF token when authenticating to JupyterHub and JupyterLab pods. This is required in JupyterLab 4.1.
- Fix Slack error messaging in the `nbexec` worker function.
- Extract and use the actual XSRF token when communicating with the Hub and Lab.
- Add formatted errors when a job is not found for the `GET /v1/notebooks/:job_id` endpoint.
- Errors and uncaught exceptions are now sent to Slack via a Slack webhook. The webhook URL is set via the `SLACK_WEBHOOK_URL` environment variable.
- The code base now uses Ruff for linting and formatting, replacing black, isort, and flake8. This change is part of the ongoing effort to standardize SQuaRE code bases and improve the developer experience.
- The response to `GET /notebooks/:job_id` now includes an `ipynb_error` field that contains structured information about any exception that occurred when executing the notebook. As well, if an exception occurred, the resultant notebook is still included in the response. That is, notebook failures are no longer considered failed jobs.
- The `job_id` is now included in log messages when running the `nbexec` job under arq.
- The user guide includes a new tutorial for using the Noteburst web API.
- Update to Pydantic 2.
- Adopt FastAPI's lifespan feature.
- Adopt scriv for changelog management.
- Update GitHub Actions workflows, including integrating Neophile for dependency updates.
- Update to Python 3.12.
- Add additional logging of JupyterLab spawning failures in workers.
- Added documentation for configuration environment variables.
- Added OpenAPI docs, rendered by Redoc, to the Sphinx documentation site.
- The JupyterHub service's URL path prefix is now configurable with the `NOTEBURST_JUPYTERHUB_PATH_PREFIX` environment variable. The default is `/nb/`, which is the existing value.
- The Nublado JupyterLab Controller service's URL path prefix is configurable with the `NOTEBURST_NUBLADO_CONTROLLER_PATH_PREFIX` environment variable. The default is `/nublado`, which is the existing value.
- Fix how failed notebook executions are handled. Previously, failed notebooks would prevent Noteburst from getting the results of the execution job. Now the job is shown as concluded but unsuccessful by the `/v1/notebooks/{job_id}` endpoint.
- Structure uvicorn server logging.
- Stop following redirects from the `hub/login` endpoint.
- Explicitly shut down the lab pod on worker shutdown.
- Additional updates for JupyterLab Controller image API endpoint.
- Migrated from the Cachemachine API to the new JupyterLab Controller API for obtaining the list of available Docker images for JupyterLab workers.
- Migrated to Python 3.11
- Adopted pyproject.toml for project metadata and dropped setup.cfg.
- It's now possible to skip retries on notebook execution failures in the `nbexec` task by passing an `enable_retry=False` keyword argument. This is useful for applications that use Noteburst for continuous integration.
- The worker identity configuration can now omit the `uid` field for environments where Gafaelfawr is able to assign a UID (e.g. through an LDAP backend).
- New configurations for workers:
  - The new `NOTEBURST_WORKER_TOKEN_LIFETIME` environment variable enables you to configure the lifetime of the workers' authentication tokens. The default matches the existing behavior, 28 days.
  - The new `NOTEBURST_WORKER_TOKEN_SCOPES` environment variable enables you to set what token scopes the nublado2 bot users should have, as a comma-separated list.
  - `NOTEBURST_WORKER_IMAGE_SELECTOR` allows you to specify what stream of Nublado image to select. Can be `recommended`, `weekly` or `reference`. If the latter, you can specify the specific Docker image with `NOTEBURST_WORKER_IMAGE_REFERENCE`.
  - The `NOTEBURST_WORKER_KEEPALIVE` configuration controls whether the worker keep alive function is run (to defeat the Nublado pod culler), and at what frequency. Set to `disabled` to disable; `fast` to run every 30 seconds; or `normal` to run every 5 minutes.
- Noteburst now uses the arq client and dependency from Safir 3.2, which was originally developed from Noteburst.
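
The `NOTEBURST_WORKER_KEEPALIVE` values listed above (`disabled`, `fast`, `normal`) can be read as a mapping from setting to interval. The sketch below shows one way to model that mapping, using the intervals stated in that entry. The `KeepAliveSetting` enum and `keepalive_interval` function are hypothetical names for illustration, not Noteburst's actual implementation.

```python
from __future__ import annotations

from datetime import timedelta
from enum import Enum


class KeepAliveSetting(str, Enum):
    """Allowed values of NOTEBURST_WORKER_KEEPALIVE, per the entry above."""

    disabled = "disabled"
    fast = "fast"
    normal = "normal"


# Intervals as stated in the entry above; None means keep alive is off.
INTERVALS: dict[KeepAliveSetting, timedelta | None] = {
    KeepAliveSetting.disabled: None,
    KeepAliveSetting.fast: timedelta(seconds=30),
    KeepAliveSetting.normal: timedelta(minutes=5),
}


def keepalive_interval(value: str) -> timedelta | None:
    """Map a setting string to its interval; raises ValueError if unknown."""
    return INTERVALS[KeepAliveSetting(value.strip().lower())]
```

An unrecognized value raises `ValueError` when the enum is constructed, which surfaces configuration typos early.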

Improved handling of the JupyterLab pod for Noteburst workers:

- If the JupyterLab pod goes away (such as if it is culled), the Noteburst worker shuts down so that Kubernetes creates a new worker with a new JupyterLab pod. A lost JupyterLab pod is detected by a 400-class response when submitting a notebook for execution.
- If a worker starts up and a JupyterLab pod already exists for an unclaimed identity, the Noteburst worker will continue to cycle through available worker identities until the JupyterLab start up is successful. This handles cases where a Noteburst worker restarts, but the JupyterLab pod did not shut down and thus is "orphaned."
- Each JupyterLab worker runs a "keep alive" function that exercises the JupyterLab pod's Python kernel. This is meant to counter the "culler" that deletes dormant JupyterLab pods in the Rubin Science Platform. Currently the keep alive function runs every 30 seconds.
- The default arq job execution timeout is now configurable with the `NOTEBURST_WORKER_JOB_TIMEOUT` environment variable. By default it is 300 seconds (5 minutes).
- Initial version of the `/v1/` HTTP API.
- Migration to Safir 3 and its database framework.
- Noteburst is now cross-published to the GitHub Container Registry, `ghcr.io/lsst-sqre/noteburst`.
- Migration to Python 3.10.
- Initial development version of Noteburst.