Currently we use tensorflow/tensorflow:latest (or tensorflow/tensorflow:2.7.0 and tensorflow/tensorflow:2.3.0) to run our notebooks inside a Kubeflow Pipeline.
However, those images are very large and laden with dependencies, many of which are not required by the respective notebooks. Because of that sheer number of preinstalled system packages, binaries, and Python packages, pip frequently fails to install the required notebook dependencies on top of the image. At the moment only 4 of our 8 sample notebooks can run.
We need to find a basic Python Docker image and install only the necessary requirements, like papermill.
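For illustration, a minimal sketch of that approach, assuming the stock python:3.9-slim image (papermill needs ipykernel installed to actually execute Python notebooks):

# sketch only: a slim Python base with just the notebook runner installed on the fly
docker run -i --rm python:3.9-slim bash -c "
  python3 -m pip install --quiet papermill ipykernel
  papermill --help
"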
To show (some of) the steps required to run a notebook on Kubernetes, take a look at this script from the katalog repo, which runs notebooks outside of a cluster: https://github.com/machine-learning-exchange/katalog/blob/7fcd5ce/tools/bash/run_notebooks.sh#L58-L65
# TODO: find a smaller Docker image
IMAGE="tensorflow/tensorflow:latest"

docker run -i --rm --entrypoint "" "${IMAGE}" bash -c "
  # download the notebook
  wget -q -O notebook_in.ipynb '${NOTEBOOK_URL}' 2> /dev/null || curl -s -o notebook_in.ipynb '${NOTEBOOK_URL}'

  # update pip
  python3 -m pip install pip --upgrade --quiet --progress-bar=ascii

  # install the Elyra requirements; beyond 'papermill' they may not all be required
  python3 -m pip install -r https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/generic/requirements-elyra.txt --quiet --progress-bar=on

  # if the notebook has extra requirements, install those
  [[ -n '${REQUIREMENTS}' ]] && python3 -m pip install ${REQUIREMENTS} --quiet --progress-bar=on

  # show the installed packages
  python3 -m pip list

  # run the notebook with papermill
  papermill --log-level CRITICAL --report-mode notebook_in.ipynb notebook_out.ipynb
" >> "${LOG_FILE}" 2>&1 && echo OK || echo FAILED
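For reference, the snippet above expects NOTEBOOK_URL, REQUIREMENTS, and LOG_FILE to be set in the calling shell; the values here are placeholders:

NOTEBOOK_URL="https://example.com/notebooks/sample.ipynb"  # placeholder URL
REQUIREMENTS=""                                            # optional, space-separated pip packages
LOG_FILE="run_notebooks.log"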
Some Considerations:
If we use a generic Docker image like python:3.9, then the pip install steps for the elyra-ai requirements have to be repeated every time a notebook is run.
If we create a custom notebook image, or maybe several, most of the pip install steps are done at the time the Docker image is built, speeding up actual notebook execution (see the build sketch after the list of specialized images below).
Although the Docker image will be bigger this way, once it has been pulled onto the cluster, it should get cached.
The same is not true for Python packages: anything pip downloads inside the container running the notebook is lost once that container exits.
And generally, the extra time needed to pull a bigger Docker image is a fraction of the time required to download the pip packages, plus the time pip needs on top of that to resolve potential version conflicts.
We could use several specialized images for notebooks that have similar dependencies:
ART+AIF360
CodeNet
Quantum/Qiskit
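If we go the custom-image route, the build could be as simple as this sketch; the image tag is hypothetical, the base image is an assumption, and the requirements file is the same one the script above installs at runtime:

# sketch: bake the common requirements into the image at build time
cat > Dockerfile <<'EOF'
FROM python:3.9-slim
RUN python3 -m pip install --no-cache-dir papermill ipykernel \
 && python3 -m pip install --no-cache-dir \
    -r https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/generic/requirements-elyra.txt
EOF

# hypothetical tag; repeat with extra requirements for each dependency group above
docker build -t mlx/notebook-runner:base .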
Additional Information:
Also see this notebook runner component with a sample pipeline in KFP: