This repository contains a Bodywork project that demonstrates how to run an ML pipeline on Kubernetes using Jupyter notebooks. The example pipeline has two stages, defined in two notebooks:
- train_model.ipynb - downloads data from an AWS S3 bucket, trains a classifier and then uploads it back to the same S3 bucket.
- score_data.ipynb - downloads the trained model from AWS S3, together with the data that needs to be scored, then scores the data and uploads the results back to S3 (both stages are sketched below).
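For orientation, here is a minimal sketch of the kind of code these two notebooks might contain, assuming `boto3`, `pandas` and `scikit-learn` are available. The bucket name, object keys and column names below are hypothetical placeholders, not the ones used by the actual notebooks.

```python
import pickle

import boto3
import pandas as pd
from sklearn.linear_model import LogisticRegression

S3_BUCKET = "my-ml-pipeline-bucket"  # hypothetical bucket name


def train_model() -> None:
    """Stage 1: download training data, fit a classifier, upload it to S3."""
    s3 = boto3.client("s3")
    s3.download_file(S3_BUCKET, "data/train.csv", "train.csv")
    data = pd.read_csv("train.csv")
    X, y = data.drop(columns=["label"]), data["label"]
    model = LogisticRegression().fit(X, y)
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)
    s3.upload_file("model.pkl", S3_BUCKET, "models/model.pkl")


def score_data() -> None:
    """Stage 2: download the model and new data, score it, upload the results."""
    s3 = boto3.client("s3")
    s3.download_file(S3_BUCKET, "models/model.pkl", "model.pkl")
    s3.download_file(S3_BUCKET, "data/to-score.csv", "to-score.csv")
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)
    data = pd.read_csv("to-score.csv")  # assumed to contain only feature columns
    data["prediction"] = model.predict(data)
    data.to_csv("results.csv", index=False)
    s3.upload_file("results.csv", S3_BUCKET, "results/results.csv")
```

Note that because each notebook runs as a separate pipeline stage, the trained model is passed between the stages via S3 rather than in memory.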
To run this project, follow the steps below.
Use our Quickstart Guide to Kubernetes for MLOps to spin up a local Minikube cluster in minutes.
$ pip install bodywork
$ bodywork create deployment https://github.com/bodywork-ml/bodywork-jupyter-pipeline-project
The orchestrator logs will be streamed to your terminal until the job completes.
If you're happy with the test results, you can schedule the workflow-controller to operate remotely on the cluster on a pre-defined schedule. For example, to set up the workflow to run every hour, on the hour, use the following command:
$ bodywork create cronjob https://github.com/bodywork-ml/bodywork-jupyter-pipeline-project \
--name=jupyter-pipeline \
--schedule="0 * * * *"
Each scheduled workflow will attempt to re-run the batch-job, as defined by the state of this repository's master branch at the time of execution.
To get the execution history for all jupyter-pipeline jobs, use:
$ bodywork get cronjob jupyter-pipeline --history
This repository is a GitHub template repository that can be automatically copied into your own GitHub account by clicking the "Use this template" button above.
After you've cloned the template project, use the official Bodywork documentation to help you modify the project to meet your own requirements.