MLOps: CI/CD with Kubeflow Pipelines in GCP

This repo demonstrates how to take the first step towards MLOps by setting up and deploying a simple ML CI/CD pipeline using Google Cloud's AI Platform, Kubeflow, and Docker.

✍ Authors

🗺 Overview

The following topics will be covered:

  1. Building each task as a Docker container and running them with Cloud Build
    • Preprocessing: loading data from a GCS bucket, transforming it, and storing the result as a new file
    • Training: creating a PyTorch model and building a custom prediction routine (AI Platform mainly supports TensorFlow, but you can add custom models; a sketch of such a routine follows this list)
    • Deployment: deploying your custom model to AI Platform with version control
  2. Creating a Kubeflow pipeline and connecting the above tasks
  3. Performing CI by building GitHub triggers in Cloud Build that rebuild a container upon a push to the repository
  4. Performing CD by using Cloud Functions to trigger the pipeline upon uploading new data to your bucket
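
The custom prediction routine mentioned under training follows AI Platform's predictor interface: a class with a predict method and a from_path class method that loads the model. Below is a minimal sketch, assuming the trained PyTorch model is exported as model.pt; the class name, file name, and input handling are illustrative, not the repo's actual code.

    # predictor.py: illustrative sketch of an AI Platform custom prediction routine
    import os
    import torch


    class PytorchPredictor(object):
        def __init__(self, model):
            self._model = model

        def predict(self, instances, **kwargs):
            # AI Platform passes a list of instances; return a JSON-serializable list.
            inputs = torch.tensor(instances, dtype=torch.float32)
            with torch.no_grad():
                outputs = self._model(inputs)
            return outputs.tolist()

        @classmethod
        def from_path(cls, model_dir):
            # model_dir is the local copy of the model directory uploaded to GCS.
            model = torch.load(os.path.join(model_dir, "model.pt"))
            model.eval()
            return cls(model)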


📽 Video Demo

There's a short video demo of the project available here.

Note that it was created for a DevOps course at KTH with a 3-minute limit, so it is very brief and compressed to fit that requirement.

🌉 Setting up the pipeline

Here we will go through the process of setting up and running the pipeline step by step. (Note: at the moment there are some hard-coded project names, repos, etc. that you might want to change; this will eventually be updated here.)

  1. Create a GCP project, open the Cloud Shell (make sure you're in the project), and clone the repository:

    $ git clone https://github.com/jhammarstedt/gcloud_MLOPS_demo.git

  2. Create a Kubeflow Pipelines instance on AI Platform Pipelines (this provides the pipelines dashboard used later).

  3. Run the $ ./scripts/set_auth.sh script in the Cloud Shell (you might want to change SA_NAME); this grants the service account the roles needed to run the pipeline.

  4. Create a project bucket and a data bucket (used for CD later); here we simply named them {PROJECT_NAME}_bucket and {PROJECT_NAME}-data-bucket.

  • In the general project bucket, add the following subfolders: models, packages, data
  5. Locally, create a package from the models directory in the containers/train folder by running $ python containers/train/models/setup.py sdist. This builds a source distribution with PyTorch and the model structure; drag and drop it into the packages subfolder of the project bucket (a minimal setup.py sketch follows this list).

  6. Create a Docker container for each step (each folder in the containers directory represents a different step). Do this by running ./build_containers.sh from gcloud_MLOPS_demo/containers in the Cloud Shell.

    This will run build_single_container.sh in each directory.

    • If you wish to build just one container, enter the directory you want to build and run:

      $ bash ../build_single_container.sh {directory name}

  7. Each subfolder (which will become a container) includes:

    • A cloudbuild.yaml file (created in build_single_repo.sh) that lets Cloud Build build a Docker image from the included Dockerfile.

    • The Dockerfile, which mainly runs the task script (e.g. deploy.sh).

    • A task script that tells the Docker container what to do (e.g. preprocess, train, or deploy the trained model to AI Platform).

  8. To test a container manually, run:

    $ docker run -t gcr.io/{YOUR_PROJECT}/{IMAGE}:latest --project {YOUR_PROJECT} --bucket {YOUR_BUCKET} local

    e.g. to run the container that deploys the model to AI Platform:

    $ docker run -t gcr.io/ml-pipeline-309409/ml-demo-deploy-toai

  9. Create a pipeline in Python using the Kubeflow Pipelines SDK (currently done in a notebook on AI Platform); a sketch of such a pipeline follows this list.

  10. Now we can either run the pipeline manually from the Kubeflow Pipelines dashboard (step 2) or run it as a script.
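
For step 5 above, the source package is produced by a regular setuptools setup.py. A minimal sketch of what containers/train/models/setup.py might look like; the package name, requirements, and metadata here are assumptions, check the repo for the real file.

    # setup.py: minimal sketch of the model package; names are illustrative
    from setuptools import setup, find_packages

    setup(
        name="ml_demo_model",            # assumed package name
        version="0.1",
        packages=find_packages(),
        install_requires=["torch"],      # the model code and custom predictor need PyTorch
        description="Model code and custom prediction routine for AI Platform",
    )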
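
For step 9, the pipeline itself is a Python function decorated with the kfp DSL that chains the three container images built earlier. A minimal sketch using the v1 kfp SDK (dsl.ContainerOp); the preprocess and train image names are assumptions, only the deploy image appears in the example above.

    # pipeline.py: minimal sketch using the v1 kfp SDK; image names are assumptions
    import kfp
    from kfp import dsl

    PROJECT = "ml-pipeline-309409"      # replace with your project
    BUCKET = PROJECT + "_bucket"        # project bucket from step 4


    @dsl.pipeline(name="ml-demo", description="preprocess -> train -> deploy to AI Platform")
    def ml_demo_pipeline():
        preprocess = dsl.ContainerOp(
            name="preprocess",
            image="gcr.io/{}/ml-demo-preprocess:latest".format(PROJECT),
            arguments=["--project", PROJECT, "--bucket", BUCKET],
        )
        train = dsl.ContainerOp(
            name="train",
            image="gcr.io/{}/ml-demo-train:latest".format(PROJECT),
            arguments=["--project", PROJECT, "--bucket", BUCKET],
        )
        train.after(preprocess)

        deploy = dsl.ContainerOp(
            name="deploy",
            image="gcr.io/{}/ml-demo-deploy-toai:latest".format(PROJECT),
            arguments=["--project", PROJECT, "--bucket", BUCKET],
        )
        deploy.after(train)


    if __name__ == "__main__":
        # Compile to a package that can be uploaded in the Kubeflow Pipelines UI
        # or submitted with kfp.Client(host=...).create_run_from_pipeline_func.
        kfp.compiler.Compiler().compile(ml_demo_pipeline, "pipeline.yaml")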

🛠 CI

To set up CI and rebuild at every push:

  • Connect gcloud to GitHub, either in the Triggers UI or by running: $ ./scripts/setup_trigger.sh
  • Push the newly created cloudbuild files from GCP to the origin, otherwise the trigger won't find them
  • This trigger will run every time a push to master touches any of the containers, and will thus rebuild the affected Docker image

📦 CD

CD is useful when we want to retrain/fine-tune the model whenever new data arrives, rather than every time we update a component. So we set up a Cloud Function that triggers a training pipeline when new data is uploaded to Cloud Storage.

  1. Get the pipeline host URL from the pipeline settings and, ideally, save it as a PIPELINE_HOST environment variable.

  2. In the pipeline folder, run the deploy script:

    $ ./deploy_cloudfunction $PIPELINE_HOST

  3. Now, whenever a new file is added to or deleted from the bucket, the pipeline will be rerun (a sketch of such a Cloud Function follows).
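
A minimal sketch of what the Cloud Function behind the deploy script might look like, assuming a background function with a google.storage.object.finalize trigger on the data bucket and a compiled pipeline.yaml bundled alongside it; the function name, experiment name, and file names are illustrative.

    # main.py: illustrative sketch of the GCS-triggered Cloud Function
    import os
    import kfp

    PIPELINE_HOST = os.environ["PIPELINE_HOST"]  # set when deploying the function


    def trigger_pipeline(event, context):
        """Start a Kubeflow pipeline run when a file changes in the bucket."""
        client = kfp.Client(host=PIPELINE_HOST)
        experiment = client.create_experiment("cd-retraining")  # assumed experiment name
        client.run_pipeline(
            experiment_id=experiment.id,
            job_name="retrain-on-" + event["name"],
            pipeline_package_path="pipeline.yaml",  # compiled pipeline bundled with the function
        )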

👓 Resources used and further reading