The following prerequisites must be completed to replicate this project.
From the SageMaker Studio UI, create a SageMaker Project using one of the first-party supported templates. Please use one of the following templates:
- MLOps template for model building, training, and deployment
- MLOps template for model building, training, and deployment with third-party Git repositories using Jenkins
- MLOps template for model building, training, and deployment with third-party Git repositories using CodePipeline
*Note:* It is recommended to use the first template, which leverages SageMaker native services for source version control and CI/CD and reduces the number of steps needed for an end-to-end demo. If choosing one of the other templates, please follow the documentation to complete the template-specific prerequisites.
For more information on SageMaker Projects, visit the AWS documentation.
Once the project is created, you will see two repositories created: one for the model building code and one for the model deployment code. The model deployment code will remain untouched; the model building code will be changed by replacing `pipelines/abalone/` with the code from this repo under `sagemaker-pipeline/`.
- Upload `datawrangler/abalone-dataset-header.csv` to S3 and note the S3 URI.
- Replace the S3 URI in line 19 of `datawrangler/fs-ingest-wrangler-training-template.flow` with the S3 URI you uploaded the dataset to in the step above.
- Upload this `.flow` file to SageMaker Studio and open it. This will open the DataWrangler UI.
- On the DataWrangler UI, click on **Export Step** and select **Feature Store**. This will generate a notebook.
- Run the code in the generated notebook to ingest features into Feature Store (see the sketch after this list).
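The generated notebook contains the authoritative ingestion code. For orientation, here is a minimal sketch of the same ingestion flow using the SageMaker SDK's Feature Store API; the feature group name, record identifier, and event-time column below are hypothetical placeholders, not names from the generated notebook:

```python
import time

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

df = pd.read_csv("abalone-dataset-header.csv")
# Placeholder identifier and event-time columns; the generated notebook
# defines its own before creating the feature group.
df["record_id"] = df.index.astype(str)
df["event_time"] = pd.Timestamp.utcnow().timestamp()

# Placeholder feature group name.
feature_group = FeatureGroup(name="abalone-feature-group", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/abalone-feature-store",
    record_identifier_name="record_id",
    event_time_feature_name="event_time",
    role_arn=sagemaker.get_execution_role(),
    enable_online_store=True,
)

# Wait until the feature group is active, then ingest the dataframe.
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(15)
feature_group.ingest(data_frame=df, max_workers=3, wait=True)
```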
This repository contains 2 folders:

- `sagemaker-pipeline/`: code to create and run a SageMaker Pipeline that processes data, runs a hyperparameter tuning job, evaluates the top model from the HPO job, uses Clarify to generate a bias report, and registers the model into a Model Registry (a condensed sketch of the tuning and registration stages follows this list).
- `model-monitor/`: notebooks for creating an endpoint from a model registered in the Model Registry and setting up a Data Quality Monitor.
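For orientation, the sketch below shows how the tuning and registration stages of such a pipeline can fit together with the SageMaker Pipelines SDK. The estimator, hyperparameter ranges, S3 prefixes, and package group name are illustrative assumptions, and the processing, evaluation, and Clarify steps are omitted for brevity; the actual code lives in `sagemaker-pipeline/`:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import TuningStep

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()

# Built-in XGBoost image; hyperparameters and ranges are illustrative.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.0-1")
xgb = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/abalone/output",
)
xgb.set_hyperparameters(objective="reg:squarederror", num_round=50)

tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:rmse",
    objective_type="Minimize",
    hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
    max_jobs=4,
    max_parallel_jobs=2,
)

# Placeholder S3 prefixes; the real pipeline wires in the processing step's outputs.
step_tune = TuningStep(
    name="HPTuning",
    tuner=tuner,
    inputs={
        "train": TrainingInput(f"s3://{bucket}/abalone/train", content_type="text/csv"),
        "validation": TrainingInput(f"s3://{bucket}/abalone/validation", content_type="text/csv"),
    },
)

# Register the best model from the tuning job into the Model Registry.
step_register = RegisterModel(
    name="RegisterBestModel",
    estimator=xgb,
    model_data=step_tune.get_top_model_s3_uri(top_k=0, s3_bucket=bucket),
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="AbalonePackageGroup",
)

pipeline = Pipeline(name="AbaloneHPOPipeline", steps=[step_tune, step_register])
```

The key detail is `get_top_model_s3_uri`, which lets a downstream step consume the best model artifact produced by the tuning job.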
If using Feature Store:

- The first step of the Pipeline will need to read data from Feature Store.
- In the file `sagemaker-pipeline/pipeline-dw-fs.py`, lines 131 to 178 need to be replaced with the code in the notebook created by the DataWrangler export.
- The first step of the Pipeline will be `step_read_train` (a sketch of what this step typically does follows this list).
- Replace the first step in `sagemaker-pipeline/pipeline.py` with `step_read_train` and `step_process` from `sagemaker-pipeline/pipeline-dw-fs.py`.
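The exported notebook defines the real `step_read_train`. As a rough sketch of what reading from Feature Store typically looks like (the feature group name and S3 output path are placeholders), the offline store can be queried through Athena:

```python
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Placeholder feature group name; use the one created during ingestion.
feature_group = FeatureGroup(name="abalone-feature-group", sagemaker_session=session)

# Query the offline store through Athena and load the result as a dataframe.
query = feature_group.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location=f"s3://{session.default_bucket()}/abalone/fs-query-output",
)
query.wait()
train_df = query.as_dataframe()
```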
**If not using Feature Store, ignore the steps above and follow the steps below.**
- Navigate to the model build repo created by the SageMaker Project and replace the code in `pipelines/abalone/` with the code in `sagemaker-pipeline/`.
- Trigger the pipeline by pushing the new code to the CodeCommit/Git repo (depending on the template selected).
- Once the pipeline has completed, find the model package group in the Model Registry and note the ARN of the model package created in the group.
- Approve the model in the Model Registry. This will trigger the model deployment pipeline, and you should see an endpoint being created in SageMaker (see the sketch after this list for a programmatic way to find and approve the package).
- This endpoint will have the suffix `-staging`. You can navigate to CodePipeline, and under Pipelines you will see one with your project name and `model-deploy`. Click on that pipeline and you will see a manual approval option. When approved, a new endpoint will be created with the suffix `-prod`.
- These endpoints are created by the default seed code in the first-party template and do not have Data Capture enabled.
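Finding the model package ARN and approving it, as referenced in the list above, can also be done programmatically. A minimal sketch with boto3, assuming the package group is named after your project:

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder group name; SageMaker Projects names the group after the project.
group_name = "<your-project>-model-package-group"

# Find the most recent model package in the group and note its ARN.
packages = sm.list_model_packages(
    ModelPackageGroupName=group_name,
    SortBy="CreationTime",
    SortOrder="Descending",
)
latest_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Approving the package is what triggers the model deployment pipeline.
sm.update_model_package(
    ModelPackageArn=latest_arn,
    ModelApprovalStatus="Approved",
)
```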
To set up Model Monitor:

- Navigate to `model-monitor/create_endpoint.ipynb` to create an endpoint with Data Capture enabled.
- Run `model-monitor/data_quality_monitor.ipynb` to set up a Data Quality Monitoring schedule on the endpoint (a condensed sketch follows this list).
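The two notebooks contain the full setup. As a condensed sketch of the two pieces they cover (the endpoint name and S3 paths below are placeholders): data capture is configured at deployment time, and a `DefaultModelMonitor` baseline plus schedule is attached afterwards:

```python
import sagemaker
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    DataCaptureConfig,
    DefaultModelMonitor,
)
from sagemaker.model_monitor.dataset_format import DatasetFormat

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()

# 1) Deploy with data capture enabled (done in create_endpoint.ipynb).
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=f"s3://{bucket}/abalone/datacapture",
)
# model.deploy(..., endpoint_name="abalone-monitored", data_capture_config=capture_config)

# 2) Baseline the training data, then schedule hourly data quality checks
#    (done in data_quality_monitor.ipynb).
monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)
monitor.suggest_baseline(
    baseline_dataset=f"s3://{bucket}/abalone/train/train.csv",  # placeholder path
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=f"s3://{bucket}/abalone/baseline",
)
monitor.create_monitoring_schedule(
    monitor_schedule_name="abalone-data-quality",
    endpoint_input="abalone-monitored",  # placeholder endpoint name
    output_s3_uri=f"s3://{bucket}/abalone/monitor-reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```

On each scheduled run, the monitor compares captured endpoint traffic against the baseline statistics and constraints and emits a violations report.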
Once all the endpoints have been created, navigate to the Endpoints UI in SageMaker Studio and click on the endpoint deployed using the notebook in the `model-monitor/` folder.
Things to highlight in the demo:
- End-to-end lineage
- View of the trial component from the model in the Model Registry
- Lineage from the Endpoint to the Model Package Group and Version
- Pipelines integration with Experiments
- Debugging a Pipeline through the DAG view
- CI/CD for automatic training and deployment