Add SageMaker Studio support (#32)
* Init studio support

Fix build

Fix assert syntax error

Update README

Init IAM prune

Reformat

Tighten permissions and bind with unique prefix

Update build

Update README

Add pyproject.toml

Cleanup

Pin black version in build and cleanup

Improve onboarding and various cleanups

Update README for SageMaker Studio

Update project prefix description

Find the custom resource stack deletion

Remove unnecessary return None

Fix data capture URI and remove canary

Remove canary.js

Update canary deployment descriptions

* Address comments

Update docs

Remove synthetics window from dashboard

Tag pipeline with sagemaker project id and format

Fix retrain rule

Include necessary permissions to run workflow.ipynb

Update README

Setup pre-commit to lint and add default kernel

Fix trailing newline

Update README with one-click button

Cleanup

Update project tree structure in README

Remove dev artifacts from build

Update README
ehsanmok authored May 27, 2021
1 parent 7b8654d commit 61fc76f
Showing 34 changed files with 665 additions and 3,517 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.vscode
build
__pycache__
*.ipynb_checkpoints
16 changes: 16 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,16 @@
default_language_version:
python: python3.7

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: trailing-whitespace
- repo: local
hooks:
- id: lint
name: lint
always_run: true
entry: scripts/lint.sh
language: system
types: [python]
1 change: 0 additions & 1 deletion LICENSE
@@ -12,4 +12,3 @@ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

94 changes: 48 additions & 46 deletions README.md
@@ -1,5 +1,4 @@
# Amazon SageMaker Safe Deployment Pipeline

## Introduction

This is a sample solution to build a safe deployment pipeline for Amazon SageMaker. This example could be useful for any organization looking to operationalize machine learning with native AWS development tools such as AWS CodePipeline, AWS CodeBuild and AWS CodeDeploy.
@@ -32,64 +31,64 @@ In the following diagram, you can view the continuous delivery stages of AWS CodePipeline

The following is the list of steps required to get up and running with this sample.

### Prepare an AWS Account
### Requirements

* Create your AWS account at [http://aws.amazon.com](http://aws.amazon.com) by following the instructions on the site.
* A SageMaker Studio user profile; see [Onboard to Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html).
### Enable Amazon SageMaker Studio Project

Create your AWS account at [http://aws.amazon.com](http://aws.amazon.com) by following the instructions on the site.
1. From the AWS console, navigate to Amazon SageMaker Studio, click on your Studio user name (do **not** open Studio yet), and copy the name of the execution role as shown below (similar to `AmazonSageMaker-ExecutionRole-20210112T085906`)

### *Optionally* fork this GitHub Repository and create an Access Token

1. [Fork](https://github.com/aws-samples/sagemaker-safe-deployment-pipeline/fork) a copy of this repository into your own GitHub account by clicking the **Fork** button in the upper right-hand corner.
2. Follow the steps in the [GitHub documentation](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line) to create a new (OAuth 2) token with the following scopes (permissions): `admin:repo_hook` and `repo`. If you already have a token with these permissions, you can use that. You can find a list of all your personal access tokens at [https://github.com/settings/tokens](https://github.com/settings/tokens).
3. Copy the access token to your clipboard. For security reasons, after you navigate off the page, you will not be able to see the token again. If you have lost your token, you can [regenerate](https://docs.aws.amazon.com/codepipeline/latest/userguide/GitHub-authentication.html#GitHub-rotate-personal-token-CLI) your token.
<p align="center">
<img src="docs/studio-execution-role.png" alt="role" width="800" height="400"/>
</p>

### Launch the AWS CloudFormation Stack
2. Click on the launch button below to set up the stack

Click on the **Launch Stack** button below to launch the CloudFormation Stack to set up the SageMaker safe deployment pipeline.
<p align="center">
<a href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateUrl=https%3A%2F%2Famazon-sagemaker-safe-deployment-pipeline.s3.amazonaws.com%2Fstudio.yml&stackName=mlops-studio&param_PipelineBucket=amazon-sagemaker-safe-deployment-pipeline"><img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png" width="250" height="50"></a>
</p>

[![Launch CFN stack](https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateUrl=https%3A%2F%2Famazon-sagemaker-safe-deployment-pipeline.s3.amazonaws.com%2Fsfn%2Fpipeline.yml&stackName=nyctaxi&param_GitHubBranch=master&param_GitHubRepo=amazon-sagemaker-safe-deployment-pipeline&param_GitHubUser=aws-samples&param_ModelName=nyctaxi&param_NotebookInstanceType=ml.t3.medium)
Then paste the role name copied in step 1 as the value of the `SageMakerStudioRoleName` parameter, as shown below, and click **Create Stack**

Provide a stack name, e.g. **sagemaker-safe-deployment-pipeline**, and specify the parameters.
<p align="center">
<img src="docs/studio-cft.png" alt="role" width="400" height="600"/>
</p>

*Alternatively*, you can use the provided `scripts/build.sh` (which requires the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) to be installed with appropriate IAM permissions) as follows:
```
# bash scripts/build.sh S3_BUCKET_NAME STACK_NAME REGION STUDIO_ROLE_NAME
# REGION should match your default AWS CLI region
# STUDIO_ROLE_NAME is copied from step 1. Example:
bash scripts/build.sh example-studio example-pipeline us-east-1 AmazonSageMaker-ExecutionRole-20210112T085906
```

Parameters | Description
----------- | -----------
Model Name | A unique name for this model (must be less than 15 characters long).
S3 Bucket for Dataset | The bucket containing the dataset (defaults to [nyc-tlc](https://registry.opendata.aws/nyc-tlc-trip-records-pds/))
Notebook Instance Type | The [Amazon SageMaker instance type](https://aws.amazon.com/sagemaker/pricing/instance-types/). Default is ml.t3.medium.
GitHub Repository | The name (not URL) of the GitHub repository to pull from.
GitHub Branch | The name (not URL) of the GitHub repository’s branch to use.
GitHub Username | GitHub Username for this repository. Update this if you have forked the repository.
GitHub Access Token | The optional Secret OAuthToken with access to your GitHub repository.
Email Address | The optional Email address to notify on successful or failed deployments.
3. From the AWS console, navigate to CloudFormation and wait until the stack `STACK_NAME` is ready
4. Go to SageMaker Studio and **Open Studio** (refresh your browser if you are already in Studio) and, from the left-hand side panel, click on the inverted triangle. As shown in the screenshot below, under `Projects -> Create project -> Organization templates`, you should see the newly added **SageMaker Safe Deployment Pipeline** template. Click on the template name and then **Select project template**

![code-pipeline](docs/stack-parameters.png)
<p align="center">
<img src="docs/studio-sagemaker-project-template.png" alt="role" width="800" height="400"/>
</p>

You can launch the same stack using the AWS CLI. Here's an example:
5. Choose a name for the project, leave the rest of the fields at their default values (you can use your own email for SNS notifications), and click **Create project**
6. Once the project is created, you are given the option to clone it locally from AWS CodeCommit with a single click. Click **Clone** to go directly to the project
7. Navigate to the code base and go to `notebook/mlops.ipynb`
8. Choose a kernel from the prompt such as `Python 3 (Data Science)`
9. Assign your project name to the placeholder `PROJECT_NAME` in the first code cell of the `mlops.ipynb` notebook (see the sketch after this list)
10. Now you are ready to go through the rest of the cells in `notebook/mlops.ipynb`
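
A minimal sketch of what that first cell might look like (the variable names and the `describe_project` lookup are illustrative assumptions, not necessarily the notebook's exact contents):

```python
import boto3

# Hypothetical first-cell setup: replace the placeholder with the project
# name you chose in step 5.
PROJECT_NAME = "my-safe-deployment-project"

# Optionally look up the project id that SageMaker Projects assigned at
# creation time; the pipeline resources are prefixed/tagged with it.
sm_client = boto3.client("sagemaker")
project = sm_client.describe_project(ProjectName=PROJECT_NAME)
PROJECT_ID = project["ProjectId"]
print(f"Project {PROJECT_NAME} has id {PROJECT_ID}")
```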

```
aws cloudformation create-stack --stack-name sagemaker-safe-deployment \
  --template-body file://pipeline.yml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=ModelName,ParameterValue=mymodelname \
    ParameterKey=GitHubUser,ParameterValue=[email protected] \
    ParameterKey=GitHubToken,ParameterValue=YOURGITHUBTOKEN12345ab1234234
```

### Start, Test and Approve the Deployment

Once the deployment is complete, there will be a new AWS CodePipeline created, with a Source stage that is linked to your source code repository. Initially it will be in a *Failed* state, as it is waiting on an S3 data source.

![code-pipeline](docs/data-source-before.png)

Launch the newly created SageMaker Notebook in your [AWS console](https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/), navigate to the `notebook` directory and open the notebook by clicking the `mlops.ipynb` link.

![code-pipeline](docs/sagemaker-notebook.png)

Once the notebook is running, you will be guided through a series of steps, starting with downloading the [New York City Taxi](https://registry.opendata.aws/nyc-tlc-trip-records-pds/) dataset and uploading it to an Amazon SageMaker S3 bucket along with the data source metadata to trigger a new build in AWS CodePipeline (sketched below).

![code-pipeline](docs/datasource-after.png)
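
The notebook automates this trigger, but conceptually it amounts to copying the dataset and a small data-source manifest into the pipeline's S3 location, whose object-level S3/CloudTrail event is what the source stage is waiting for. A rough sketch with illustrative bucket, prefix and key names (the notebook derives the real ones from the stack outputs):

```python
import json
import boto3

s3 = boto3.client("s3")

# Illustrative names only -- the notebook derives the real bucket and prefix
# from the CloudFormation stack outputs and the model name.
bucket = "sagemaker-us-east-1-123456789012"
prefix = "nyctaxi"

# Upload the prepared training data.
s3.upload_file("data/train.csv", bucket, f"{prefix}/data/train.csv")

# Upload a small manifest describing the data source; the S3 event on this
# object is what kicks off the pipeline's source stage.
manifest = {"data": f"s3://{bucket}/{prefix}/data/train.csv"}
s3.put_object(
    Bucket=bucket,
    Key=f"{prefix}/data-source.json",
    Body=json.dumps(manifest),
)
```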

Once your pipeline is kicked off it will run model training and deploy a development SageMaker Endpoint.

There is a manual approval step, which you can act on directly within the SageMaker Notebook, to promote this to production, send some traffic to the live endpoint and create a REST API.
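
If the approval step is the usual CodePipeline manual approval action, it can also be approved programmatically. A hedged sketch with placeholder pipeline, stage and action names (look up the real ones from the pipeline the stack created before approving):

```python
import boto3

codepipeline = boto3.client("codepipeline")

# Placeholder names -- substitute the names from the pipeline that the
# stack created for your model.
pipeline_name = "nyctaxi-pipeline"
stage_name = "DeployPrd"
action_name = "ApproveDeploy"

# The approval token is only valid while the action is awaiting approval.
state = codepipeline.get_pipeline_state(name=pipeline_name)
stage = next(s for s in state["stageStates"] if s["stageName"] == stage_name)
action = next(a for a in stage["actionStates"] if a["actionName"] == action_name)
token = action["latestExecution"]["token"]

codepipeline.put_approval_result(
    pipelineName=pipeline_name,
    stageName=stage_name,
    actionName=action_name,
    result={"summary": "Promote to production", "status": "Approved"},
    token=token,
)
```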

Expand Down Expand Up @@ -138,15 +137,20 @@ This project is written in Python, and design to be customized for your own mode
│   ├── buildspec.yml
│   ├── dashboard.json
│   ├── requirements.txt
│   └── run.py
│   └── run_pipeline.py
├── notebook
│   ├── canary.js
│   ├── dashboard.json
│   ├── workflow.ipynb
│   └── mlops.ipynb
└── pipeline.yml
├── scripts
│   ├── build.sh
│   ├── lint.sh
│   └── set_kernelspec.py
├── pipeline.yml
└── studio.yml
```

Edit the `get_training_params` method in the `model/run.py` script that is run as part of the AWS CodeBuild step to add your own estimator or model definition.
Edit the `get_training_params` method in the `model/run_pipeline.py` script that is run as part of the AWS CodeBuild step to add your own estimator or model definition.
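
As an illustration only (the actual function signature and return shape in `model/run_pipeline.py` may differ), a customized `get_training_params` might assemble SageMaker `CreateTrainingJob` parameters like this:

```python
# Hypothetical sketch -- adapt names and arguments to the real function in
# model/run_pipeline.py; this only shows where your estimator settings go.
def get_training_params(model_name, job_id, role_arn, image_uri, data_uri, output_uri):
    return {
        "TrainingJobName": f"{model_name}-{job_id}",
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # your own container or a built-in algorithm image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "HyperParameters": {  # replace with your estimator's hyperparameters
            "max_depth": "9",
            "eta": "0.2",
            "objective": "reg:squarederror",
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": data_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```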

Extend the AWS Lambda hooks in `api/pre_traffic_hook.py` and `api/post_traffic_hook.py` to add your own validation or inference against the deployed Amazon SageMaker endpoints. You can also edit the `api/app.py` lambda to add any enrichment or transformation to the request/response payload.
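
As a minimal sketch of the hook pattern (not the repository's exact code -- the environment variable and payload details are assumptions), a traffic-hook Lambda typically smoke-tests the endpoint and reports the outcome back to CodeDeploy:

```python
import json
import os

import boto3

sm_runtime = boto3.client("sagemaker-runtime")
codedeploy = boto3.client("codedeploy")

# Assumed to be injected by the deployment template; the repository's hooks
# may use different variable names.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "")


def lambda_handler(event, context):
    status = "Succeeded"
    try:
        # Smoke-test the freshly deployed endpoint with a tiny CSV payload.
        response = sm_runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="text/csv",
            Body="1.0,2.0,3.0",
        )
        print("Validation prediction:", response["Body"].read())
    except Exception as err:
        # Any validation failure marks the lifecycle event as failed,
        # which makes CodeDeploy roll back the traffic shift.
        print("Validation failed:", err)
        status = "Failed"

    # Report the result back to CodeDeploy so the deployment can proceed
    # or roll back.
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return {"statusCode": 200, "body": json.dumps({"status": status})}
```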

@@ -158,8 +162,7 @@ This section outlines cost considerations for running the SageMaker Safe Deployment Pipeline
- **CodeCommit** – $1/month if you didn't opt to use your own GitHub repository.
- **CodeDeploy** – No cost with AWS Lambda.
- **CodePipeline** – CodePipeline costs $1 per active pipeline* per month. Pipelines are free for the first 30 days after creation. More can be found at [AWS CodePipeline Pricing](https://aws.amazon.com/codepipeline/pricing/).
- **CloudWatch** - This template includes a [Canary](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html), 1 dashboard and 4 alarms (2 for deployment, 1 for model drift and 1 for canary) which costs less than $10 per month.
- Canaries cost $0.0012 per run, or $5/month if they run every 10 minutes.
- **CloudWatch** - This template includes 1 dashboard and 3 alarms (2 for deployment and 1 for model drift) which costs less than $10 per month.
- Dashboards cost $3/month.
- Alarm metrics cost $0.10 per alarm.
- **CloudTrail** - Low cost, $0.10 per 100,000 data events to enable [S3 CloudWatch Event](https://docs.aws.amazon.com/codepipeline/latest/userguide/create-cloudtrail-S3-source-console.html). For more information, see [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/)
@@ -170,7 +173,7 @@ This section outlines cost considerations for running the SageMaker Safe Deployment Pipeline
- The `ml.t3.medium` instance *notebook* costs $0.0582 an hour.
- The `ml.m4.xlarge` instance for the *training* job costs $0.28 an hour.
- The `ml.m5.xlarge` instance for the *monitoring* baseline costs $0.269 an hour.
- The `ml.t2.medium` instance for the dev *hosting* endpoint costs $0.065 an hour.
- The two `ml.m5.large` instances for production *hosting* endpoint costs 2 x $0.134 per hour.
- The `ml.m5.xlarge` instance for the hourly scheduled *monitoring* job costs $0.269 an hour.
- **S3** – Prices will vary depending on the size of the model/artifacts stored. The first 50 TB each month will cost only $0.023 per GB stored. For more information, see [Amazon S3 Pricing](https://aws.amazon.com/s3/pricing/).
@@ -193,4 +196,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License

This library is licensed under the MIT-0 License. See the LICENSE file.

4 changes: 1 addition & 3 deletions api/app.py
@@ -50,8 +50,6 @@ def lambda_handler(event, context):
"body": predictions,
}
except ClientError as e:
logger.error(
"Unexpected sagemaker error: {}".format(e.response["Error"]["Message"])
)
logger.error("Unexpected sagemaker error: {}".format(e.response["Error"]["Message"]))
logger.error(e)
return {"statusCode": 500, "message": "Unexpected sagemaker error"}
13 changes: 3 additions & 10 deletions api/pre_traffic_hook.py
@@ -29,16 +29,9 @@ def lambda_handler(event, context):
else:
# Validate that endpoint config has data capture enabled
endpoint_config_name = response["EndpointConfigName"]
response = sm.describe_endpoint_config(
EndpointConfigName=endpoint_config_name
)
if (
"DataCaptureConfig" in response
and response["DataCaptureConfig"]["EnableCapture"]
):
logger.info(
"data capture enabled for endpoint config %s", endpoint_config_name
)
response = sm.describe_endpoint_config(EndpointConfigName=endpoint_config_name)
if "DataCaptureConfig" in response and response["DataCaptureConfig"]["EnableCapture"]:
logger.info("data capture enabled for endpoint config %s", endpoint_config_name)
else:
error_message = "SageMaker data capture not enabled for endpoint config"
# TODO: Invoke endpoint if we don't have canary / live traffic
8 changes: 4 additions & 4 deletions assets/deploy-model-dev.yml
@@ -23,10 +23,10 @@ Resources:
Model:
Type: "AWS::SageMaker::Model"
Properties:
ModelName: !Sub mlops-${ModelName}-dev-${TrainJobId}
ModelName: !Sub ${ModelName}-dev-${TrainJobId}
PrimaryContainer:
Image: !Ref ImageRepoUri
ModelDataUrl: !Sub s3://sagemaker-${AWS::Region}-${AWS::AccountId}/${ModelName}/mlops-${ModelName}-${TrainJobId}/output/model.tar.gz
ModelDataUrl: !Sub s3://sagemaker-${AWS::Region}-${AWS::AccountId}/${ModelName}/${ModelName}-${TrainJobId}/output/model.tar.gz
ExecutionRoleArn: !Ref DeployRoleArn

EndpointConfig:
@@ -38,11 +38,11 @@ Resources:
InstanceType: ml.t2.medium
ModelName: !GetAtt Model.ModelName
VariantName: !Sub ${ModelVariant}-${ModelName}
EndpointConfigName: !Sub mlops-${ModelName}-dec-${TrainJobId}
EndpointConfigName: !Sub ${ModelName}-dec-${TrainJobId}
KmsKeyId: !Ref KmsKeyId

Endpoint:
Type: "AWS::SageMaker::Endpoint"
Properties:
EndpointName: !Sub mlops-${ModelName}-dev-${TrainJobId}
EndpointName: !Sub ${ModelName}-dev-${TrainJobId}
EndpointConfigName: !GetAtt EndpointConfig.EndpointConfigName