This repository accompanies a hands-on training event to introduce data scientists (and ML-ready developers / technical leaders) to core model training and deployment workflows with Amazon SageMaker.
Like a "101" course in the academic sense, this will likely not be the simplest introduction to SageMaker you can find; nor the fastest way to get started with advanced features like optimized SageMaker Distributed training or SageMaker Clarify for bias and explainability analyses.
Instead, these exercises are chosen to demonstrate some core build/train/deploy patterns that we've found help new users to first get productive with SageMaker - and to later understand how the more advanced features fit in.
Sessions in suggested order:
- builtin_algorithm_hpo_tabular: Demonstrating how to use (and tune the hyperparameters of) a pre-built, SageMaker-provided algorithm (Applying XGBoost to tabular data)
- (Optional) custom_sklearn_rf: Introductory example showing how to bring your own algorithm, using SageMaker's Scikit-Learn container environment as a base (Predicting housing prices)
- custom_tensorflow_keras_nlp: Demonstrating how to bring your own algorithm, using SageMaker's TensorFlow container environment as a base (Classifying news headline text)
- migration_challenge_keras_image: A challenge to use what you've learned to migrate an existing TensorFlow notebook to a SageMaker model training job and a real-time inference endpoint deployment (Classifying MNIST DIGITS images)
While the deep learning exercises above are presented in TensorFlow+Keras by default, PyTorch users can explore the pytorch_alternatives folder instead.
If you've onboarded to SageMaker Studio, you can download this repository by running git clone https://github.com/aws-samples/sagemaker-101-workshop in a System terminal (launched from the "Utilities and files" section of the launcher screen).
If you prefer to use classic SageMaker Notebook Instances, you can find a CloudFormation template defining the standard setup at .ee.tpl.yaml. This can be deployed via the AWS CloudFormation Console.
Note: Some of the examples depend on ipywidgets for interactive inference demos, which should be installed by default in SageMaker Studio but requires additional setup on a classic Notebook Instance. See the CloudFormation template for an example installing the required libraries via Lifecycle Configuration script.
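If you are unsure whether your notebook environment already has ipywidgets, a quick check from a notebook cell can help before you start the interactive demos. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and not part of this repository or of the SageMaker SDK.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported in this kernel.

    Hypothetical helper for illustration only. It expects top-level names
    such as "ipywidgets", not dotted submodule paths.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["ipywidgets"])
    if missing:
        print("Not installed in this kernel:", ", ".join(missing))
    else:
        print("ipywidgets is available.")
```

If the package is reported as missing on a classic Notebook Instance, the Lifecycle Configuration script in the CloudFormation template is the place to look for the install steps.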
Depending on your setup, you may be asked to choose a kernel when opening some notebooks. There should be guidance at the top of each notebook on suggested kernel types, but if you can't find any, Python 3 (Data Science) (on Studio) or conda_python3 (on Notebook Instances) are likely good options.
You can refer to the "How Are Amazon SageMaker Studio Notebooks Different from Notebook Instances?" docs page for more details on differences between the Studio and Notebook Instance environments. As that page notes, SageMaker Studio does not yet support local mode, which we find can be useful to accelerate debugging in the migration challenge.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
One major focus of this workshop is how SageMaker helps us right-size and segregate compute resources for different ML tasks, without sacrificing (but ideally accelerating!) data scientist productivity. For more information on this topic, see this post on the AWS Machine Learning Blog: Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker
As you continue to explore Amazon SageMaker, you'll also find many more useful resources in:
- The official Amazon SageMaker Examples repository: with a broad range of code samples covering SageMaker use cases from beginner to expert.
- The documentation (and maybe even the source code) for the SageMaker Python SDK: The high-level, open-source PyPI library we use when we import sagemaker.
- The Amazon SageMaker Developer Guide: documenting the SageMaker service itself.
More advanced users may also find it helpful to refer to:
- The boto3 reference for SageMaker and the SageMaker API reference: in case you have use cases for SageMaker where you want (or need) to use low-level APIs directly, instead of through the sagemaker library.
- The AWS Deep Learning Containers and SageMaker Scikit-Learn Containers source code: For a deeper understanding of the framework container environments.